Aid Watch received the following very thoughtful comment. The author wishes to remain anonymous:
The debate in the academic world sounds fascinating! And it mirrors in some ways the ongoing debates I have within the international development practitioner community, where I work. Due to my background and current job, I’m the resident RCT “expert” of sorts in my organization and get to have lots of fascinating discussions with program and M&E staff. I see the following pros and cons for randomized evaluation (or RCT’s – randomized controlled trials – as they are often called in the NGO world):
PROS:
- As always, the key idea is that you can’t attribute causality of impact without a randomly-assigned control group. Selection bias and other problems affect any other method to varying degrees.
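To make the selection-bias point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything from the comment: the names (`simulate`, `motivation`), the true effect of 2.0, and the way more "motivated" households opt into the program. It compares a naive participant-versus-non-participant gap with the gap under coin-flip assignment.

```python
import random
from statistics import fmean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return (naive self-selected gap, randomized gap) for a made-up program."""
    rng = random.Random(seed)
    selected_in, selected_out = [], []   # groups formed by self-selection
    treated, control = [], []            # groups formed by a coin flip
    for _ in range(n):
        # Hypothetical unobserved trait that raises outcomes AND take-up.
        motivation = rng.gauss(0, 1)
        baseline = 10 + 3 * motivation + rng.gauss(0, 1)

        # Self-selection: more motivated households are more likely to join.
        joins = motivation + rng.gauss(0, 1) > 0
        if joins:
            selected_in.append(baseline + true_effect)
        else:
            selected_out.append(baseline)

        # Randomization: assignment is independent of motivation.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)

    naive_gap = fmean(selected_in) - fmean(selected_out)
    randomized_gap = fmean(treated) - fmean(control)
    return naive_gap, randomized_gap


if __name__ == "__main__":
    naive_gap, randomized_gap = simulate()
    print(f"self-selected comparison: {naive_gap:.2f}")
    print(f"randomized comparison:    {randomized_gap:.2f}")
```

In this toy setup the self-selected comparison overstates the effect because joiners would have done better anyway, while the randomized comparison lands close to the assumed true effect.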
CONS (or rather, arguments for having additional approaches in your evaluator’s toolbox):
- RCT’s are harder to do for long-run impacts. You either have to leave the control group without the program for 10-20 years, which is an ethical and logistical challenge, or you have to rely on some assumptions to add effects together from repeated follow-up surveys. For example, if you delayed the start of a program in the “control group” for three years and then did a follow-up survey every three years, then you could add the difference between 3 and 0 years plus the difference between 6 and 3 years plus the difference between 9 and 6 years, etc., but you’d have to assume some things, like linearity in the effect over time or specific types of interactions with global one-off events. (I’m still thinking about this whole idea.)
- With a complex or system-wide program, you often can’t have a control group, such as when you are working on a national scale, for example to change gender injustices in a country’s laws.
- Context is important and you can’t always get that with good background research or a good pilot before an RCT, though you should try. My organization talks a lot about “mixed methods” – mixed quantitative and qualitative research being a good way to combine the strengths of each. In fact the RCT that I’m overseeing includes a team of anthropologists.
- Qualitative research can also be more responsive if you get unanticipated results that are hard to explain.
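The adding-up idea in the long-run bullet above can be written out as a short derivation. This is only a sketch under one possible set of assumptions, and all of the notation is introduced here for illustration: outcomes are taken to be a calendar-year component shared by both groups plus a program effect that depends only on years of exposure.

```latex
% Assumed notation (not from the original comment):
%   Y^T_t, Y^C_t : mean outcome in the early-start and delayed-start groups at survey year t
%   g(t)         : anything that hits both groups equally in calendar year t
%   f(s)         : program effect after s years of exposure, with f(0) = 0
\begin{align*}
Y^T_t &= g(t) + f(t), \qquad Y^C_t = g(t) + f(\max(t-3,\,0)) \\
D_k &= Y^T_{3k} - Y^C_{3k} = f(3k) - f(3k-3), \qquad k = 1, 2, \dots, K \\
\sum_{k=1}^{K} D_k &= f(3K) - f(0) = f(3K)
\end{align*}
```

The sum telescopes only because g(t) cancels within each survey round and f depends on exposure alone. If a one-off event in some calendar year changes how well the program works, rather than shifting both groups equally, the terms no longer cancel and the sum stops measuring the long-run effect.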
So, being a good two-handed economist, I do see both sides now, though I’m still pro-RCT. It helps that I was at that bastion of qualitative methodology, the American Evaluation Association conference (another AEA!) and heard some good indoctrination on the anti-RCT side.
It’s particularly interesting to be at my INGO since much of the organization’s work is focused on areas that are tough to evaluate with RCT’s, including lobbying the U.S. government; humanitarian relief work (though we have a few staff who want baselines for refugee camps); and many small-scale, long-term, idiosyncratic projects in communities facing severe challenges.
The closest I’ve come to agreement with people who are anti-RCT is to have all of us agree that it’s a great tool in the right circumstances but that it’s one of many good tools. What we always disagree on is whether RCT’s are overused (them) or underused (me). And many people hate the words “gold standard”. It’s a red flag. I use it anyway, as in “RCT’s are the gold standard for short-run impact evaluations that you want to be free from selection bias.”
I think that the “right circumstances” for RCT’s would include important development approaches such as clean water or microcredit that haven’t been evaluated yet with RCT’s; or big programs that are finally stable in their implementation after an initial period of experimentation and adaptation. Pilots are OK, too, though that is a harder sell; program staff want to be able to get in there and experiment away with what works and what doesn’t without worrying about rigorous evaluation.
It’ll be interesting to see where these discussions are in 5 or 10 years.



7 Comments
For me, the biggest challenge is that evaluating programs to see if they work or not does not fully capture the implementation processes in the particular NGO or government department. It should be obvious that it is meaningless to make projections about a program’s success or failure without analysing the capacities and commitment of the agency (and its people) responsible for implementing it.
Perhaps every randomized trial should be accompanied by a study of the organisational practices and the implementation chain through which the program is delivered. Also, this would involve very little additional cost, since high quality RCTs usually have dedicated field researchers, who are potentially store-houses of critical organisational information.
As a graduate student at Georgetown studying international development, I often see this debate come up in class.
This is one of the most refreshing, succinct explanations of evaluation methods I’ve seen. Truth often lies in the middle between two extremes; so while I’m certainly a fan of RCT’s, it’s foolish to consider them the only helpful evaluation tool.
I run the blog over at “Economists for Firing Larry Summers” and I reviewed a bit of Bill Easterly’s past research here, which is no doubt of interest to Bill Easterly fans. It’s here.
The post includes a short review of the Easterly et al. paper “Was the Wealth of Nations Determined in 1000 B.C.?” It was definitely an interesting-looking paper which everyone interested in development should read.
Good luck on getting RCT’s approved in politicized environments! If you can’t get the effectiveness of the flu vaccine scientifically validated, I don’t know how you have any hope of doing it for programs in more complex, politically charged environments.
There’s a lot of focus on selection bias here, but I’m not sure selection bias is really the biggest issue. It’s identification. We’re not that bad at correcting for selection bias, and have been able to do it using regressions since the 70s and in the past 10 years have developed ways to do it using nonparametric techniques as well. The best field experiments are the ones that are able to identify specific effects that were either previously bundled into a single variable, or heavily proxied.
E.g., Olken shows corruption follows standard rules of IO. How do you identify the mechanisms of corruption without an experiment? You don’t.
E.g., Kling, Liebman and Katz show neighbourhood effects cause reduction in obesity and anxiety. This one does feature selection bias prominently. But it’s not really overcoming the selection bias that’s important. If we could predict neighborhood decisions in a correctly specified model, we could correct for selection bias in an equation explaining anxiety or obesity using old-fashioned Heckman lambda stuff, or new-fangled nonparametric methods. But these aren’t things that are easily identified using traditional data. That’s the reason it gets into Econometrica, not just because it overcomes selection bias.
“There’s a lot of focus on selection bias here, but I’m not sure selection bias is really the biggest issue. It’s identification. We’re not that bad at correcting for selection bias, and have been able to do it using regressions since the 70s and in the past 10 years have developed ways to do it using nonparametric techniques as well.”
It’s true that there is at least one established method for correcting for selection bias, but the problem remains that you need to find the right variable that would allow you to make the necessary correction. For example, it’s a common procedure to correct for selection when estimating returns to education, since wage earners are unlikely to be representative. But it is not easy to find a variable that affects labor market participation and not the wage rate.
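A rough sketch of the kind of correction being discussed, under loudly simplified assumptions: the data are simulated, the variable `z` is a hypothetical factor that shifts labor market participation but not wages, and the participation coefficient `gamma` is treated as known instead of being estimated with a first-stage probit as a real Heckman two-step would do. The point is only to show how the inverse Mills ratio (the "lambda" term) absorbs the selection effect once such a variable exists.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def inverse_mills(a):
    """Inverse Mills ratio: standard normal pdf over cdf."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def simulate(n=50000, beta0=1.0, gamma=1.0, rho=0.8, seed=1):
    """Return (naive mean wage, corrected intercept, slope on lambda)."""
    rng = random.Random(seed)
    wages, lambdas = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)   # hypothetical: affects participation, not wages
        v = rng.gauss(0, 1)   # unobserved taste for work
        # Wage error is correlated with v, which is what creates selection bias.
        u = rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        if gamma * z + v > 0:             # wage is observed only for workers
            wages.append(beta0 + u)
            # Assumption: gamma is known here; in practice it is estimated.
            lambdas.append(inverse_mills(gamma * z))

    naive_mean = fmean(wages)             # biased estimate of beta0

    # One-regressor OLS of observed wages on lambda, computed by hand.
    mean_l = fmean(lambdas)
    sxx = sum((l - mean_l) ** 2 for l in lambdas)
    sxy = sum((l - mean_l) * (w - naive_mean) for l, w in zip(lambdas, wages))
    slope = sxy / sxx
    intercept = naive_mean - slope * mean_l
    return naive_mean, intercept, slope


if __name__ == "__main__":
    naive_mean, intercept, slope = simulate()
    print(f"naive mean wage among workers: {naive_mean:.2f}")
    print(f"intercept after lambda term:   {intercept:.2f}")
    print(f"coefficient on lambda:         {slope:.2f}")
```

In this simulation the intercept recovers the assumed `beta0` only because `z` varies and is excluded from the wage equation. Without such a variable, the lambda term has nothing independent to work with, which is the difficulty raised in the comment above.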
Yes, Landover, exactly my point.
I think you would agree that theoretically, there are variables that explain labour participation but not wages. Empirical identification of those variables is the tricky part. The problem is therefore one of identification, not selection bias. If we have a situation of perfect identification, selection bias is not an issue, and there’s no need for experiments. Obviously this is wildly unrealistic, so there is a need for experiments.
I wouldn’t say experiments are necessarily the ‘gold standard’, although I concede they are certainly most fashionable right now. Maybe the gold standard for a very particular type of problem, but not generally. A perfectly identified, selection-bias-corrected regression model can be equally or more convincing. Experiments are easier but obviously have the scalability issue that Prof. Easterly is so worried about; respectfully though, I think too much so.
Most of these papers, the good ones anyway, don’t pretend to be something they’re not, and are still contributing to our understanding of how important mechanisms work. They are generally read with full awareness of the limitations of experiments. We don’t have perfect methods; all we can hope for is an improved understanding (as opposed to a perfect one), which is something good experiments are doing a pretty good job of providing so far.