
The Civil War in Development Economics

Few people outside academia realize how badly Randomized Evaluation has polarized academic development economists for and against. My little debate with Sachs seems like gentle whispers by comparison.

Want to understand what’s got some so upset and others true believers? A conference volume has just come out from Brookings. At first glance, this is your typical sleepy conference volume, currently ranked on Amazon at #201,635.

But attendees at that conference realized that it was a major showdown between the two sides, and now the volume lays out in plain view the case for the prosecution and the case for the defense of Randomized Evaluation.

OK, self-promotion confession: I am one of the editors of the volume, and was one of the organizers of the conference (both with Jessica Cohen). But the stars of the volume are the speakers and commentators: Nava Ashraf (Harvard Business School), Abhijit Banerjee (MIT), Nancy Birdsall (Center for Global Development), Anne Case (Princeton University), Alaka Holla (Innovations for Poverty Action), Ricardo Hausmann (Harvard University), Simon Johnson (MIT), Peter Klenow (Stanford University), Michael Kremer (Harvard), Ross Levine (Brown University), Sendhil Mullainathan (Harvard), Ben Olken (MIT), Lant Pritchett (Harvard), Martin Ravallion (World Bank), Dani Rodrik (Harvard), Paul Romer (Stanford University), and David Weil (Brown). Angus Deaton also gave a major luncheon talk at the conference, but his paper was already committed for publication elsewhere; a previous blog post discussed it.

Here’s an imagined dialogue between the two sides on Randomized Evaluation (RE) based on this book:

FOR: Amazing RE power lets us identify causal effect of project treatment on the treated.

AGAINST: Congrats on finding the effect on a few hundred people under particular circumstances, too bad it doesn’t apply anywhere else.

FOR: No problem, we can replicate RE to make sure effect applies elsewhere.

AGAINST: Like that’s going to happen. Since when is there any academic incentive to replicate already published results? And how do you ever know when you have enough replications of the right kind? You can’t EVER make a generic “X works” statement for any development intervention X. Why don’t you try some theory about why things work?

FOR: We are now moving in the direction of using RE to test theory about why people behave the way they do.

AGAINST: I think we might be converging on that one. But your advertising has not yet got the message, like the JPAL ad on “best buys on the Millennium Development Goals.”

FOR: Well, at least it’s better than your crappy macro regressions that never resolve what causes what, and where even the correlations are suspect because of data mining.

AGAINST: OK, you drew some blood with that one. But you are not so holy on data mining either, because you can pick and choose after the research is finished whatever sub-samples give you results, and there is also publication bias that shows positive results but not zero results.

FOR: OK we admit we shouldn’t do that, and we should enter all REs into a registry including those with no results.

AGAINST: Good luck with that. By the way, even if you do show something “works,” is that enough to get it adopted by politicians and implemented by bureaucrats?

FOR: But voters will want to support politicians who do things that work based on rigorous evidence.

AGAINST: Now you seem naïve about voters as well as politicians. Please be clear: do RE-guided economists know something the local people do not know, or do they have different values on what is good for them? What about tacit knowledge that cannot be tested by RE? Why has RE hardly ever been used for policymaking in developed countries?

FOR: You can take as many potshots as you want, at the end we are producing solid evidence that convinces many people involved in aid.

AGAINST: Well, at least we agree on the much larger question of what is not respectable evidence, namely, most of what is currently relied on in development policy discussions. Compared to the evidence-free majority, what unites us is larger than what divides us.

This entry was posted in Academic research, Books and book reviews, and Metrics and evaluation.


  1. teekay wrote:

    Thanks for this post. Just a technical note: the book is ridiculously expensive outside the US (almost the equivalent of $32 on…). Are there any plans to distribute it more widely at reasonable prices?

    Posted December 3, 2009 at 3:34 am | Permalink
  2. Steve wrote:

    Great post.

    I love it when you talk about theory and specific aid projects, but I can’t bear to read the posts where you wave the free-markets wand.

    Posted December 3, 2009 at 4:22 am | Permalink
  3. April wrote:

    Thanks Bill. The Brookings event was fab, and I’m looking forward to reading the book. I’m particularly grateful you called attention to that egregious “best buys” list posted at JPAL – which represents one of the worst behaviors of the randomista camp: pretending to have answered pressing program design or resource prioritization questions, when they’ve done nothing of the sort. Paraphrasing Angus Deaton: yeah, so you’ve shown demand curves (for bednets, say) – but now what? What do we do to get as many people sleeping under bednets as possible? No answers from RCTs, alas.

    Posted December 3, 2009 at 7:41 am | Permalink
  4. $32 is pretty good for an academic volume, especially from Brookings.

    Thanks for the summary, Bill. I’m planning a new course on development and this will be handy.

    Posted December 3, 2009 at 9:01 am | Permalink
  5. djbtak wrote:

    Congrats on the book and a fair outline of what’s at stake. There are two main issues with RE that reflect colonial-missionary mindsets which prevent those I know with accountability to specific communities (rather than “development in general”) from seeing hope in the mainstream development sphere.

    The most important, as you gesture to in the roleplay, is that the RE mindset treats local knowledge as fundamentally unable to contribute to evaluation. This is a values question. The strategy of replication says implicitly: if somewhere else does this better (according to our criteria which we have set without talking to you who we think we want to aid) we should invest our aid money in those kinds of places; or alternatively, if it works better somewhere else we’ll implement that approach with you despite what you might have experienced in the programme. It’s a fundamentally condescending and instrumental approach to people. Maybe, as you suggest, there can be a political calculus which says “we need to do it this way to keep donors happy.” I can accept that argument. But what I tend to see is people who are happier knowing how they can measure effectiveness than rethinking what effectiveness might mean in response to a specific setting. So I am suspicious, because I can see all the benefits that accrue to development experts using this methodology, but demand does not seem to be coming from the aid recipients for this.

    The second issue is one of methodology. Borrowing medical RCT methods for their evidential value neglects the fact that these methods are falling out of favor in the public health sector as more holistic approaches that can account for cultural and social factors gain methodological strength. This is because RCTs work well for relatively well-specified problems (e.g., antibiotics), but many human problems have more complex dependencies (e.g., mental health), and the RCT track record in these areas is patchy at best. This does not mean that complex local systems cannot be evaluated, just that ethnographic in-situ methods will yield more reliable and actionable information if you want to make changes where the research takes place, as the human factors will be the biggest determinants of success, as you have noted previously.

    RCT is a perfect method for working with the effectiveness of new well designs in extracting water at given salinity levels etc. But development problems like the MDGs are more complicated than that, and anyone who really wants to fix them probably needs to be prepared to put their own motivations on the table rather than hiding behind quasi-scientific neutrality.

    Posted December 3, 2009 at 9:44 am | Permalink
  6. JL wrote:

    Or when our beloved randomistas claim that they’re having an impact:

    Where’s the control group?

    Posted December 3, 2009 at 12:20 pm | Permalink
  7. David wrote:

    Bill, in addition to your sound “against” arguments, I would mention one more. Aid bureaucrats are always desperate to get a positive evaluation. The easiest way to do this is to use the “ultimate low hurdle” of an “impact evaluation” of a project spending umpteen dollars by comparing it to the counterfactual of “doing nothing” (which spends zero dollars). Then there is a great flourish of using RE and other sophisticated statistical tests to “evaluate” the project in comparison with the phony counterfactual of doing nothing. Since most any project costing umpteen dollars will have some effects that are “better than nothing” (if the measurement is done quickly), then the project manager gets a “positive evaluation” using “rigorous scientific methods.” Of course, a true counterfactual would be an alternative project spending comparable resources–but then the rigorous scientific evaluators plead “no data.” The nice debate over RE should not obscure even more elementary frauds, e.g., impact evaluations using phony counterfactuals of doing nothing, in the current practice of development evaluation.

    Posted December 3, 2009 at 12:37 pm | Permalink
  8. Raphael wrote:

    Is it really a “civil war?” Perhaps that’s a bit of an exaggeration. My sense is that people in the evaluation field are converging towards a “mixed methods” approach that values and uses both quantitative and qualitative methods. For example, if you combine an RE with qualitative methods like well-planned focus groups, in-depth anthropological studies, key informants, etc., you are going to have a much better product in the end. The “gold standard” is no longer RE, but rather whatever method or mix of methods best answers your evaluation question.

    Posted December 3, 2009 at 1:50 pm | Permalink
  9. William Easterly wrote:

    Raphael, anecdotally, at least in private, I hear a lot of passion that verges on “civil war” from both sides. Of course, the claim I just made is not verifiable by randomized evaluation. Bill

    Posted December 3, 2009 at 4:31 pm | Permalink
  10. Asif Dowla wrote:
    Posted December 3, 2009 at 10:19 pm | Permalink
  11. Laura Freschi wrote:

    @Asif Dowla
    Thanks for the correction.

    Posted December 3, 2009 at 11:25 pm | Permalink
  12. avam wrote:

    The impact they claim is pretty huge: “The First 28 million Lives. Three programs have already been massively expanded as a direct result of J-PAL evidence, impacting millions of lives for the better.”

    Given the opening statement of ‘direct result’, I’m surprised they even put in the word ‘could’.

    “1. 7 million children have already benefited from school-based mass deworming campaigns, and tens of millions of children could be reached over the next few years.”

    Interesting post. Am looking forward to reading the book.

    Posted December 4, 2009 at 6:19 am | Permalink
  13. clay wescott wrote:

    Adam Fforde makes a good contribution to this debate:

    Posted December 4, 2009 at 7:56 am | Permalink
  14. anon wrote:

    The debate in the academic world sounds fascinating! And it mirrors in some ways the ongoing debates I have within the international development practitioner community, where I work. Due to my background and current job, I’m the resident RCT “expert” of sorts in my organization and get to have lots of fascinating discussions with program and M&E staff. I see the following pros and cons for randomized evaluation (or RCT’s – randomized control trials – as they are often called in the NGO world):

    PROS:
    - As always, the key idea that you can’t attribute causality of impact without a randomly-assigned control group. Selection bias and other problems affect any other method to varying degrees.

    CONS (or rather, arguments for having additional approaches in your evaluator’s toolbox):
    - RCT’s are harder to do for long-run impacts. You either have to leave the control group without the program for 10-20 years, which is an ethical and logistical challenge, or you have to rely on some assumptions to add together effects from repeated follow-up surveys. For example, if you delayed the start of the program in the “control group” for three years and then did a follow-up survey every three years, you could add the difference between 3 and 0 years, plus the difference between 6 and 3 years, plus the difference between 9 and 6 years, and so on, but you’d have to assume things like linearity of the effect over time and no interactions with one-off global events. (I’m still thinking about this whole idea.)
    – With a complex or system-wide program, you often can’t have a control group, such as if you are working on a national scale. For example, working to change gender injustices in a country’s laws.
    – Context is important and you can’t always get that with good background research or a good pilot before an RCT, though you should try. My organization talks a lot about “mixed methods” – mixed quantitative and qualitative research being a good way to combine the strengths of each. In fact the RCT that I’m overseeing includes a team of anthropologists.
    – Qualitative research can also be more responsive if you get unanticipated results that are hard to explain.
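    The phase-in arithmetic in the first CON above can be illustrated with a toy simulation. All numbers below are hypothetical (a made-up effect of 2.0 units per year), and the linearity assumption does all the work: summing the 3-year contrasts at each survey wave recovers the full 9-year effect only if the per-year effect is constant.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000  # villages per arm (hypothetical)

    def outcome(years_exposed, rng, n):
        """Outcome after a given number of years of program exposure.
        Assumes a linear effect of 2.0 units per year, plus noise."""
        return 10.0 + 2.0 * years_exposed + rng.normal(0, 1, n)

    # Phase-in design: the "control" arm starts the program 3 years
    # after the treatment arm, so at survey waves in years 3, 6, and 9
    # the two arms always differ by exactly 3 years of exposure.
    waves = [(3, 0), (6, 3), (9, 6)]
    increments = []
    for t_years, c_years in waves:
        diff = outcome(t_years, rng, n).mean() - outcome(c_years, rng, n).mean()
        increments.append(diff)  # each contrast estimates a 3-year effect

    # Under linearity, the three 3-year contrasts add up to the 9-year effect.
    total_effect = sum(increments)
    print(total_effect)  # close to 18.0 = 2.0 units/year x 9 years
    ```

    If the true effect tapered off over time, or a one-off global event hit one wave differently, the sum would no longer equal the 9-year effect, which is exactly the assumption the commenter is worried about.
    
    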

    So, being a good two-handed economist, I do see both sides now, though I’m still pro-RCT. It helps that I was at that bastion of qualitative methodology, the American Evaluation Association conference (another AEA!) and heard some good indoctrination on the anti-RCT side.

    It’s particularly interesting to be at my INGO since much of the organization’s work is focused on areas that are tough to evaluate with RCT’s including lobbying the U.S. govt; humanitarian relief work (though we have a few staff who want baselines for refugee camps); and many small-scale, long-term, idiosyncratic projects in communities facing severe challenges.

    The closest I’ve come to agreement with people who are anti-RCT is to have all of us agree that it’s a great tool in the right circumstances but that it’s one of many good tools. What we always disagree on is whether RCT’s are overused (them) or underused (me). And many people hate the words “gold standard”. It’s a red flag. I use it anyway, as in “RCT’s are the gold standard for short-run impact evaluations that you want to be free from selection bias.”

    I think that the “right circumstances” for RCT’s would include important development approaches such as clean water or microcredit that haven’t been evaluated yet with RCT’s; or big programs that are finally stable in their implementation after an initial period of experimentation and adaptation. Pilots are OK, too, though that is a harder sell; program staff want to be able to get in there and experiment away with what works and what doesn’t without worrying about rigorous evaluation.

    It’ll be interesting to see where these discussions are in 5 or 10 years.

    Posted December 4, 2009 at 7:47 pm | Permalink
  15. strainer wrote:

    In my view what the US government (and other govt’s that have followed in their footsteps) has continued to do to try to “improve” the economy is very misguided. They have wasted trillions of dollars bailing out creditors and shareholders of failed institutions with broken business models rather than addressing the structural flaws in the system of too much debt. And this is going to lead to massive problems down the road with regard to our currency and interest rates, in my opinion. And I think that the gold price breaking out to a new high is a strong indication of the reduction in faith and confidence that people have in governments and their fiat currencies. I recently read several good articles at http://www, that discuss the Federal Reserve’s easy monetary policies in order to try to prevent any sort of deflation from occurring and to try to reflate assets prices. One I found particularly interesting is called “Gold Price Cheaper Now than at $300 – Hathaway”. I think these articles are very helpful for investors to read because they help to explain the investment implications for the dollar, the gold price, and gold mining companies who I believe will continue to benefit from central banks’ inflationary policies.

    Posted December 10, 2009 at 11:50 am | Permalink


  • About Aid Watch

    The Aid Watch blog is a project of New York University's Development Research Institute (DRI). This blog is principally written by William Easterly, author of "The Elusive Quest for Growth: Economists' Adventures and Misadventures in the Tropics" and "The White Man's Burden: Why the West's Efforts to Aid the Rest Have Done So Much Ill and So Little Good," and Professor of Economics at NYU. It is co-written by Laura Freschi and by occasional guest bloggers. Our work is based on the idea that more aid will reach the poor the more people are watching aid.

    "Conscience is the inner voice that warns us somebody may be looking." - H.L. Mencken
