Michael Clemens won’t let up on the Millennium Villages + bonus links

It’s nice to see scholars bringing attention to the critical need for evaluation and informed public dialogue (not just “success stories” or short-term impact evaluation) for the Millennium Villages Project, which we have also covered on this blog. Michael Clemens of the Center for Global Development is currently carrying on a very revealing dialogue with Millennium Villages.

In Michael’s first blog post, which we covered here, he makes three central points:

  1. The hundreds of thousands of people living in the Millennium Villages, present and future, deserve to know whether the project’s combination of interventions is backed up by good science.
  2. Randomized evaluation is the best way to do this. While it may be too late to properly evaluate the first wave of villages, there is still time to conduct such a study for the next wave of villages.
  3. The MVP evaluation should demonstrate long-term impact before it is scaled up.

In a subsequent post, Michael parses the curious non-answer he receives from the director of monitoring and evaluation for the MVP, Dr. Paul Pronyk. He breaks down—for those of us not intimately involved in the finer details of impact evaluation—the difference between true scientific evaluation and what the MVP says it is doing, namely “matched randomly selected comparison villages.”

What the MVP has done is something very different from…a rigorous evaluation.  First, village cluster A1 was chosen for treatment, for a range of reasons that may include its potential for responding positively to the project.  Then, long after treatment began, three other clusters that appear similar to A1 were identified — call these “candidate” comparison clusters A2, A3, and A4.  The fact that all three candidates were chosen after treatment in A1 began creates an enormous incentive to pick those candidates, consciously or unconsciously, whose performance will make the intervention in A1 look good.  Then the comparison village was chosen at random from among A2, A3, and A4.

Differences between the treated cluster and the comparison cluster might be due to the MVP. But those differences might also be due to how the original Millennium Village was chosen, and how the three candidate comparison villages were chosen.  This is not a hypothetical concern…

So, either the MVP director of evaluation does not understand evaluation…or he thinks we won’t know the difference.
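The mechanism Clemens describes is easy to demonstrate. Here is a minimal simulation sketch (our illustration, not from Clemens’s post; all numbers are made up): even when the project has exactly zero true effect, letting the candidate comparison clusters be picked after outcomes are visible — with any tilt, conscious or not, toward worse-performing candidates — manufactures an apparent treatment effect that the final “random” draw among the candidates cannot undo.

```python
import random

random.seed(0)

def run_trial(post_hoc_selection, n_sims=5000):
    """Average estimated 'treatment effect' when the true effect is zero."""
    estimates = []
    for _ in range(n_sims):
        # Outcome for the treated cluster; the true treatment effect is zero.
        treated = random.gauss(0, 1)
        # Outcomes for a pool of ten potential comparison clusters.
        pool = [random.gauss(0, 1) for _ in range(10)]
        if post_hoc_selection:
            # Post-hoc: pick the three candidates whose outcomes make the
            # treated cluster look best (the three lowest), THEN randomize.
            candidates = sorted(pool)[:3]
        else:
            # Prospective: candidates fixed before outcomes are known.
            candidates = pool[:3]
        comparison = random.choice(candidates)
        estimates.append(treated - comparison)
    return sum(estimates) / len(estimates)

print(run_trial(post_hoc_selection=False))  # near zero, as it should be
print(run_trial(post_hoc_selection=True))   # clearly positive: pure selection bias
```

The point of the sketch is that the final coin flip among A2, A3, and A4 is genuinely random in both cases; the bias enters entirely at the earlier, non-random step of choosing which three clusters get to be candidates.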

Dr. Pronyk promises the release of the MVP’s midpoint evaluation at some unspecified time later this year, and says they “look forward to an active discussion about the initial findings regarding poverty, hunger, and disease in the Millennium Villages.” We hope the scholarly community and the wider reading public concerned with development issues will give Dr. Pronyk precisely what he’s asking for.

Bonus Links

* Sounds a bit like a parody we wish we’d written… but it’s true. Yesterday’s NYT features this quote from a story on China’s bid to supply California with technology, equipment and engineers to build a high-speed railway, and to help finance its construction:

“We are the most advanced in many fields, and we are willing to share with the United States,” Zheng Jian, the chief planner and director of high-speed rail at China’s railway ministry, said.

* We’d be remiss not to mention this helpful timeline of celebrity aid to Africa featuring an interactive map from Mother Jones (and some additional commentary from Wronging Rights and Texas in Africa.)

This entry was posted in Aid policies and approaches, Badvocacy and celebs, Maps, Metrics and evaluation.


  1. Dan Kyba wrote:

    I need some education here, since I have not paid very much attention to the MVP project and the evaluation debate. I looked at the map and was surprised by how the target villages are distributed.

    Why were the villages not paired on the basis of matching cultural, environmental and similar economic conditions, but separated by a national or other form of artificial boundary?

    Call this a border comparison; by ticking off the similar variables we are then left with the unmatched variables which might have a causative effect. An example of this type of approach is the study by Daniel Berger, “Taxes, Institutions and Local Governance,” recently blogged on this site.

    Posted April 9, 2010 at 12:42 pm
  2. Kim wrote:

    The Sachs camp’s main defense for not doing a true randomized design is that it is unethical to intrude on a location and take up locals’ time and energy collecting data while in essence withholding the treatment of aid. The argument on this post, in contrast, seems to be that it’s also unethical to bring an intervention to scale without knowing its true effects, and that the best way to know them is through a randomized controlled design. Fair enough points on both sides.

    Seems like there’s some middle ground here, and that middle ground might just be what this “matched comparisons” approach is trying to do (assuming that the matches made are on reasonable criteria). The question really is whether the threat of slightly more bias in this “matching” process is enough to outweigh the harm done to control villages in a true randomized trial.

    Posted April 9, 2010 at 4:15 pm
  3. Adam wrote:

    Why would a piece on Chinese offering investment and technical assistance to Californian railways be a parody? They’ve got the fastest railways in the world and the US, well, hasn’t.

    Posted April 10, 2010 at 12:35 pm
  4. Michael Woolcock wrote:

    The tone of this thread surely isn’t quite right… One can do rigorous evaluations using matching techniques; Don Rubin and others pioneered the whole approach (i.e., propensity score matching techniques). Provided the key variables are readily ‘observable’ and comprehensive data are available, PSM has been used by many serious evaluators. It’s a reasonable question to ask whether this is what MVP is deploying, but Paul Pronyk is a serious guy and should be given the benefit of the doubt.

    Moreover, even if a truly squeaky clean identification strategy were used as part of the evaluation (i.e., a fully randomized study) and a positive verdict were reported, there would still be HUGE external validity questions (just because it worked ‘here’, how do we know it will work ‘there’? Can we assume bigger will be better?) And unless we know the functional form of the impact trajectory for the MVPs — which for a multi-stranded intervention we surely don’t — no randomized study will deliver a for-sure verdict on its success or failure. But the big picture challenge for me is not whether super-targeted, well-funded, high-commitment, best-science interventions will reduce poverty. The surprise would be if they didn’t. No, the core development challenge remains: how do we help solve context-specific problems, at scale? Fighting over who has the “best” (any?) evaluation protocol for the MVPs doesn’t move us much closer to answering that one…

    Posted April 12, 2010 at 11:38 am
  • About Aid Watch

    The Aid Watch blog is a project of New York University's Development Research Institute (DRI). This blog is principally written by William Easterly, author of "The Elusive Quest for Growth: Economists' Adventures and Misadventures in the Tropics" and "The White Man's Burden: Why the West's Efforts to Aid the Rest Have Done So Much Ill and So Little Good," and Professor of Economics at NYU. It is co-written by Laura Freschi and by occasional guest bloggers. Our work is based on the idea that more aid will reach the poor the more people are watching aid.

    "Conscience is the inner voice that warns us somebody may be looking." - H.L. Mencken