
Is Impact Measurement a Dead End?

This post was written by Alanna Shaikh. Alanna is a global health professional who blogs at UN Dispatch and Blood and Milk.

We’ve spent the last few years watching the best donors and NGOs get more and more committed to the idea of measurable impacts. At first, the trend seemed unimpeachable. International donors have spent far too much money with far too few results. Focusing more on impact seemed like the way out of that trap.

But is it? The last couple of weeks have seen a spate of arguments from development thinkers rethinking this premise.

Steve Lawry at the Hauser Center argues two main points against an excessive focus on impact evaluation. The first is that it stifles innovation by keeping NGOs from trying risky new things. But I think the real problem is an institutional culture that doesn’t allow for failure. If NGOs were allowed to fail, and to learn from failure, innovation would follow.

His second point is more interesting: “Many real-world problems are not easily described with the kind of precision that professional mathematicians insist upon. This is due to the limitations of data, the costs of collecting and analyzing data, and the inherent difficulties of giving mathematical expression to the complexity of human behavior.” This strikes me as very true. At what point are we expecting too much from our impact assessments?

In the same vein, the fascinating Wanderlust blog just ran a post about Cynefin, a framework for understanding systems. It sorts systems into four categories: Simple, Complicated, Complex, or Chaotic. Chaotic systems, the author argues, can’t be evaluated for impact using standard measures. He states that “In a Chaotic paradigm, there is relatively little difference likely to occur in quality between a response that is based on three weeks’ worth of detailed analysis and one that is based on the gut reaction of a team leader…”

The Center for Global Development just published a paper by former USAID administrator Andrew Natsios. Natsios points out that USAID has begun to favor health programs over democracy strengthening or governance programs because health programs can be more easily measured for impact. Rule of law efforts, on the other hand, are vital to development but hard to measure and therefore get less funding.

Now we come to the hard questions:

If we limit all of our development projects to those that have easy metrics for success, we lose a lot of programs, many of which support important things like rule of law. Of course, if they don’t have useful metrics, how do we know those programs are supporting the important goals?

And how meaningful is impact evaluation anyway when you consider the short time frames we’re working with? Most development programs take ten years or more to show real impact. How are we supposed to bring that in line with government funding cycles?

On the other hand, we don’t have a lot of alternatives to impact evaluation. Impact is not unimportant just because it’s hard to quantify at times. We can’t wish that away. Plenty of beautifully designed and carefully implemented projects turned out not to have any effect at all. For example, consider what we’ve learned from microfinance impact evaluations. Microloans have a positive effect but not the one we expected.

It’s a standard trope of this blog to point out that there’s no panacea in global development. That’s true of impact evaluation, too. It’s a tool for identifying worthwhile development efforts, but it is not the only tool. We can’t go back to assuming that good intentions lead to good results, but there must be room for judgment and experience alongside the quantifiable data.

UPDATE: This post was edited to correct an attribution error in the third paragraph – Eds.



  1. Michael Kevane wrote:

    I think I know the answer, but I felt I had to ask anyway… You write, “Most development programs take ten years or more to show real impact. ” How could you possibly know this without having done impact evaluations?

    Posted July 20, 2010 at 2:05 am | Permalink
  2. Sam Gardner wrote:

The thinking that makes impact evaluation a problem is the unidimensional thinking in development: the idea that development has only one goal, which can be measured as a single figure.

Firstly, goals like child mortality, economic growth and democracy are different in kind, which means you will have to assign a priority and a budget to each of them separately, because setting priorities among them can only be a political choice (who would design a technocratic formula to decide how many kids can die for one election?).

This means that within your development budget, you must assign a separate budget to each separate goal. Within each category, you then go for the biggest impact.

Secondly, donors have at least two types of budgets. The main part is the “bulk money”: moving huge amounts of money, mainly to reach the overall goal of 0.7%. This money, often going to budget aid, the World Bank, etc., should be strictly spent on proven initiatives. The project money, by contrast, should be partially reserved as risk capital, where impact evaluations are seen as a learning tool rather than as a judgment on the choice of project.

    Posted July 20, 2010 at 2:23 am | Permalink
  3. Robert Tulip wrote:

    Randomised Control Trials are good at evaluating impact of specific interventions in the health sector. In assessing priorities and results in the broader chaos of development, it is open to doubt whether health provides a model for other sectors. Many health activities are intrinsically unsustainable and depend on donor inputs. By comparison, market-oriented reforms can provide the resources to pay for social programs, but are harder to subject to RCT. The project model of RCT assumes and reinforces a charity paradigm for aid.

    Posted July 20, 2010 at 2:28 am | Permalink
  4. “And how meaningful is impact evaluation anyway when you consider the short time frames we’re working with? Most development programs take ten years or more to show real impact.”

    In all the impact measurement debate, I have never understood why this isn’t a more prominent argument. With microfinance specifically, most studies occur over a two-year period, at most. What program in the history of economic development has ever produced really meaningful results in terms of poverty alleviation in two years? I would say that 10 years is an underestimate. These things seem to be generational. The parents only marginally benefit from these programs. The kids are the beneficiaries of improvements in education, which is the result of improvements in other things, like infrastructure and the economy. Thirty years down the road, hopefully the landscape of the community has changed altogether.

    Posted July 20, 2010 at 4:04 am | Permalink
  5. Curious wrote:

This recalls one of the main arguments from White Man’s Burden – we can’t successfully social-engineer democracy (or a ‘human rights respecting culture’), so instead of wasting money, frustrating everyone, and making them sweat over logframes and plans to change the un-engineerable, let’s just focus on doing the things that are do-able – and health services are a good example (even if they are not entirely straightforward either).

    So I don’t think it’s fair to conflate “avoiding the difficult” with “recognizing our limits as aid”.

    I would like nothing more than to be free of the bondage of the Big Plans.

    Posted July 20, 2010 at 11:23 am | Permalink
  6. Using careful impact measurement where it is feasible is not the same thing as “limit[ing] all of our development projects to those that have easy metrics for success”. Just because all development interventions can’t be carefully evaluated does not mean that there should be no careful evaluation for any of them–which seems to be the view of many of those who are against the very idea of rigorous evaluation.

    While it is impossible to measure and attribute all of the effects of an intervention in a complex system, it is most certainly possible to measure and attribute effects in many settings. The Millennium Village Project, for example, stated from the beginning that it sought to demonstrate that a five-year package of intensive interventions could put an individual cluster of villages on “the ladder of development”. While all imaginable effects of the intervention in a complex system cannot be evaluated, that claim can certainly be evaluated.

    This requires making a precise definition of what being on the “ladder of development” means (sustained growth in incomes without further outside assistance would be one candidate definition). It requires setting up a clearly defined, ex-ante chosen set of controls to compare to those that got the intervention. In short, it requires precision and transparency. It is feasible and it is not reductionist; it is nothing more than a transparent assessment of factual claims.

    Blanket assessments that all rigorous evaluation is too reductionist to capture the complexities of the world are inimical to transparency. The alternative to evaluating what we can, when we can, is simply to trust organizations to assess themselves; all organizations have great difficulty doing that objectively. Yes there are settings where careful evaluation is difficult. There are many where it is not. Problems of accountability in aid organizations, however, are present always and everywhere. Rigorous evaluation should be used wherever possible to promote transparency.

    Posted July 20, 2010 at 11:32 am | Permalink
  7. You are right to point out the convergence in some of the critiques of impact measurement. The quote from my blog (the post was by Steve Lawry, though – not me!) speaks to how the increasing pressure to measure results pushes organizations to do work that is easy to measure (rather than work that is innovative and important, but risky). Andrew Natsios’ essay points out that WHY measurement takes place is also important. It’s clear that measuring mostly for the sake of compliance can be wasteful and distracting – within a large bureaucracy this turns into an exercise of feeding the beast rather than learning what works in order to improve development practice.

In this pushback on the “pressure to measure”, it sounds like the options are that you either obsessively measure everything in sight or you leave everything up to the judgment of implementers who know success when they see it. Thoughtful development practitioners, of course, are searching for that lovely middle ground where metrics are meaningful in relation to the complexity and timeframes of the interventions, evaluation methods allow for real reflection and learning, and accountability is to poor communities as well as to donors. There’s a lot of good evaluation that has moved development practice forward.

    Donors’ desire to have evidence of results is understandable, but the unintended consequences of the insistence on “proof” need to be better understood.

    Posted July 20, 2010 at 4:28 pm | Permalink
  8. Laura wrote:

Indeed, sometimes it does seem all so pointless, especially if the impact measurement process is donor-driven… I’m involved in two different but comprehensive planning processes with the UN. The main challenge, even before we begin to identify which outcomes to measure and which impacts to use, is to coordinate amongst each other. Luckily, in one of the processes this is where we are focusing: identifying points where we think joint efforts can make a stronger contribution, and trimming the planning process… however, at the cost of ‘participation’… not good… aaah, it’s a double-edged sword!

    Posted July 21, 2010 at 1:03 am | Permalink
  9. John Coonrod wrote:

    Impact evaluation – like development – should be bottom up. The biggest evaluation gap is that impoverished communities lack basic information about their own situation: nutrition rates, drop-out rates, crop yields etc. With this information, an organized community will take action. NGOs can provide capacity building and advocacy for communities to be able to fulfill this basic right to information – and then “impact evaluation” will be a natural and integrated process.

    Posted July 21, 2010 at 8:44 am | Permalink
  10. nanaco wrote:

I agree with the commenter who said that even 10 years is an underestimate and that it will take more than 30 years to see real impacts. However, I am afraid it is difficult to show a causal relationship between the interventions and the impacts after 30 years, since there will have been so many different interventions in the meantime. That said, in terms of development evaluation, would it be better to focus on mid-term outputs or outcomes rather than long-term social impacts?

    Posted July 21, 2010 at 9:18 am | Permalink
I think you should be careful not to throw the baby out with the bathwater here. While we should recognize that not all things can be easily measured, and that not all things have impact on a short time horizon, data-driven analysis is important for accountability and efficient resource allocation.

My take is that, frankly, economists may not be in the best position to inform this debate. Publishing an academic paper requires one to prove causality, which in turn requires a great degree of rigor. I do not think this level of rigor is necessary or appropriate when it comes to development practitioners. Not only are RCTs expensive, but they give a false sense of precision — their results are conditional on the preferences of their control and treatment groups (which may not be static), the attributes of the culture in which the experiment is conducted, the population of interest (which may be very different for academia vs. practitioners), and most importantly — the implementing organization.

    One thing I’ve found illuminating is to compare this to the private sector. Business leaders make decisions based on data — not just profits but a host of other important metrics that indicate performance. This is data-driven and measures impact, and is a much more realistic mechanism to ensure accountability than demanding rigorous evaluation at every turn.

    I’m not saying that an RCT isn’t valuable in some instances. But I do think there’s an overemphasis on evaluation at a level of rigor that’s totally inappropriate.

    Posted July 21, 2010 at 9:40 am | Permalink
  12. I agree with the “don’t throw the baby out with the bathwater” group: despite issues of duration of projects, and measurability of human endeavors there are still many things that can be done — and that can and should be measured. And I believe those are the ones the aid community should focus on. It is relatively simple to measure “what percentage of households have access to clean water” or “what percentage of children have received vaccination” (if by relatively simple we recognize the months of hard work involved) — and therefore relatively simple to know if our efforts to increase clean water access or vaccination are working.

In part due to the efforts of the Measles Initiative, measles deaths worldwide fell by 78 percent between 2000 and 2008, from an estimated 750,000 to 164,000. One of the reasons for that success has been that the MI focused not only on vaccine logistics and the “cold chain” (a critical aspect), and not only on building a real coalition of partners (ministries, ARC, UNF, CDC, WHO), but ALSO on real evaluation of whether vaccination was successful at reaching a high percentage of eligible children (i.e., measuring coverage).

    Additionally, the Measles Initiative was a pioneer in recognizing the value of collecting coverage data electronically, first on Palm PDAs and now on mobile phones, which saves months at least of data entry and speeds up analysis and system feedback. Despite their success, however, the great majority of evaluation and other field data is still collected on paper — almost 20 years after the introduction of the PDA: an astonishing failure to utilize useful and readily available technology. That’s as if people were still using paper spreadsheets 20 years after the invention of the electronic ones!

    To address this issue, the UNF moved beyond actually vaccinating kids against measles, to fund the project to make the same mobile electronic data collection they were using available to ANY program that needs it, at low or no cost.

    In part, the cure for the frustration that comes across in many posts here about un-evaluable projects and unreached goals is to focus on goals that can plausibly be achieved and that can actually be measured (NOT the “we will eliminate inequality and/or poverty in five or ten years” nonsense) and to use appropriate ICT tools that will enable us to more rapidly and effectively measure them.

    Posted July 21, 2010 at 10:29 am | Permalink
  13. david phillips wrote:

Whether development assistance projects in the poorest countries succeed or fail does not necessarily have anything to do with the quality of the design or the inputs. It has as much to do with systemic factors such as the power relations between donors and recipients. Systemic factors cause systemic failure. This is the reason why assistance has so often failed in the poorest countries. Impact evaluation will not pick up systemic problems. Furthermore, even if IE does manage to find causes, it faces major problems of generalizability and replicability. After a few years the craze will be over and the donor community will move their dollars to some other place.

    Posted July 22, 2010 at 4:58 pm | Permalink
  14. david phillips wrote:

The major weakness with IE is that it is irrelevant. The reason why the poorest 1 billion people remain poor despite large amounts of aid is systemic – to do with the power relations of donor and recipient. It is not to do with the quality or design of assistance. Systemic failure has systemic causes. IE can show whether something succeeded or failed but it cannot identify systemic causes. Even if it does pick up causal factors, the results may face major problems of generalizability and replicability which render them no more than case studies sui generis – useful for academic courses but nothing much else. After a few years the IE craze will fizzle out and the donor community will look for some other silver bullet, equally pointless. This may sound cynical but it isn’t – it’s factual!

    Posted July 22, 2010 at 6:01 pm | Permalink
As I see this, there are some projects for which it is possible to measure impact but many for which it is difficult to measure, even in more developed countries. Infrastructure projects, whether physical or social, lend themselves to impact measurement, with the recognition that the impacts can at best only be projected initially, while the results take time to measure. This does not, however, allow anyone the luxury of waiting for better information; sometimes one has to accept that uncertainty is a fact of life, especially when working with large projects.

Where one has a problem is in quantifying the impacts of “soft” projects such as, as mentioned, rule of law, democracy or poverty alleviation. I do not think that we will be able to measure these impacts in the sense that development or project economists would like. Thus my suggestion would be to allow aid agencies to experiment with these “soft” projects while the development and/or project economists continue with their processes of project evaluation.

    Posted July 23, 2010 at 12:59 pm | Permalink
  16. Mark Skeith wrote:

Working with MCC in Nicaragua, Sherine Jaywickarama’s second argument is definitely true here. Check my Nica Tan Rica blog for details.

    Posted July 23, 2010 at 5:26 pm | Permalink
  17. Solar_Sister wrote:

    I am goal oriented, number crunching, model loving, and analytical. I could easily fall in love with Impact Measurement since it feeds all of my number crunching cravings. But ultimately, the numbers don’t give the whole story. They provide models of reality, but not reality itself. They are at best breadcrumbs. Notice them and consider their guidance where they are available, but there is so much information that they miss. They are no substitute for common sense, compassion and clear thinking. You should be highly suspicious of any model that gives you an answer you didn’t already know based on your back of the envelope calculations. Precision of data is not the same as accuracy. So much rests on the assumptions. Carrying out the answer to the 5th decimal place doesn’t fix erroneous assumptions. It is the same with Impact Measurement. We tend to measure what we ‘can’ measure. But that does not equate to truly measuring our total impact.

As a smart man once said: “Not everything that counts can be measured. Not everything that can be measured counts.” — Albert Einstein.

    Posted July 24, 2010 at 7:33 am | Permalink
  18. Stecve wrote:

    Evaluations, like all logistics, are wonderful tools, but really lousy counselors. The human element is the key…. The tendencies, even latent, to exploit the recipients and manipulate the donors by those who facilitate the aid and the evaluations, must be acknowledged.

David Phillips, you ARE being cynical, and cynicism is often a result of a clear understanding of the facts!

    Posted July 24, 2010 at 4:19 pm | Permalink
  19. Brad wrote:

    Nice comment, Solar_Sister.

    Sometimes when evaluation is argued against — as in “complex” activities like rule of law — the activities are compared against supposedly measurable activities like health and water supply. But practitioners in those sectors know that the standard impact measurements, like CMR or L/p/d, are actually very thin. Water supply doesn’t matter if it isn’t clean and if food and waste isn’t handled with good hygiene practices, etc., and these are very complex, hard-to-measure and hard-to-change issues. In other words, anyone who thinks that water supply is a “simple” or “complicated” system hasn’t ever worked on a water supply program. Yet people in that sector continue measuring impact. Rather than stop measuring, and rather than trying to measure everything, professionals use the indicators for what they are — useful straws in the wind — without letting themselves think myopically that the indicator is all that matters. It seems that this kind of approach (the kind of approach used in supposedly measurement-friendly sectors) should inform impact measurement elsewhere.

    Posted July 26, 2010 at 5:28 am | Permalink
  20. Abiy wrote:

It is exciting to read this interesting conversation. I guess that to change the way the world does things now, we also have to re-orient ourselves and change the way we judge success. That may start with redefining the concept of VALUE in Evaluation. There are many things standing between two consecutive digits that are important for communities but can’t fit our quantitative box and get counted.

    Posted July 26, 2010 at 11:16 am | Permalink
  21. Lois-ellin Datta wrote:

    Discussions of the benefits and limitations of impact measurement might distinguish between measurement and design. One needs some indication of what is happening—the measures, be they process, immediate, outcome or impact. One needs some information about why whatever is observed is happening—the evaluation design, be this RCT, quasi-experimental, systems-theory framework, case study,mixed, or any other of the many designs, each appropriate for different contexts and settings. Alanna’s useful and provocative comments perhaps could lead to an appreciation for better (more diverse, practical, sensitive) measurement as well as for greater awareness of the many appropriate evaluation designs. Otherwise, one may find oneself once again in anecdoteville and intuitionland.

    Posted July 28, 2010 at 1:56 pm | Permalink
  22. The fundamental limitation of traditional evaluation is that it is not scalable. While I agree with Michael and others that we should do it when possible, and appropriate, we need other (decentralized, distributed, and implicit) mechanisms for judging whether a project is successful:

    Posted July 29, 2010 at 7:36 am | Permalink

9 Trackbacks

  1. […] This post was mentioned on Twitter by, Conduit Journal. Conduit Journal said: Is Impact Measurement a Dead End? […]

  2. […] Yesterday I read a terribly thought-provoking post by guest blogger Alanna Shaikh at AidWatch.  She raises some interesting questions about the utility of impact measurement given the limited […]

  3. […] “Is Impact Measurement a Dead End?” by Alanna Shaikh, guest blogging at AidWatch […]

  4. […] Shaikh explores the limitations and dangers of the aid industry’s obsession with assessing impact and shows how crowdsourcing can work in practice, as she raises the funds for a violence-reducing […]

  5. By Blessed are the peacemakers « Find What Works on July 23, 2010 at 10:29 am

    […] Go to comments Alanna Shaikh wrote an excellent guest post on AidWatch earlier in the week: Is Impact Measurement a Dead End? Her main critique focuses on the problem of quantifying the unquantifiable, and the incentive […]

  6. By The burden of proof « Aid Thoughts on July 23, 2010 at 7:28 pm

    […] at Aidwatch, Alanna Shaikh, citing a few others, considers the limits of impact analysis. At one point she cites a post by Steven Lawry: “Many real-world problems are not easily […]

  7. […] Steven Lawry at the Hauser Center adds his two cents, arguing that our emphasis on metrics restricts the ability of non-profit managers to draw on alternative sources of knowledge, such as “formal studies, observation of trends in behavior, and feedback from partners and clients.” When funding is contingent upon projects being validated by clear metrics, Lawry argues, NGOs are forced to avoid risky and/or long-term projects that might not show results within the short life-span of an impact assessment. Alanna Shaikh, writing for Aidwatch, seconds this point: “Most development programs take ten years…” […]

  8. […] Affleck > Bono when it comes to aid. Meanwhile, Alanna Shaikh  at Aid Watch writes about the aid community’s affinity for impact evaluation and how it skews funding and programming away from  rule of law/government -type programs. Her […]

  9. […] Pierre Louis’s Beta/VHS analogy, the best standard won’t necessarily be the victor.  Indeed, if programs that are easier to measure attract more funding, this might suggest a bias toward “user-friendly” standards.  This isn’t […]

  • About Aid Watch

    The Aid Watch blog is a project of New York University's Development Research Institute (DRI). This blog is principally written by William Easterly, author of "The Elusive Quest for Growth: Economists' Adventures and Misadventures in the Tropics" and "The White Man's Burden: Why the West's Efforts to Aid the Rest Have Done So Much Ill and So Little Good," and Professor of Economics at NYU. It is co-written by Laura Freschi and by occasional guest bloggers. Our work is based on the idea that more aid will reach the poor the more people are watching aid.

    "Conscience is the inner voice that warns us somebody may be looking." - H.L. Mencken
