Maybe we should put rats in charge of foreign aid research

Laboratory experiments show that rats outperform humans in interpreting data, which is why we now have the US aid agency known as the Millennium Challenge Corporation. Wait, I am getting ahead of myself; let me explain.

The amazing finding on rats is described in an equally amazing book by Leonard Mlodinow. The experiment consists of drawing green and red balls at random, with the probabilities rigged so that greens occur 75 percent of the time. The subject is asked to watch for a while and then predict whether the next ball will be green or red. The rats followed the optimal strategy of always predicting green (I am a little unclear how the rats communicated, but never mind). But the human subjects did not always predict green; they wanted to do better and predict when red would come up too, engaging in reasoning like "after three straight greens, we are due for a red." As Mlodinow says, "humans usually try to guess the pattern, and in the process we allow ourselves to be outperformed by a rat."
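
To see why the rat's strategy wins, here is a minimal simulation of the setup Mlodinow describes (the code and the specific pattern-guessing rule are my own illustration, not taken from the original experiment): greens come up 75 percent of the time, and we compare always guessing green against a gambler's-fallacy rule that calls red after a run of greens.

```python
# Illustrative simulation (not the original experiment): greens occur 75% of
# the time; compare "always guess green" with a naive pattern-guessing rule.
import random

random.seed(0)
N = 100_000
P_GREEN = 0.75

draws = ["green" if random.random() < P_GREEN else "red" for _ in range(N)]

# The rat's strategy: always predict the more common color.
rat_correct = sum(d == "green" for d in draws)

# A human-like rule: after three greens in a row, a red is "due."
human_correct = 0
streak = 0
for d in draws:
    guess = "red" if streak >= 3 else "green"
    human_correct += (guess == d)
    streak = streak + 1 if d == "green" else 0

print(f"always-green strategy: {rat_correct / N:.3f} correct")    # about 0.75
print(f"pattern-guessing rule: {human_correct / N:.3f} correct")  # noticeably worse
```

Guessing red any of the time trades a 75 percent chance of being right for a 25 percent chance, so any pattern-seeking rule does worse on average.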

Unfortunately, spurious patterns show up in some important real-world settings, like research on the effect of foreign aid on growth. Without going into unnecessary technical detail, this research looks for an association between economic growth and some measure of foreign aid, controlling for other likely determinants of economic growth. Of course, since there is some random variation in both growth and aid, there is always the possibility that an association appears by pure chance. The usual statistical procedures are designed to keep this possibility small. The convention is that we believe a result only if there is no more than a 1 in 20 chance that it arose at random. So if a researcher does a study that finds a positive effect of aid on growth and it passes this "1 in 20" test (referred to as a "statistically significant" result), we are fine, right?

Alas, not so fast. A researcher is very eager to find a result, and such eagerness usually involves running many statistical exercises (known as "regressions"). But the 1 in 20 safeguard only applies if you ran just ONE regression. What if you ran 20? Even if there is no relationship between growth and aid whatsoever, on average you will get one "significant result" out of 20 by design. Suppose you report only that one significant result and don't mention the other 19 unsuccessful attempts.

It is easy to run twenty different regressions by varying the definition of aid, the time periods, and the control variables. In aid research, the aid variable has been tried, among other ways, as aid per capita, the logarithm of aid per capita, aid/GDP, the logarithm of aid/GDP, aid/GDP squared, [log(aid/GDP) - aid loan repayments], aid/GDP * [average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP squared * [average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP * [quality of institutions], and so on. Time periods have varied from averages over 24 years to 12 years to 8 years to 4 years. The list of possible control variables is endless; one of the most exotic I ever saw was the probability that two individuals in a country belonged to different ethnic groups TIMES the number of political assassinations in that country. So it is not hard to run many different aid and growth regressions and report only the one that is "significant."
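
A quick back-of-the-envelope check: with 20 independent tests each run at the 5 percent level, the chance of at least one spurious "significant" hit is 1 - 0.95^20, or about 64 percent. The toy simulation below (made-up data and variable names, not the actual aid-growth regressions) shows a "researcher" who keeps trying specifications on pure noise until one of them passes the test.

```python
# Toy illustration of specification searching on pure noise -- all variables
# here are simulated; nothing is taken from the real aid-growth data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_countries = 60     # observations per regression
n_specs = 20         # specifications the researcher is willing to try
n_studies = 2_000    # simulated research projects

found_something = 0
for _ in range(n_studies):
    growth = rng.normal(size=n_countries)            # "growth" is pure noise
    for _ in range(n_specs):
        aid_measure = rng.normal(size=n_countries)   # each "aid" variable is unrelated noise
        _, p_value = stats.pearsonr(aid_measure, growth)
        if p_value < 0.05:                           # spurious "significant" result
            found_something += 1
            break                                    # report this one, forget the rest

print(f"share of studies that 'find' a significant aid effect: "
      f"{found_something / n_studies:.0%}")          # roughly 1 - 0.95**20, i.e. ~64%
```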

This practice is known as "data mining." It is NOT acceptable practice, but the norm is very hard to enforce, since nobody is watching when a researcher runs multiple regressions. It is seldom intentional dishonesty by the researcher. Because of our non-rat-like propensity to see patterns everywhere, it is easy for researchers to convince themselves that the failed exercises were just done incorrectly, and that they finally found the "real result" when they get the "significant" one. Even more insidious, the 20 regressions could be spread across 20 different researchers. Each of them obediently runs only one pre-specified regression; the 19 who get no significant results never publish a paper, while the 20th publishes the spuriously "significant" finding (this is known as "publication bias").

But don't give up on all damned lies and statistics; there ARE ways to catch data mining. A "significant result" that is really spurious will only hold up in the original data sample, with the original time periods and the original specification. If new data become available as time passes, you can test the result on the new data, where it will vanish if it was spurious "data mining." You can also try different time periods, or slightly different but equally plausible definitions of aid and the control variables.
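
Continuing the toy simulation above (again with made-up variables, not the real aid series), the "new data" test is simple: take the one specification that looked significant in the original sample and re-estimate it on a fresh sample. If the original finding was just noise, the effect will usually vanish.

```python
# Sketch of the "new data" test on simulated variables: a spurious result
# found by specification search should disappear in a fresh sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 60

# Original sample: keep trying noise "aid" measures until one looks significant.
growth_old = rng.normal(size=n)
while True:
    aid_old = rng.normal(size=n)
    r_old, p_old = stats.pearsonr(aid_old, growth_old)
    if p_old < 0.05:
        break

# New data: the same kind of regression on data from later years. Since there
# is no real relationship, the "effect" should usually be insignificant now.
growth_new = rng.normal(size=n)
aid_new = rng.normal(size=n)
r_new, p_new = stats.pearsonr(aid_new, growth_new)

print(f"original sample: r = {r_old:+.2f}, p = {p_old:.3f}  (looks 'significant')")
print(f"new data:        r = {r_new:+.2f}, p = {p_new:.3f}  (usually not)")
```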

So a few years ago, some World Bank research found that "aid works [i.e., raises economic growth] in a good policy environment." This study got published in a premier journal, received huge publicity, and eventually led President George W. Bush (in his only known use of econometric research) to create the Millennium Challenge Corporation, which he set up precisely to direct aid to countries with "good policy environments."

Unfortunately, this result later turned out to fail the data mining tests. Subsequent published studies found that it failed the “new data” test, the different time periods test, and the slightly different specifications test.

The original result that "aid works in a good policy environment" was a spurious association. Of course, the MCC is still operating; it may be good or bad for other reasons.

Moral of the story: beware of these kinds of statistical "results" being used to determine aid policy! Unfortunately, the media and policy community don't really get this, and they take the original studies at face value (not only on aid and growth, but also in work on the determinants of civil war, fixing failed states, peacekeeping, democracy, and so on). At the very least, make sure the finding is replicated by other researchers and passes the "data mining" tests.

In other news, anti-gay topless Christian Miss California could be a candidate for a new STD prevention campaign telling all right-wing values advocates: "abstain, or the left-wing media will catch you."

15 Comments

  1. Ben wrote:

    It’s called a “Data Dredge”.

    Data mining, put simply, is when you do it on purpose and understand the implications in terms of false positives.

    Posted May 8, 2009 at 7:58 am | Permalink
  2. Alanna wrote:

    But the MCC’s marketing materials are so slick! They must know what they’re talking about!

    Posted May 9, 2009 at 12:31 pm | Permalink
  3. Steve Roth wrote:

    Dear Mr. Easterly:

    re: different time periods test

    I am only autodidactically schooled in statistics (read: poorly), but this period/lag issue troubles me constantly.

    In a straightforward correlation/scatterplot analysis–looking at two variables for multiple countries over a lengthy period (which strikes me as about the closest thing we can get to a natural experiment)–the researcher has to choose the times/time periods for each of the variables. There are often dozens of (combined) choices available, depending on the underlying time series.

    But I almost never see a study that shows results from more than a small set of time periods, or lags. Most often only one. It strikes me that such a matrix provides the most immediately apprehensible portrayal of the presumed effect (or lack of same).

    Others employ what seems to be statistical legerdemain to impute causation (i.e. 2SLS without reversing the lags in the equation or incorporating multiple lags), without considering that subsequence–and the lag for that subsequence–is the most elemental basis for inferring or disproving causation.

    I’d like to see a lot more correlation results in a form similar to the last table here:

    http://www.asymptosis.com/wealth-equality-and-prosperity.html

    Or displayed as in the table here:

    http://www.asymptosis.com/europe-vs-us-who%e2%80%99s-winning.html

    I’m essentially suggesting that simpler correlations (which still might include multiple regressions, for instance) showing multiple periods/lags in this manner may feed human judgment better than more complex statistical methods that do not fully embrace these temporal choices–especially given those methods’ necessary, often abstruse, often ill-described, and hence often-questionable associated models.

    Does this make sense?

    Posted May 9, 2009 at 4:08 pm | Permalink
  4. Simon wrote:

    I understand that ‘Data Mining’ is a problem. If you are trying, however, to isolate a regression for which you have normally distributed residuals then is it ‘data mining’ to report only those regressions for which you have normally distributed residuals and not those for which you don’t? I’m thinking along the lines of Hendry and others here, though I could be completely off track.

    Posted May 9, 2009 at 4:53 pm | Permalink
  5. Anonymous wrote:

    This practice is known as “data mining.”

    No it isn’t.

    Posted May 9, 2009 at 5:26 pm | Permalink
  6. Mark Palko wrote:

    There are more mistakes here than I have time to address at the moment (I hope to have a detail post up soon at DWAR), but here are a couple of quick points:

    Even given the rather loose definition of data mining, your example is still wrong.

    1. The methods you describe for cooking data predate the term ‘data mining’ by decades. They were even common enough to be mentioned in “How to Lie with Statistics.”

    2. Data mining techniques include steps to check against overfitting based on data sets withheld from the model building process.

    As I said before, this is a loosely defined term but you have gone way outside of standard usage here.

    Mark Palko

    Posted May 9, 2009 at 6:57 pm | Permalink
  7. Javed Alam wrote:

    and chimpanzees beat humans in short-term memory tasks: http://bit.ly/Nltpx

    We are losing out to the animal kingdom.

    Posted May 10, 2009 at 12:23 am | Permalink
  8. I think it is wise to realize that we are not the end of a "chain of being"; we are simply the end product of one of many branches of evolution. Every organism alive today is as advanced as we are, and each organism is suited to survive in its niche. So it is no surprise that each organism will outperform us in its niche.

    Don’t be distracted by the “data mining” red herring.

    Posted May 10, 2009 at 1:02 am | Permalink
  9. Alex F wrote:

    Wow, weird commenters today. I’d never heard the term “data dredging” before, but here’s Wikipedia on it:

    >>Data dredging (data fishing, data snooping) is the inappropriate (sometimes deliberately so) search for ‘statistically significant’ relationships in large quantities of data. This activity was formerly known in the statistical community as data mining, but that term is now in widespread use with an essentially positive meaning, so the pejorative term data dredging is now used instead.

    So… William Easterly is using the term “data mining” in the common sense of the term that everyone has heard and understands, and some computer scientists / machine learning guys are dropping by to tell us that they invented a new word for that since they wanted to appropriate “data mining” for themselves. OK. Thanks guys. (Now you can go off and edit the Wikipedia page so that it says that “data mining” was never used for this, and that only ignoramuses have ever used that term when they meant “data dredging”.)

    Posted May 10, 2009 at 8:28 pm | Permalink
  10. Steve Roth wrote:

    I’m kind of dismayed to see people quibbling about terminology rather than discussing the best ways to feed our human judgments with statistical methods and presentations that more accurately (or usefully) represent reality.

    I’m assuming that others (like I) have some issues with Easterly’s positions on aid and development overall. But still. Spend your time talking about what matters.

    Posted May 11, 2009 at 9:38 am | Permalink
  11. Sceptical Secondo wrote:

    Data mining, dredging, jibbering … is merely a symptom of the key problem with a widespread misconception about the use of statistics in the social sciences: the identification of timeless effects of isolated factors.

    Posted May 11, 2009 at 12:24 pm | Permalink
  12. Andrew Duguay wrote:

    My econometrics professor at Gordon College drilled into us the importance of avoiding data mining. For my final paper,

    found here…

    http://andrewduguay.xanga.com/weblog/

    I used one of the tactics for validating a regression by attempting to update a study done by Paul Collier back in 2001. Using ethnic fractionalization data produced by William Easterly and Alberto Alesina, I ran regressions for economic growth with democracy and diversity as right-hand-side variables. Interestingly enough, I found his results didn't quite stand up when using more up-to-date data in Africa. However, taking trade into account made the other variables relevant.

    I may not be a master econometrician, but I agree with Easterly that an important part of validating research is to test it over time.

    Posted May 12, 2009 at 6:46 am | Permalink
  13. Doug Johnson wrote:

    All good points, but I think you overstate the case that the primary motivation for the creation of the MCC was the Burnside and Dollar result. The other main argument for the MCC was that it would, potentially, create a set of incentives for countries to pursue good policies. I don’t claim to know what went on inside former president Bush’s and his advisors’ heads when they decided to create the MCC, but if you take their statements at face value, the potential incentive effect played a much larger role in their thinking. (Check out Bush’s original speech announcing the creation of the MCC — the word “reward” is used multiple times while the Burnside and Dollar result is only alluded to in passing.)

    Not that the incentive argument is that compelling either, but then again, you have to allocate aid on the basis of something, right? At least adhering to a strict set of rules reduces the degree to which aid is allocated on the basis of foreign policy priorities which, I think, is a good thing.

    Posted May 16, 2009 at 12:56 am | Permalink
  14. Amanda wrote:

    Two recent developments (links below) relate to MCC and your post above, and also perhaps to your AGOA post. MCC is considering stopping aid in Nicaragua and has even stopped aid in Madagascar based on 'good environment' criteria it set at the outset. (Or Madagascar could be related to the same reasons that AGOA might be cut, but we won't know that.) While I think MCC should be applauded for considering stopping aid programming based on objective criteria, too bad these criteria are related to the 'good policy environment' evidence described above rather than to whether the program is working. Are you planning a post on good stats or stories about stopping programs because they didn't work?

    http://www.csmonitor.com/2009/0609/p06s10-woam.html

    http://www.mcc.gov/press/releases/documents/release-051909-mccboardauthorizes.php

    Posted June 10, 2009 at 10:24 am | Permalink
  15. Ronan L wrote:

    “this is very hard to enforce since nobody is watching when a researcher runs multiple regressions”

    Perhaps one day, when life is almost entirely online, this will no longer be an issue. All regressions done by a researcher or a research team would be done through cloud computing, so there would be a record, open for all to see.

    Alternatively, what about ethical regressing, in line with ethical investing? Researchers could voluntarily keep a log, open for anyone to examine.

    Or what about opensource research? A researcher would make their regression code available for anyone to use and improve…

    Posted July 8, 2009 at 12:54 pm | Permalink