
Finally, the definitive guide to creatively manufacturing your own research result

From the brilliant xkcd (also the creator of this classic in statistics humor).

We couldn’t resist using this as a way to illustrate some of our early wonky posts complaining about the suspected practice of “data mining” in aid research.

In aid world, research looks for an association of some type between two factors, like economic growth and foreign aid. But since both growth and aid contain some random variation, there is always the possibility that an association appears by pure chance.

“p < .05” is the researchers’ assurance that, if there were no real relationship at all, the chance of seeing an association this strong by pure coincidence would be less than 1 in 20, or 5 percent, which is the accepted standard.

But the aid researchers—like the jelly bean scientists—are eager to find a result, so they may run many different tests. The problem, as Bill explained it, is that:

The 1 in 20 safeguard only applies if you only did ONE regression. What if you did 20 regressions? Even if there is no relationship between growth and aid whatsoever, on average you will get one “significant result” out of 20 by design. Suppose you only report the one significant result and don’t mention the other 19 unsuccessful attempts…. In aid research, the aid variable has been tried, among other ways, as aid per capita, logarithm of aid per capita, aid/GDP, logarithm of aid/GDP, aid/GDP squared, [log(aid/GDP) - aid loan repayments], aid/GDP*[average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP squared *[average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP*[quality of institutions], etc. Time periods have varied from averages over 24 years to 12 years to 8 years to 4 years. The list of possible control variables is endless…. So it’s not so hard to run many different aid and growth regressions and report only the one that is “significant.”
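Bill’s point is easy to check by simulation. Here is a minimal sketch (the sample size, number of specifications, and variable names are all invented for illustration): we draw a “growth” series and 20 unrelated “aid” measures from pure noise, regress growth on each, and count how many come out “significant” at p < .05. On average about one of the 20 will.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_countries = 50       # hypothetical sample of countries
n_specifications = 20  # 20 different "aid" variables, all pure noise

# "Growth" with no real relationship to aid whatsoever
growth = rng.normal(size=n_countries)

significant = 0
for _ in range(n_specifications):
    # Each "aid" measure is independent noise, standing in for one
    # of the many transformations (aid/GDP, log aid per capita, ...)
    aid_measure = rng.normal(size=n_countries)
    result = stats.linregress(aid_measure, growth)
    if result.pvalue < 0.05:
        significant += 1

print(f"{significant} of {n_specifications} regressions were "
      f"'significant' at p < .05, despite zero true relationship")
```

Reporting only the winning specification and discarding the rest is exactly the jelly-bean trick: the 5 percent false-positive rate applies per test, not per research agenda.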

And the next thing you know, there’s a worldwide boycott of green jelly beans…

UPDATE by Bill 12 noon: I asked around some journalist contacts of Aid Watch at leading newspapers how much awareness of this problem there is in the media, and got a fairly clear answer of ZERO.

This entry was posted in Academic research, Data and statistics.

10 Comments

  1. Ben Taylor wrote:

    I certainly won’t be eating green jelly beans again.

    Posted April 8, 2011 at 3:00 am | Permalink
  2. Dan Kyba wrote:

    scientific experiments cause cancer in rats

    Posted April 8, 2011 at 10:00 am | Permalink
  3. Jacob AG wrote:

    The same goes for aid skeptics, does it not?

    I smelled something very fishy at the DRI conference, for example, during Prof. Easterly’s talk. He seemed to be saying that the funnel-shaped correlation between growth and freedom was the result of a causation running from autocracy to volatility, without any evidence of that in the data (just a correlation… the qualitative arguments in the rest of the talk, by the way, were much more convincing). He corrected me during the Q&A session, but I still think a lot of people in the audience walked away thinking (apparently incorrectly) that Bill was saying the data showed that autocracy turns a country into “Las Vegas.”

    Anyway, this is why you two are doing a series of blog posts on the question of causality vis-a-vis growth and autocracy, which makes me very happy. Can’t wait for the next one.

    Posted April 8, 2011 at 10:02 am | Permalink
  4. Peter Davis wrote:

    Great Cartoon!
    This is a big problem.
    Not many people really understand statistics.

    Another problem is that people confuse statistically significant with meaningfully significant.

    For example, if green jelly beans caused a 0.0000001% increase in cancer, no one should care, but if you test a large enough sample you can expect to find a statistically significant difference from 0.

    Posted April 8, 2011 at 11:19 am | Permalink
  5. Vivek Nemana wrote:

    Pursuing a career in journalism causes a misunderstanding of basic statistics (p < 0.05).

    Posted April 8, 2011 at 12:23 pm | Permalink
  6. Matt wrote:

    Peter: YES!

    Significance doesn’t always = importance

    Posted April 8, 2011 at 12:45 pm | Permalink
  7. Vivek Nemana wrote:

    @Peter and @Matt,

    I might be biased, but I personally think they should start teaching at least a year of mandatory economics principles and statistics in public schools. It’s mind-boggling to see how many people don’t know a lot of those pretty fundamental ideas, just because it’s never been presented to them in an accessible manner.

    Then again, a true economist might argue that individual agents seek out the information they need to go about their personal lives.

    Okay I’m definitely biased.

    Posted April 8, 2011 at 1:36 pm | Permalink
  8. Dan Kyba wrote:

    @ Vivek

    Totally agree; in the long run the most important benefit of studying stats is the logical discipline that comes with understanding Type I and Type II error.
    Considering what a PR saturated society we have become, that discipline is more necessary than ever as an antidote.

    Posted April 8, 2011 at 1:57 pm | Permalink
  9. Jacob AG wrote:

    Vivek,

    I couldn’t possibly agree more. When I first saw a supply and demand graph as a freshman in college, I slapped my forehead and said “WOW, why have I never seen this before?!”

    Public school had failed me… but it’s okay now, I went to a nice expensive private university, majored in economics, and now I know all about econ and stats, right?

    Yeah, we’re all biased.

    Posted April 8, 2011 at 2:31 pm | Permalink
  10. Robert Tulip wrote:

    The green jellybean finding would only be significant if the researcher had a prior hypothesis of a causal link. Absent a theoretical basis it is irrelevant data mining.

    Posted April 9, 2011 at 3:39 am | Permalink


  • About Aid Watch

    The Aid Watch blog is a project of New York University's Development Research Institute (DRI). This blog is principally written by William Easterly, author of "The Elusive Quest for Growth: Economists' Adventures and Misadventures in the Tropics" and "The White Man's Burden: Why the West's Efforts to Aid the Rest Have Done So Much Ill and So Little Good," and Professor of Economics at NYU. It is co-written by Laura Freschi and by occasional guest bloggers. Our work is based on the idea that more aid will reach the poor the more people are watching aid.

    "Conscience is the inner voice that warns us somebody may be looking." - H.L. Mencken
