
Following up on a previous post on data mining, let’s examine one recent book as a possible test case for whether data mining could be a problem. Here are the top 10 reasons I chose this book:
10. Oodles of regressions were run
Author each morning wondering whether, during the previous evening, Pedro, or Anke, or Dominic, or Lisa, or Benedikt, or Marguerite has cracked whatever problem we had crashed into by the time I left for home.
9. Oodles of control variables were tried
…range of possible causes drawn from across the social sciences. In addition to various characteristics of the economy, these include aspects of the country’s history, its geography, its social composition, and its polity.
8. Weird conclusions about war
mountains are dangerous…
7. Sample was sliced up to get results
Globally we find no effect of ethnic polarization. But in Africa ethnic polarization sharply increases risk.
6. Very flexible specifications to get results
If coup risk is high, military spending reduces risk…if coup risk is low, military spending increases risk…
5. Previous results with same methodology didn’t pass the new data test
Our previous results got overturned by the new data.
4. Reverse causality makes every interpretation questionable
I’ll let Nathan Fiala handle this one.
3. Overconfidence in such statistical research as definitive
The ideas in this book are all founded on statistical research.
2. Author previously announced he was data mining
Table 1 presents the preferred reference model of conflict duration with eight variations. The reference model is reached after a series of iterations in which insignificant variables are deleted and variants of economic, social, geographic and historical explanatory variables are then tested in turn.
1. A lot depends on the results
The book often won’t let Africans vote, but it will let them experience Western military intervention.
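To make the worry behind reasons 10, 9, and 2 concrete, here is a minimal sketch of what a specification search can do when there is nothing to find. It is not a reconstruction of the book's method: the outcome and every candidate variable are pure noise, and all the names and settings (simulate, 20 candidates, 50 observations, a 5% threshold, the Fisher z approximation for the p-value) are assumptions chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def familywise_rate(k, alpha=0.05):
    """Chance that at least one of k independent null tests rejects at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for zero correlation (Fisher z transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def search_finds_something(rng, n_obs, n_candidates, alpha=0.05):
    """Try noise regressors one by one; stop at the first 'significant' one."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # outcome: pure noise
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # candidate: pure noise
        if p_value(correlation(x, y), n_obs) < alpha:
            return True
    return False


def simulate(n_trials=300, n_obs=50, n_candidates=20, seed=0):
    """Share of searches that turn up at least one 'significant' variable."""
    rng = random.Random(seed)
    hits = sum(
        search_finds_something(rng, n_obs, n_candidates) for _ in range(n_trials)
    )
    return hits / n_trials
```

Calling familywise_rate(20) gives 1 - 0.95^20, roughly 0.64, and simulate() should land in that neighborhood: with twenty unrelated candidates, a majority of searches find a "result". Real control variables are correlated with each other and with the outcome, so the exact figure will differ in practice; the sketch only shows the direction of the problem.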




13 Comments
And when amoral economists come out with such inane conclusions as “democracy is bad for development” (at a time when the African Union finally acts with unity against unconstitutional transfer of power) it really is time to close the book.
to Bill (Anderson):
Did you read the book? I wouldn’t call the conclusions of GWV “amoral” and they certainly don’t support the claim that “democracy is bad for development.”
The main claim is that *empty* democracy is bad for development. Empty democracy is typically characterized by dodgy elections, a severe lack of implicit accountability, power-sharing agreements, and electorates relying heavily on ethnic identification.
to Bill (Easterly):
On the very last point about data mining: dropping insignificant variables isn’t data mining, unless (1) your main result heavily depends on those variables and (2) there isn’t a good reason for excluding them anyway.
Equally: rotating various historical and current proxies of economic indicators to find those that are significant isn’t data mining, unless your end result depends precisely on the make-up of that configuration.
Strong results should be strong in the face of many specifications, but cycling through a variety of specifications, in itself, is not data mining.
Mountains are dangerous. They wash out some of the advantages traditional armies hold over insurgent/rebel types.
1) Surveillance becomes much more difficult. (if you’re looking in one valley and the rebels are in another, you won’t find them)
2) Heavy arms become much more burdensome. (mountainous terrain is less accessible to tanks than men in many places)
If you look at successful modern insurgencies (Afghanistan against the USSR and the USA/Pakistan, the Chinese Communists, the Nepalese Maoists, FARC, the former Yugoslavia, Chechnya, Nicaragua), you’ll note that mountainous terrain may not be necessary for a successful rebellion, but it seems to help.
Slicing the sample based on regions of the world is also not unusual in international relations research (or political science more generally). Regions have their own unique shared experiences, cultures and belief structures that can be difficult to capture using a quantitative indicator. For instance, you can’t properly understand the influence of religion on politics in the United States unless you disaggregate the South. In addition, the characteristics of inter-state relations often differ based on the degree of institutionalization in each region. You can’t understand Europe without the EU. Even if you used EU membership as a proxy variable, the effects of institutionalization inevitably spill over into non-EU members like Switzerland or Norway.
I’m not saying there aren’t more methodologically sophisticated ways of doing this type of research, but it’s a bit over-the-top to criticize him for a theoretically defensible use of region as a variable or sub-sample.
I have a problem: I like what both Easterly and Collier have to say. It’s not what they say, it’s that they call into question conventional wisdom. It seems as if both are driven by basic humanistic principles of social justice and equity.
The problem is that they can’t correlate moral positions to political organizations or prescribe social policies that will lead to the preferred goals. This is also a problem for many others who use economics as a support for social policy recommendations.
Perhaps it is time to try a different approach, none of the other attempts have been particularly successful so far.
I suggest basing policy on morality and fairness, or even happiness. If a policy improves the lot of many people and does little harm to the “haves” then it is worth carrying out regardless of how it measures on some arbitrary economic scale. For example, it is clear that providing health care in the US to all is a “good thing” from a moral point of view. Once this has been set as a goal then economists can be called in (along with other social scientists) to offer advice on the costs of various approaches.
The same is true with international development. Raising the poor to a decent level of income is the moral objective, but this is not universally agreed on. Many who oppose it are disingenuous and support the status quo of excessive wealth maldistribution, but instead of revealing their self interest they make claims about dependency of the poor on aid or debasement of entrepreneurship or other such claims.
Economics is not a morality free endeavor no matter how much some claim. Focus on morality and ethics first.
* By “successful”, I mean much more successful than one might expect given the clear on-paper advantages of the opposing forces.
It’s really helpful (and also entertaining) that Easterly is onto Collier and his fudges. And he’s not the only researcher to have done so. One of the best demolition jobs was done by Laurie Nathan of the University of Cape Town, whose 2005 paper is entitled: ‘“The Frightful Inadequacy of Most of the Statistics”: A Critique of Collier and Hoeffler on Causes of Civil War’.
Nathan dismantles Collier’s work with great finesse, citing “…unsubstantiated explanations of results; incomplete, inaccurate and biased data; and theoretical and analytical flaws that preclude an adequate understanding of the causes of civil war. The greatest problem is that [Collier & co-author] seek to ascertain the causes of civil war without studying civil wars, and attempt to determine the motives of rebels without studying rebels and rebellions. Their most prominent finding – that dependence on natural resources heightens a country’s risk of war because it affords rebels an opportunity for extortion – is not based on any evidence of rebel behaviour; it is an inference drawn from a correlation between the onset of civil war and the ratio of primary commodity exports to GDP. To borrow a felicitous phrase from Keynes, the C&H model suffers from a ‘frightful inadequacy of most of the statistics’.”
I would be interested in your thoughts on the analysis of the Gates Foundation that just came out.
According to an article in the LA Times (http://tinyurl.com/ot9ppw):
“The director of a London-based think tank called the study an interesting paper on a pertinent topic, but said it is ruined by the ideological assumptions it manages to smuggle in.”
The problem is that they can’t correlate moral positions to political organizations…
What would it mean to correlate a moral position to a political organisation?
… or prescribe social policies that will lead to the preferred goals. This is also a problem for many others who use economics as a support for social policy recommendations.
Perhaps it is time to try a different approach, none of the other attempts have been particularly successful so far.
I suggest basing policy on morality and fairness, or even happiness. If a policy improves the lot of many people and does little harm to the “haves” then it is worth carrying out regardless of how it measures on some arbitrary economic scale.
This is really odd. Your suggested policy criterion, “if a policy improves the lot of many people and does little harm to the ‘haves’”, sounds to me exactly like standard cost-benefit analysis. But in the line above you claimed that “none of the other attempts have been particularly successful so far.” If standard cost-benefit analysis has not been particularly successful so far, why do you suggest using it again? What’s special about the future that makes you think it will work now, when it hasn’t, at least according to you, in the past?
And yes, Easterly and Collier do have the problem that they don’t know what social policies will lead to the preferred goals. This is a problem of ignorance. I don’t see how standard cost-benefit analysis as you describe it deals with this problem of ignorance. After all, it’s very easy to say that we should adopt a policy when the benefits outweigh the costs; the problem is in measuring the benefits and the costs. See this critique by Friends of the Earth for a starting point on the problems with cost-benefit analysis such as you advocate.
Economics is not a morality free endeavor no matter how much some claim. Focus on morality and ethics first.
Nope. Focus on understanding the world first. Morality and ethics are useless, or worse, if you don’t know what you’re applying them to. For example, what is moral about advocating continuing procedures that have failed again and again in the past, just because they sound attractive at first glance from a moral viewpoint? I know of nothing in morality and ethics that requires us to be ignorant or stupid.
Tracy W:
I’m not an ethicist which is why I keep hoping a few will enter debates on economic justifications for public policy, but…
Cost/benefit is where Ford decides to save $10 per Pinto on a reinforced gas tank because the amount they may have to pay out later won’t equal the amount they save. It is based upon probability and is totally amoral (or immoral, if you prefer).
Ethics is where society decides to forgo additional taxes on those that can afford it instead of using the money so generated to cover the health needs of the uninsured. There is no probability involved here, we know how to correct an immediate social wrong, but the powerful prevent it.
Perhaps I can’t generalize this into a consistent rule of conduct, but it is clear that we are talking about different categories of costs and benefits in the two cases.
robertdfeinman: your health care example sounds like standard cost-benefit analysis to me. You decide that the social benefits of universal healthcare are greater than the social costs of the taxes, so we should do it. Okay, you haven’t shown your workings, but this is merely a blog comment so not the place, and presumably you could do so if necessary. What interests me though is that this sounds exactly like standard old conventional cost-benefit analysis. The Ford case could easily have been accounted for in the old traditional cost-benefit analysis line by upping the value they put on human lives, so no, I don’t see the difference in costs that you are talking about.
What I still don’t get is how you can simultaneously state that previous attempts at policy criteria have not been particularly successful, and continue advocating such an old technique as cost-benefit analysis. If it’s failed in the past, why do you think it will suddenly start to work again in the future? You haven’t answered that question. (Nor have you answered mine about what it means to correlate a policy position to an organisation).
Perhaps I can’t generalize this into a consistent rule of conduct…
So then why are you advocating it as a policy approach? I would have thought that a fundamental requirement for a policy approach would be that it could be applied with some consistency.
Sorry, that last post was me.