Some of the most important phenomena in international conflict are coded as “rare events”: binary dependent variables with dozens to thousands of times fewer events, such as wars and coups, than “nonevents.” Unfortunately, rare events data are difficult to explain and predict, a problem stemming from at least two sources. First, and most important, the data-collection strategies used in international conflict studies are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (wars, for example) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99 percent of their (nonfixed) data-collection costs or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.
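The sampling design described above (keep every event, draw only a small random fraction of nonevents) can be sketched in a few lines. The abstract does not say which corrections the authors use, so the intercept adjustment below is the standard prior correction for case-control samples, shown for illustration only. It assumes the population fraction of events, here called `tau`, is known from outside the sample. All function and variable names are hypothetical.

```python
import math
import random


def case_control_sample(rows, nonevent_fraction, seed=0):
    """Keep every event (y == 1) and a random fraction of nonevents (y == 0)."""
    rng = random.Random(seed)
    return [r for r in rows if r["y"] == 1 or rng.random() < nonevent_fraction]


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Adjust a logit intercept estimated on an event-enriched sample.

    b0_hat : intercept estimated from the sampled data
    tau    : assumed (known) fraction of events in the population
    ybar   : fraction of events in the sample
    """
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: 10 events among 1,010 observations.
rows = [{"y": 1} for _ in range(10)] + [{"y": 0} for _ in range(1000)]
sample = case_control_sample(rows, nonevent_fraction=0.05)

# Suppose a balanced sample (ybar = 0.5) yields an intercept of 0, i.e. a
# naive predicted probability of 0.5. If events are 1 percent of the
# population, the corrected intercept implies a probability of 0.01.
b0 = prior_corrected_intercept(b0_hat=0.0, tau=0.01, ybar=0.5)
p = logistic(b0)
```

Only the intercept is adjusted in this sketch; the slope coefficients from the sampled data are left unchanged. Without the adjustment, predicted probabilities reflect the sample's inflated share of events rather than the population's.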
Gary King is Professor of Government at Harvard University, Cambridge, Massachusetts. He can be reached at [email protected].
Langche Zeng is Associate Professor in the Department of Political Science at George Washington University, Washington, D.C. She can be reached at [email protected].