We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
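The intercept adjustment that makes this sampling design valid is often called the "prior correction" for choice-based (case-control) sampling: slope coefficients from a logit fit to the subsample are consistent, and only the intercept must be shifted using the known population event fraction. A minimal sketch of that correction follows; the function name and the example numbers are ours, not the paper's:

```python
import math

def prior_corrected_intercept(beta0_sample, tau, ybar):
    """Shift a logit intercept estimated on a case-control sample
    back to the population scale.

    beta0_sample : intercept estimated from the endogenously sampled data
    tau          : fraction of events (1's) in the population
    ybar         : fraction of events (1's) in the sample

    Under pure case-control sampling only the intercept is affected:
        beta0 = beta0_sample - ln[((1 - tau)/tau) * (ybar/(1 - ybar))]
    """
    correction = math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
    return beta0_sample - correction

# Example: events are 1% of the population (tau = 0.01), but the
# analyst sampled all events and enough nonevents to make the sample
# half events (ybar = 0.5). The raw intercept is too large by ln(99).
beta0 = prior_corrected_intercept(0.0, tau=0.01, ybar=0.5)
```

When the sample event fraction equals the population fraction (tau == ybar), the correction term is ln(1) = 0 and the intercept is returned unchanged, as expected for a random sample.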
Many of the most significant events in international relations (wars, coups, revolutions, massive economic depressions, economic shocks) are rare events. They occur infrequently but are considered of great importance. In international relations, as in other disciplines, rare events, that is, binary dependent variables characterized by dozens to thousands of times fewer 1's (events such as wars or coups) than 0's (nonevents), have proven difficult to explain and predict. Though scholars have made substantial efforts to quantify rare events, they have devoted less attention to how these events are analyzed. We show that problems in explaining and predicting rare events stem primarily from two sources: popular statistical procedures that underestimate the probability of rare events, and inefficient data-collection strategies. We analyze the issues involved, cite examples from the international relations literature, and offer some solutions.

The first source of problems in rare-event analysis is researchers' reliance on logit coefficients, which are biased in small samples (those with fewer than two hundred observations), as the statistical literature well documents. Not as widely understood is that the biases in probabilities can be substantively meaningful when sample sizes are in the thousands and are always in the same direction: estimated event probabilities are always too small. A separate, often overlooked problem is that the almost universally used method of computing probabilities of events in logit analysis is suboptimal in finite samples of rare-events data, leading to errors in the same direction.
We address the problem that occurs when inferences about counterfactuals (predictions, "what-if" questions, and causal effects) are attempted far from the available data. The danger of these extreme counterfactuals is that substantive conclusions drawn from statistical models that fit the data well turn out to be based largely on speculation hidden in convenient modeling assumptions that few would be willing to defend. Yet existing statistical strategies provide few reliable means of identifying extreme counterfactuals. We offer a proof that inferences farther from the data allow more model dependence and then develop easy-to-apply methods to evaluate how model dependent our answers would be to specified counterfactuals. These methods require neither sensitivity testing over specified classes of models nor evaluating any specific modeling assumptions. If an analysis fails the simple tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. Free software that accompanies this article implements all the methods developed.
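One simple, model-free way to flag an extreme counterfactual, in the spirit of the checks described above, is to test whether the counterfactual point falls inside the convex hull of the observed covariates; points outside the hull require extrapolation and hence more model dependence. The sketch below uses SciPy's Delaunay triangulation for the membership test; the function name and the toy data are ours, not the paper's:

```python
import numpy as np
from scipy.spatial import Delaunay

def in_convex_hull(data, queries):
    """Return a boolean array: True where a query point lies inside
    (or on the boundary of) the convex hull of the observed data.

    data    : (n, d) array of observed covariate vectors
    queries : (m, d) array of counterfactual points to check

    find_simplex returns -1 for points outside every simplex of the
    triangulation, i.e. outside the convex hull.
    """
    hull = Delaunay(np.asarray(data, dtype=float))
    return hull.find_simplex(np.asarray(queries, dtype=float)) >= 0

# Toy example: observed data fill the unit square. A counterfactual
# at (0.5, 0.5) is interpolation; one at (2, 2) is extrapolation.
square = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
flags = in_convex_hull(square, [[0.5, 0.5], [2.0, 2.0]])
```

A failed hull test does not by itself invalidate an inference, but it signals that the answer rests on the model's functional-form assumptions rather than on data in the region of the counterfactual.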
Thompson and Tucker's (1997) exchange with Farber and Gowa (1997) and Mansfield and Snyder (1997) on the role of democracy in preventing conflict, or in Oneal and Russett (1997), Barbieri (1996), and Beck, Katz, and