We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
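The intercept adjustment that makes this sampling design valid is often called the "prior correction" for choice-based (case-control) sampling: slope coefficients from a logit fit to the subsample are consistent, and only the intercept must be shifted using the known population event fraction. A minimal sketch of that correction follows; the function name and the example numbers are ours, not the paper's:

```python
import math

def prior_corrected_intercept(beta0_sample, tau, ybar):
    """Shift a logit intercept estimated on a case-control sample
    back to the population scale.

    beta0_sample : intercept estimated from the endogenously sampled data
    tau          : fraction of events (1's) in the population
    ybar         : fraction of events (1's) in the sample

    Under pure case-control sampling only the intercept is affected:
        beta0 = beta0_sample - ln[((1 - tau)/tau) * (ybar/(1 - ybar))]
    """
    correction = math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
    return beta0_sample - correction

# Example: events are 1% of the population (tau = 0.01), but the
# analyst sampled all events and enough nonevents to make the sample
# half events (ybar = 0.5). The raw intercept is too large by ln(99).
beta0 = prior_corrected_intercept(0.0, tau=0.01, ybar=0.5)
```

When the sample event fraction equals the population fraction (tau == ybar), the correction term is ln(1) = 0 and the intercept is returned unchanged, as expected for a random sample.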
Many of the most significant events in international relations (wars, coups, revolutions, massive economic depressions, economic shocks) are rare events. They occur infrequently but are considered of great importance. In international relations, as in other disciplines, rare events, that is, binary dependent variables characterized by dozens to thousands of times fewer 1's (events such as wars or coups) than 0's (nonevents), have proven difficult to explain and predict. Though scholars have made substantial efforts to quantify rare events, they have devoted less attention to how these events are analyzed. We show that problems in explaining and predicting rare events stem primarily from two sources: popular statistical procedures that underestimate the probability of rare events, and inefficient data-collection strategies. We analyze the issues involved, cite examples from the international relations literature, and offer some solutions.

The first source of problems in rare-event analysis is researchers' reliance on logit coefficients, which are biased in small samples (those with fewer than two hundred observations), as the statistical literature well documents. Not as widely understood is that the biases in probabilities can be substantively meaningful when sample sizes are in the thousands and are always in the same direction: estimated event probabilities are always too small. A separate, often overlooked problem is that the almost universally used method of computing probabilities of events in logit analysis is suboptimal in finite samples of rare-events data, leading to errors in the same direction.
We address the problem that occurs when inferences about counterfactuals (predictions, "what-if" questions, and causal effects) are attempted far from the available data. The danger of these extreme counterfactuals is that substantive conclusions drawn from statistical models that fit the data well turn out to be based largely on speculation hidden in convenient modeling assumptions that few would be willing to defend. Yet existing statistical strategies provide few reliable means of identifying extreme counterfactuals. We offer a proof that inferences farther from the data allow more model dependence and then develop easy-to-apply methods to evaluate how model dependent our answers would be to specified counterfactuals. These methods require neither sensitivity testing over specified classes of models nor evaluating any specific modeling assumptions. If an analysis fails the simple tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. Free software that accompanies this article implements all the methods developed.
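One simple, model-free way to flag an extreme counterfactual, in the spirit of the checks described above, is to test whether the counterfactual point falls inside the convex hull of the observed covariates; points outside the hull require extrapolation and hence more model dependence. The sketch below uses SciPy's Delaunay triangulation for the membership test; the function name and the toy data are ours, not the paper's:

```python
import numpy as np
from scipy.spatial import Delaunay

def in_convex_hull(data, queries):
    """Return a boolean array: True where a query point lies inside
    (or on the boundary of) the convex hull of the observed data.

    data    : (n, d) array of observed covariate vectors
    queries : (m, d) array of counterfactual points to check

    find_simplex returns -1 for points outside every simplex of the
    triangulation, i.e. outside the convex hull.
    """
    hull = Delaunay(np.asarray(data, dtype=float))
    return hull.find_simplex(np.asarray(queries, dtype=float)) >= 0

# Toy example: observed data fill the unit square. A counterfactual
# at (0.5, 0.5) is interpolation; one at (2, 2) is extrapolation.
square = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
flags = in_convex_hull(square, [[0.5, 0.5], [2.0, 2.0]])
```

A failed hull test does not by itself invalidate an inference, but it signals that the answer rests on the model's functional-form assumptions rather than on data in the region of the counterfactual.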
Thompson and Tucker's (1997) exchange with Farber and Gowa (1997) and Mansfield and Snyder (1997) on the role of democracy in preventing conflict, or in Oneal and Russett (1997), Barbieri (1996), and Beck, Katz, and