How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.
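As an illustration of what "only slightly better than a simple benchmark" can mean in practice, the sketch below scores two sets of held-out predictions against the same outcomes. The abstract does not state the evaluation metric, so the choice of a holdout R² measured relative to predicting the training mean is an assumption, and all numbers and names (`holdout_r2`, `benchmark_pred`, `submission_pred`) are hypothetical.

```python
def holdout_r2(y_true, y_pred, train_mean):
    """R^2 on held-out data, relative to always predicting the training mean.

    1.0 means perfect prediction; 0.0 means no better than the training mean.
    """
    squared_error = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    baseline_error = sum((y - train_mean) ** 2 for y in y_true)
    return 1 - squared_error / baseline_error


# Hypothetical held-out outcomes and a hypothetical training-set mean.
y_holdout = [1.0, 2.0, 3.0, 4.0]
train_mean = 2.5

# Hypothetical predictions: a simple benchmark model and a more complex submission.
benchmark_pred = [2.0, 2.0, 3.0, 3.0]
submission_pred = [2.0, 2.0, 3.0, 3.5]

benchmark_score = holdout_r2(y_holdout, benchmark_pred, train_mean)
submission_score = holdout_r2(y_holdout, submission_pred, train_mean)

print(f"benchmark R^2:  {benchmark_score:.2f}")
print(f"submission R^2: {submission_score:.2f}")
print(f"improvement:    {submission_score - benchmark_score:.2f}")
```

Scoring every submission on the same held-out cases is what makes the comparison across techniques and against the benchmark meaningful.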
We make only one point in this article. Every quantitative study must be able to answer the question: what is your estimand? The estimand is the target quantity—the purpose of the statistical analysis. Much attention is already placed on how to do estimation; a similar degree of care should be given to defining the thing we are estimating. We advocate that authors state the central quantity of each analysis—the theoretical estimand—in precise terms that exist outside of any statistical model. In our framework, researchers do three things: (1) set a theoretical estimand, clearly connecting this quantity to theory; (2) link to an empirical estimand, which is informative about the theoretical estimand under some identification assumptions; and (3) learn from data. Adding precise estimands to research practice expands the space of theoretical questions, clarifies how evidence can speak to those questions, and unlocks new tools for estimation. By grounding all three steps in a precise statement of the target quantity, our framework connects statistical evidence to theory.
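A standard example, not drawn from the article itself, may help make the three steps concrete. Assume a binary treatment $A$, potential outcomes $Y_i(1)$ and $Y_i(0)$, and measured covariates $X$; these symbols and the choice of an average treatment effect are assumptions made for illustration.

```latex
% Step 1: theoretical estimand, defined without reference to any statistical model
\tau = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i(1) - Y_i(0) \bigr)

% Step 2: empirical estimand, a function of observable quantities only
\theta = \frac{1}{n} \sum_{i=1}^{n}
  \bigl( \mathrm{E}[Y \mid A = 1, X = x_i] - \mathrm{E}[Y \mid A = 0, X = x_i] \bigr)

% Identification assumptions (conditional exchangeability, positivity, and
% consistency) under which the two quantities coincide
\{Y(1), Y(0)\} \perp A \mid X, \qquad 0 < \Pr(A = 1 \mid X) < 1
\quad \Longrightarrow \quad \tau = \theta

% Step 3: estimation replaces each conditional expectation with a fitted
% prediction \hat{\mu}_a(x), from any regression or machine-learning method
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n}
  \bigl( \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i) \bigr)
```

Only the third step depends on a statistical model, which is why the estimator can be swapped without changing the question being asked.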
Recent research shows that men’s wages rise more rapidly than expected prior to marriage, but diverges on whether this indicates selection or a causal effect of anticipating marriage. We seek to adjudicate this debate by bringing together literatures on: (1) the male marriage wage premium, (2) selection into marriage based on men’s economic circumstances, and (3) the transition to adulthood, during which both union formation and unusually rapid improvements in work outcomes often occur. Using data from the NLSY79, we evaluate these perspectives. We show that wage declines predate rather than follow divorce, indicating no evidence that staying married benefits men’s wages. We find that older grooms experience no unusual wage patterns at marriage, suggesting that the observed marriage premium may simply reflect co-occurrence with the transition to adulthood for younger grooms. We show that men entering shotgun marriages experience similar premarital wage gains as other grooms, casting doubt on the claim that anticipation of marriage drives wage increases. We conclude that the observed wage patterns are most consistent with men marrying when their wages are already rising more rapidly than expected and divorcing when their wages are already falling, with no additional causal effect of marriage on wages.
Stewards of social data face a fundamental tension. On one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. At the same time, they want to restrict access to their data as much as possible to protect the people represented in the data. In this article, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs we made in the hope that others can improve on what we have done.
The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories. Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes 12 articles describing participants’ approaches to predicting these six outcomes as well as 3 articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.
A growing body of research suggests that housing eviction is more common than previously recognized and may play an important role in the reproduction of poverty. The proportion of children affected by housing eviction, however, remains largely unknown. We estimate that one in seven children born in large U.S. cities in 1998–2000 experienced at least one eviction for nonpayment of rent or mortgage between birth and age 15. Rates of eviction were substantial across all cities and demographic groups studied, but children from disadvantaged backgrounds were most likely to experience eviction. Among those born into deep poverty, we estimate that approximately one in four were evicted by age 15. Given prior evidence that forced moves have negative consequences for children, we conclude that the high prevalence and social stratification of housing eviction are sufficient to play an important role in the reproduction of poverty and warrant greater policy attention.
Disparities across race, gender, and class are important targets of descriptive research. But rather than only describe disparities, research would ideally inform interventions to close those gaps. The gap-closing estimand quantifies how much a gap (e.g., incomes by race) would close if we intervened to equalize a treatment (e.g., access to college). Drawing on causal decomposition analyses, this type of research question yields several benefits. First, gap-closing estimands place categories like race in a causal framework without making them play the role of the treatment (which is philosophically fraught for non-manipulable variables). Second, gap-closing estimands empower researchers to study disparities using new statistical and machine learning estimators designed for causal effects. Third, gap-closing estimands can directly inform policy: if we sampled from the population and actually changed treatment assignments, how much could we close gaps in outcomes? I provide open-source software (the R package gapclosing) to support these methods.
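The logic of a gap-closing estimand can be sketched on toy data. The example below is not the API of the gapclosing R package; it is a minimal plain-Python illustration that uses stratified means as the outcome model. It assumes no unmeasured confounding within each category-by-confounder cell and that every cell contains at least one treated unit. All records, names, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (category, confounder, treatment, outcome).
records = [
    ("A", 0, 1, 10.0), ("A", 0, 0, 6.0), ("A", 1, 1, 14.0), ("A", 1, 1, 14.0),
    ("B", 0, 0, 5.0), ("B", 0, 1, 9.0), ("B", 1, 0, 7.0), ("B", 1, 1, 11.0),
]


def mean(values):
    return sum(values) / len(values)


def category_gap(outcomes_by_category):
    """Mean outcome in category A minus mean outcome in category B."""
    return mean(outcomes_by_category["A"]) - mean(outcomes_by_category["B"])


def factual_outcomes(records):
    """Observed outcomes, grouped by category."""
    by_category = defaultdict(list)
    for category, _, _, outcome in records:
        by_category[category].append(outcome)
    return by_category


def counterfactual_outcomes(records, treatment_value):
    """Predicted outcomes if every unit received `treatment_value`.

    Each unit's prediction is the mean observed outcome among units in the
    same (category, confounder) cell that actually received that treatment.
    """
    cell_outcomes = defaultdict(list)
    for category, confounder, treatment, outcome in records:
        if treatment == treatment_value:
            cell_outcomes[(category, confounder)].append(outcome)
    by_category = defaultdict(list)
    for category, confounder, _, _ in records:
        by_category[category].append(mean(cell_outcomes[(category, confounder)]))
    return by_category


factual_gap = category_gap(factual_outcomes(records))
counterfactual_gap = category_gap(counterfactual_outcomes(records, 1))
gap_closed = factual_gap - counterfactual_gap

print(f"factual gap:        {factual_gap:.1f}")
print(f"counterfactual gap: {counterfactual_gap:.1f}")
print(f"share of gap closed: {gap_closed / factual_gap:.2f}")
```

The category variable only defines the groups being compared; the intervention is on the treatment, which is what keeps the category out of the role of a cause.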
The link between theory and quantitative empirical evidence is a longstanding hurdle in sociological research. Ambiguity about the role that statistical evidence plays in an argument may produce misleading conclusions and poor methodological practice. This ambiguity could be reduced if researchers would state the theoretical estimand, the central quantity at the core of a given paper, in precise language. Our approach envisions three choices in the research process: (1) choice of a theoretical estimand, which will be informative for theory, (2) choice of an empirical estimand, which is informative about the theoretical estimand under some identification assumptions, and (3) choice of an estimation strategy to learn the empirical estimand from data. Key advantages of this approach include improved clarity on the object of interest, transparency about how empirical evidence contributes to knowledge of that quantity, and the ability to easily plug in new statistical tools for estimation.