We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2015, Vol. 9, No. 3, 1350-1371. This reprint differs from the original in pagination and typographic detail.
LETHAM, RUDIN, MCCORMICK AND MADIGAN
if male and adult then survival probability 21% (19%-23%)
else if 3rd class then survival probability 44% (38%-51%)
else if 1st class then survival probability 96% (92%-99%)
else survival probability 88% (82%-94%)
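The survival decision list above can be read as an ordered sequence of rules in which the first matching condition decides the prediction. The following is a minimal sketch of how such a list might be evaluated, not the authors' implementation: the field names (`sex`, `adult`, `pclass`) and the `predict` helper are hypothetical, and only the probabilities and intervals are taken from the list in the text.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Interval = Tuple[float, float]

# Each rule: (condition, survival probability, interval as printed in the text).
# The field names "sex", "adult" and "pclass" are assumptions for illustration.
RULES: List[Tuple[Callable[[Record], bool], float, Interval]] = [
    (lambda r: r["sex"] == "male" and bool(r["adult"]), 0.21, (0.19, 0.23)),
    (lambda r: r["pclass"] == 3, 0.44, (0.38, 0.51)),
    (lambda r: r["pclass"] == 1, 0.96, (0.92, 0.99)),
]

# Final "else" branch of the list.
DEFAULT: Tuple[float, Interval] = (0.88, (0.82, 0.94))


def predict(record: Record) -> Tuple[float, Interval]:
    """Return the probability and interval of the first rule that matches."""
    for condition, probability, interval in RULES:
        if condition(record):
            return probability, interval
    return DEFAULT


if __name__ == "__main__":
    passenger = {"sex": "female", "adult": True, "pclass": 1}
    print(predict(passenger))
```

Because the rules are checked in order, an adult male in 1st class is caught by the first rule and never reaches the 1st-class rule; this ordering is what distinguishes a decision list from an unordered rule set.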
In this paper we develop a method to estimate both individual social network size (i.e., degree) and the distribution of network sizes in a population by asking respondents how many people they know in specific subpopulations (e.g., people named Michael). Building on the scale-up method of Killworth et al. (1998b) and other previous attempts to estimate individual network size, we propose a latent non-random mixing model which resolves three known problems with previous approaches. As a byproduct, our method also provides estimates of the rate of social mixing between population groups. We demonstrate the model using a sample of 1,370 adults originally collected by McCarty et al. (2001). Based on insights developed during the statistical modeling, we conclude by offering practical guidelines for the design of future surveys to estimate social network size. Most importantly, we show that if the first names to be asked about are chosen properly, the simple scale-up degree estimates can enjoy the same bias-reduction as that from our more complex latent non-random mixing model.
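For orientation, the simple scale-up degree estimate mentioned above is commonly written as the total population size multiplied by the ratio of the number of people a respondent reports knowing across the subpopulations to the combined size of those subpopulations. The sketch below illustrates only that basic estimator, not the latent non-random mixing model the abstract proposes; the function name and all numbers in the usage example are hypothetical.

```python
from typing import Sequence


def scale_up_degree(
    known_counts: Sequence[float],
    subpop_sizes: Sequence[float],
    total_population: float,
) -> float:
    """Basic scale-up estimate of one respondent's network size (degree).

    known_counts[k]  : how many people the respondent knows in subpopulation k
    subpop_sizes[k]  : size of subpopulation k in the whole population
    total_population : size of the whole population
    """
    if len(known_counts) != len(subpop_sizes):
        raise ValueError("known_counts and subpop_sizes must have equal length")
    combined_size = sum(subpop_sizes)
    if combined_size <= 0:
        raise ValueError("subpopulation sizes must sum to a positive number")
    # Degree is scaled up from the fraction of the population covered
    # by the subpopulations asked about.
    return total_population * sum(known_counts) / combined_size


if __name__ == "__main__":
    # Hypothetical respondent: knows 2 people in a subpopulation of 1,000
    # and 1 person in a subpopulation of 2,000, in a population of 300,000.
    print(scale_up_degree([2, 1], [1_000, 2_000], 300_000))
```

This estimator treats every subpopulation as equally reachable by every respondent, which is the kind of assumption that a non-random mixing model relaxes.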
the University of Wisconsin, Madison. We thank Richard Breen and the other participants at these seminars, Nan Lin, Howard Aldrich, and also three anonymous AJS reviewers for their helpful comments and suggestions. Please direct correspondence to Thomas A. DiPrete (tad61@columbia.edu), 601B Knox Hall, Columbia University, 606 W. 122nd St., New York, NY 10027.
Abstract: Using data from the 2006 General Social Survey, we compare levels of segregation by race and along other dimensions of potential social cleavage in the contemporary United States. Americans are not as isolated as the most extreme recent estimates suggest. However, hopes that "bridging" social capital is more common in broader acquaintanceship networks than in core networks are not supported by the GSS data. Instead, the entire acquaintanceship network is perceived by Americans to be about as segregated as the much smaller network of close ties. People do not always know the religiosity, political ideology, family behaviors, or socioeconomic status of their acquaintances, but perceived social divisions on these dimensions are high and in some cases rival the extent of racial segregation in acquaintanceship networks. The major challenge to social integration today comes less from the risk of social isolation than from the tendency of many Americans to isolate themselves from others who differ on race, political ideology, level of religiosity, and other salient aspects of social identity.
In regions without complete-coverage civil registration and vital statistics systems there is uncertainty about even the most basic demographic indicators. In such regions the majority of deaths occur outside hospitals and are not recorded. Worldwide, fewer than one-third of deaths are assigned a cause, with the least information available from the most impoverished nations. In populations like these, verbal autopsy (VA) is a commonly used tool to assess cause of death and estimate cause-specific mortality rates and the distribution of deaths by cause. VA uses an interview with caregivers of the decedent to elicit data describing the signs and symptoms leading up to the death. This paper develops a new statistical tool known as InSilicoVA to classify cause of death using information acquired through VA. InSilicoVA shares uncertainty between cause of death assignments for specific individuals and the distribution of deaths by cause across the population. Using side-by-side comparisons with both observed and simulated data, we demonstrate that InSilicoVA has distinct advantages compared to currently available methods.
Despite recent and growing interest in using Twitter to examine human behavior and attitudes, there is still significant room for growth regarding the ability to leverage Twitter data for social science research. In particular, gleaning demographic information about Twitter users—a key component of much social science research—remains a challenge. This article develops an accurate and reliable data processing approach for social science researchers interested in using Twitter data to examine behaviors and attitudes, as well as the demographic characteristics of the populations expressing or engaging in them. Using information gathered from Twitter users who state an intention to not vote in the 2012 presidential election, we describe and evaluate a method for processing data to retrieve demographic information reported by users that is not encoded as text (e.g., details of images) and evaluate the reliability of these techniques. We end by assessing the challenges of this data collection strategy and discussing how large-scale social media data may benefit demographic researchers.
Findings suggest that profiles which self-identify as Pro-ED express disordered eating patterns through tweets and have an audience of followers, many of whom also reference ED in their own profiles. ED socialization on Twitter might provide social support, but in the Pro-ED context this activity might also reinforce an ED identity.
In just the last forty years, imprisonment has been transformed from an event experienced by only the most marginalized to a common stage in the life course of American men—especially Black men with low levels of educational attainment. Although much research considers the causes of the prison boom and how the massive uptick in imprisonment has shaped crime rates and the life course of the men who experience imprisonment, in recent years, researchers have gained a keen interest in the spillover effects of mass imprisonment on families, children, and neighborhoods. Unfortunately, although this new wave of research documents the generally harmful effects of having a family member or loved one incarcerated, it remains unclear how much the prison boom shapes social inequality through these spillover effects because we lack precise estimates of the racial inequality in connectedness—through friends, family, and neighbors—to prisoners. Using the 2006 General Social Survey, we fill this pressing research gap by providing national estimates of connectedness to prisoners—defined in this article as knowing someone who is currently imprisoned, having a family member who is currently imprisoned, having someone you trust who is currently imprisoned, or having someone you know from your neighborhood who is currently imprisoned—for Black and White men and women. Most provocatively, we show that 44% of Black women (and 32% of Black men) but only 12% of White women (and 6% of White men) have a family member imprisoned. This means that about one in four women in the United States currently has a family member in prison. Given these high rates of connectedness to prisoners and the vast racial inequality in them, it is likely that mass imprisonment has fundamentally reshaped inequality not only for the adult men for whom imprisonment has become common, but also for their friends and families.
The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also presents numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography—those who have a history of developing innovative approaches to using challenging data—are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers—a novel “digital census” that has largely been untapped by demographers—we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.