Abstract: Historically, social scientists have sought out explanations of human and social phenomena that provide interpretable causal mechanisms, while often ignoring their predictive accuracy. We argue that the increasingly computational nature of social science is beginning to reverse this traditional bias against prediction; however, it has also highlighted three important issues that require resolution. First, current practices for evaluating predictions must be better standardized. Second, theoretical limits to pr…
“…This implies that, if model A is better than model B given infinite data, due to the bias-variance trade-off there is no guarantee that the former will be better than the latter given finite data or a particular dataset. Therefore, we also need "predictive modeling" (Donoho, 2015; Hofman et al., 2017), which is generally agnostic about a data-generating mechanism and allows multiple models to learn from and work on multiple datasets. Some of these are used to train the models, others are put aside as test sets, just as we turn a ball many times and each time we make a prediction about the patterns on the side we do not see using the information on the side we can see.…”
Section: Two Modeling Approaches (classification: mentioning; confidence: 99%)
“…Some of these are used to train the models, others are put aside as test sets, just as we turn a ball many times and each time we make a prediction about the patterns on the side we do not see using the information on the side we can see. The performances of the trained models are then judged against a common task, usually predictive accuracy on test sets, which is easy to understand and can be compared across datasets and over time (Breiman, 2001; Donoho, 2015; Hofman et al., 2017; James et al., 2015).…”
Data analysis usually aims to identify a particular signal, such as an intervention effect. Conventional analyses often assume a specific data generation process, which suggests a theoretical model that best fits the data. Machine learning techniques do not make such an assumption. In fact, they encourage multiple models to compete on the same data. Applying logistic regression and machine learning algorithms to real and simulated datasets with different features of noise and signal, we demonstrate that no single model dominates others under all circumstances. By showing when different models shine or struggle, we argue it is both possible and important to conduct comparative analyses.
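The train/test workflow the quoted passages describe can be sketched in a few lines of plain Python. The simulated signal, the learned threshold rule, and the majority-class baseline below are illustrative stand-ins, not models from the cited papers; the point is only that each model is fit on held-in data and judged on held-out data.

```python
import random

random.seed(42)

# Simulate 200 observations: a noisy signal x drives a binary outcome y.
data = []
for _ in range(200):
    x = random.gauss(0, 1)
    y = 1 if x + random.gauss(0, 0.7) > 0 else 0
    data.append((x, y))

# Hold out a test set, as the passage describes: models learn from one
# part of the data and are judged on the part they never saw.
random.shuffle(data)
train, test = data[:150], data[150:]

def accuracy(predict, sample):
    """Share of observations whose outcome the model predicts correctly."""
    return sum(predict(x) == y for x, y in sample) / len(sample)

# Model A: a threshold rule at the midpoint of the two class means,
# learned from the training set (a stand-in for a parametric model).
mean1 = sum(x for x, y in train if y == 1) / sum(y for _, y in train)
mean0 = sum(x for x, y in train if y == 0) / sum(1 - y for _, y in train)
cut = (mean0 + mean1) / 2
model_a = lambda x: 1 if x > cut else 0

# Model B: always predict the majority class seen in training.
majority = 1 if 2 * sum(y for _, y in train) >= len(train) else 0
model_b = lambda x: majority

print("model A test accuracy:", accuracy(model_a, test))
print("model B test accuracy:", accuracy(model_b, test))
```

With a different noise level or a different draw of the data, the ranking of the two models can change, which is exactly the comparative point the passage makes: no single model dominates under all circumstances.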
“…Not resolving these "bewildering complexities" has left researchers in the social (e.g., economic, humanistic, philosophic, networks, game theory) disciplines struggling to predict the outcomes of basic interactions, exemplified by the difficulty in replicating experiments (Nosek, 2015); left them aimless (Hofman et al., 2017); and left them stunned by the achievements of their colleagues in the hard sciences (e.g., physics, chemistry, biology, engineering). The philosophy or history of science, where the search for truth goes to die, has replaced the foundations inherent in science with endless debate (Nickels, 2017).…”
Appearances can be misleading, but not in the social sciences. Based on the statistical aggregation of intuitions (observations, self-reports, interviews) about reality across individuals that converge while seeking a 1:1 relation, the primary model of decision making attempts to make intuitions rational. Despite its many claims to the contrary, however, the social sciences have failed to build a successful predictive theory, including in economics, where the results of this failure, re-labeled as irrational, have won Nobel prizes; yet irrational humans in freely organized and competitive teams strangely manage to be extraordinarily innovative. In contrast to traditional social science, the most predictive theory in all of science is quantum theory, each prediction confirmed by new discoveries leading to new predictions and further discoveries; but the dualist nature of quantum theory makes it counterintuitive despite more than a century of intense, unflagging debate. By re-introducing dualism into social science with a quantum-like theory of social interdependence, we offer an opportunity to rehabilitate social science by successfully making predictions and new discoveries about human teams: a theory that accounts for the abysmal performance of interdisciplinary science teams; that generalizes to the newly arising problem of how to engineer hybrid teams (arbitrary combinations of autonomous humans, machines, and robots); and that explains the counterintuitive prediction that highly interdependent teams do not generate Shannon information, but instead "darken" as a team becomes perfect, meaning, intuitively, that structural information about a team can be gained only under competition (i.e., perturbation theory).
“…Here, we find a strong prominence of studies concentrating on statistical predictions of various political outcomes based on signals found in digital trace data (see Hofman, Sharma, & Watts, 2017; Schoen et al., 2013). In style and design, these studies follow computer science papers attempting to predict social phenomena or economic outcomes based on digital trace data (e.g., Choi & Varian, 2012).…”