Abstract: How predictable is success in complex social systems? In spite of a recent profusion of prediction studies that exploit online social and information network data, this question remains unanswered, in part because it has not been adequately specified. In this paper we attempt to clarify the question by presenting a simple stylized model of success that attributes prediction error to one of two generic sources: insufficiency of available data and/or models on the one hand; and inherent unpredictability of complex social systems on the other…
“…The reason for this incoherence is that prediction results depend on many of the same "researcher degrees of freedom" that lead to false positives in traditional hypothesis testing (3). For example, consider the question of predicting the size of online diffusion "cascades" to understand how information spreads through social networks, a topic of considerable recent interest (6,7,10,11). Although seemingly unambiguous, this question can be answered only after it has first been translated into a specific computational task, which in turn requires the researcher to make a series of subjective choices, including the selection of the task, data set, model, and performance metric.…”
Section: Standards For Prediction (mentioning)
confidence: 99%
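The "series of subjective choices" described in the snippet above compounds multiplicatively: each independent choice of task, data set, model, and performance metric yields a distinct computational version of the same research question. A toy enumeration (the specific options listed are invented for illustration, not taken from the paper) makes the scale concrete:

```python
from itertools import product

# Four "researcher degrees of freedom" in a prediction study.
# The options below are hypothetical examples, not the paper's choices.
tasks = ["exact size regression", "size > median classification", "top-1% detection"]
datasets = ["Twitter cascades", "Facebook shares", "email forwards"]
models = ["linear regression", "random forest", "neural net"]
metrics = ["R^2", "AUC", "precision@k"]

# Every combination is a different computational task that could report
# a different "prediction result" for the same underlying question.
pipelines = list(product(tasks, datasets, models, metrics))
print(len(pipelines))  # 81
```

Even with only three options per choice, one question fans out into 81 distinct analyses, which is why results across studies are hard to compare unless these choices are standardized.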
“…In addition to holding the data set fixed, for simplicity, we also restricted our analysis to a single choice of model, reported in (11), that predicts cascade size as a linear function of the average past performance of the "seed" individual (i.e., the one who initiated the cascade). Even with the data source and model held fixed, Fig.…”
Section: Standards For Prediction (mentioning)
confidence: 99%
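The fixed model in the snippet above — cascade size as a linear function of the seed's average past performance — reduces to a one-feature regression. A minimal sketch on synthetic data (all names, distributions, and parameters are assumptions of this sketch, not the study's data):

```python
import random

random.seed(1)

# Synthetic world: each "seed" user has a latent propensity, and every
# cascade they initiate has size = propensity + noise.
def cascade_size(propensity):
    return max(1.0, propensity + random.gauss(0, 10))

seeds = {s: random.uniform(1, 50) for s in range(200)}
past_sizes = {s: [cascade_size(p) for _ in range(10)] for s, p in seeds.items()}
future_size = {s: cascade_size(p) for s, p in seeds.items()}

# One-feature linear model: future size ~ a + b * mean(past sizes),
# fit by ordinary least squares.
x = [sum(v) / len(v) for v in past_sizes.values()]
y = [future_size[s] for s in past_sizes]
mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

def predict(mean_past_size):
    return a + b * mean_past_size
```

Even in this best case, where the data-generating process is known exactly, the noise term caps attainable accuracy — the point developed in the skill/luck discussion below.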
“…In skill world, for example, if one could hypothetically measure skill, then in principle it would be possible to predict success with almost perfect precision. In luck world, in contrast, even a "perfect" predictor would yield mediocre performance, no better than predicting that all items will experience the same (i.e., average) level of success (11). It follows, therefore, that the more that outcomes are determined by extrinsic random factors, the lower the theoretical best performance that can be attained by any model.…”
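The contrast can be made concrete with a toy simulation (purely illustrative; the two "worlds" and all parameters are assumptions of this sketch): in skill world the best achievable predictor is near-perfect, while in luck world it collapses to predicting the mean.

```python
import random

random.seed(0)
n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]

# Skill world: success is fully determined by (hypothetically measurable) skill.
success_skill = list(skill)
# Luck world: success is independent random noise, unrelated to skill.
success_luck = [random.gauss(0, 1) for _ in range(n)]

def best_linear_r2(x, y):
    """In-sample R^2 of the best linear predictor of y from x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

print(round(best_linear_r2(skill, success_skill), 3))  # 1.0: near-perfect prediction
print(round(best_linear_r2(skill, success_luck), 3))   # ~0.0: no better than the mean
```

The gap between these two R^2 values is the point of the passage: no amount of modeling effort can lift performance in luck world, because the theoretical ceiling itself is low.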
Historically, social scientists have sought out explanations of human and social phenomena that provide interpretable causal mechanisms, while often ignoring their predictive accuracy. We argue that the increasingly computational nature of social science is beginning to reverse this traditional bias against prediction; however, it has also highlighted three important issues that require resolution. First, current practices for evaluating predictions must be better standardized. Second, theoretical limits to predictive accuracy in complex social systems must be better characterized, thereby setting expectations for what can be predicted or explained. Third, predictive accuracy and interpretability must be recognized as complements, not substitutes, when evaluating explanations. Resolving these three issues will lead to better, more replicable, and more useful social science.
“…Indeed, Martin et al [15] showed that realistic bounds on predicting outcomes in social systems impose drastic limits on what the best performing models can deliver. And yet, accurate prediction is a holy grail avidly sought in financial markets [8], sports [7], arts and entertainment award events [13], and politics [23].…”
Section: Introduction (mentioning)
confidence: 99%
“…According to Blastland and Dilnot [1], the teams favored by bettors win half the time in soccer, 60% of the time in baseball, and 70% of the time in football and in basketball. Despite the large amount of money involved, there are no algorithms capable of producing accurate predictions and there is some evidence they will never be found [15].…”
Predicting the outcome of sports events is a hard task. We quantify this difficulty with a coefficient that measures the distance between the observed final results of sports leagues and idealized, perfectly balanced competitions in terms of skill. This indicates the relative presence of luck and skill. We collected and analyzed all games from 198 sports leagues comprising 1503 seasons from 84 countries of four different sports: basketball, soccer, volleyball, and handball. We measured the competitiveness by countries and sports. We also identify in each season which teams, if removed from their league, result in a completely random tournament. Surprisingly, not many of them are needed. As another contribution of this paper, we propose a probabilistic graphical model to learn the teams' skills and to decompose the relative weights of luck and skill in each game. We break down the skill component into factors associated with the teams' characteristics. The model also allows us to estimate at 0.36 the probability that an underdog team wins in the NBA league, with home advantage adding 0.09 to this probability. As shown in the first part of the paper, luck is substantially present even in the most competitive championships, which partially explains why sophisticated and complex feature-based models hardly beat simple models in the task of forecasting sports outcomes.
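The paper's graphical model is more elaborate, but its core idea — win probability driven by a skill difference plus a home-advantage term, with luck as the residual randomness — can be sketched as a Bradley–Terry-style logistic model. All skill values and the home-advantage coefficient below are invented for illustration, not the paper's fitted estimates; they merely happen to land in the same ballpark as the NBA numbers quoted above:

```python
import math

def win_prob(skill_a, skill_b, home_adv=0.0):
    """P(team A beats team B) in a Bradley-Terry-style logistic model.
    The skill difference drives the outcome; whatever the skills do not
    capture is 'luck', so the underdog always has a nonzero chance."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b + home_adv)))

# Hypothetical skills: an underdog facing a stronger favorite.
underdog, favorite = 0.4, 1.0

p_neutral = win_prob(underdog, favorite)              # underdog on a neutral court
p_at_home = win_prob(underdog, favorite, home_adv=0.4)

print(f"{p_neutral:.2f}")              # ~0.35: the underdog still wins often
print(f"{p_at_home - p_neutral:.2f}")  # ~0.10: uplift from home advantage
```

With these toy parameters the underdog wins roughly a third of the time and gains about a tenth at home — close to the 0.36 and +0.09 reported for the NBA, which illustrates how much residual "luck" even a well-calibrated skill model must leave on the table.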
CCS CONCEPTS • Computing methodologies → Model development and analysis; Uncertainty quantification;