Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some of the complex human behavior that has led to these crises. We suggest that massive new data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. By analyzing changes in Google query volumes for search terms related to finance, we find patterns that may be interpreted as “early warning signs” of stock market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a better understanding of collective human behavior.
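The abstract does not say how a "change in query volume" is measured. The following is a minimal sketch that assumes the change is the difference between the current week's search volume and the mean volume over the preceding `delta_t` weeks. It also assumes a hypothetical trading rule that sells when that change is positive and buys otherwise. The function names and the toy volumes are illustrative only, not the authors' method or data.

```python
from typing import List


def relative_change(volumes: List[float], t: int, delta_t: int) -> float:
    """Volume at week t minus the mean of the preceding delta_t weeks.

    This definition is an assumption made for illustration.
    """
    if delta_t < 1 or t < delta_t or t >= len(volumes):
        raise ValueError("need delta_t >= 1 weeks of history before week t")
    window = volumes[t - delta_t:t]
    return volumes[t] - sum(window) / delta_t


def signal(volumes: List[float], t: int, delta_t: int) -> str:
    """Hypothetical rule: read rising search interest as a warning sign."""
    return "sell" if relative_change(volumes, t, delta_t) > 0 else "buy"


if __name__ == "__main__":
    weekly_volumes = [10.0, 12.0, 11.0, 15.0, 9.0]  # toy numbers, not real data
    for week in (3, 4):
        print(week, signal(weekly_volumes, week, 3))
```

A rising value relative to the recent average is one simple way to operationalise an "early warning sign". Other window lengths or normalisations would be equally plausible readings of the abstract.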
The increasing integration of technology into our lives has created unprecedented volumes of data on society's everyday behaviour. Such data opens up exciting new opportunities to work towards a quantitative understanding of our complex social systems, within the realms of a new discipline known as Computational Social Science. Against a background of financial crises, riots and international epidemics, the urgent need for a greater comprehension of the complexity of our interconnected global society and an ability to apply such insights in policy decisions is clear. This manifesto outlines the objectives of this new scientific direction and considers both the challenges involved and the extensive impact that its success is likely to have on science, technology and society.
Financial crises result from a catastrophic combination of actions. Vast stock market datasets offer us a window into some of the actions that have led to these crises. Here, we investigate whether data generated through Internet usage contain traces of attempts to gather information before trading decisions were taken. We present evidence in line with the intriguing suggestion that data on changes in how often financially related Wikipedia pages were viewed may have contained early signs of stock market moves. Our results suggest that online data may allow us to gain new insight into early information gathering stages of decision making.
We introduce a future orientation index to quantify the degree to which Internet users worldwide seek more information about years in the future than years in the past. We analyse Google logs and find a striking correlation between a country's GDP and the predisposition of its inhabitants to look forward.
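This is a minimal sketch of such an index. It assumes that, for a given year, the index is the ratio of the search volume for the following year to the search volume for the preceding year. For example, searches for "2011" would be compared with searches for "2009" during 2010. The data layout, function names and toy volumes are hypothetical.

```python
from typing import Dict, List


def future_orientation_index(volumes: Dict[str, float], year: int) -> float:
    """Ratio of searches for year+1 to searches for year-1 made during `year`.

    This definition is an assumption made for illustration.
    """
    future = volumes[str(year + 1)]
    past = volumes[str(year - 1)]
    if past == 0:
        raise ValueError("no searches recorded for the previous year")
    return future / past


def rank_countries(by_country: Dict[str, Dict[str, float]], year: int) -> List[str]:
    """Return country codes sorted from most to least forward-looking."""
    scores = {c: future_orientation_index(v, year) for c, v in by_country.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy search volumes for two hypothetical countries, not real data.
    toy = {
        "A": {"2011": 120.0, "2009": 80.0},
        "B": {"2011": 90.0, "2009": 100.0},
    }
    print(rank_countries(toy, 2010))
```

Once each country has an index value, it could be correlated with per-capita GDP to test the reported relationship.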
Vast numbers of scientific articles are published each year, some of which attract considerable attention, and some of which go almost unnoticed. Here, we investigate whether any of this variance can be explained by a simple metric of one aspect of a paper's presentation: the length of its title. Our analysis provides evidence that journals which publish papers with shorter titles receive more citations per paper. These results are consistent with the intriguing hypothesis that papers with shorter titles may be easier to understand, and hence attract more citations.
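This is a minimal sketch of a journal-level comparison. It assumes each journal is summarised by the mean title length of its papers and by its mean citations per paper, and that a Pearson correlation relates the two. The abstract does not say which statistic was used. Under these assumptions, a negative coefficient corresponds to the reported pattern. The values below are toy numbers.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy journal-level data (not real): mean title length in characters
    # and mean citations per paper, one entry per journal.
    title_length = [60.0, 80.0, 100.0, 120.0]
    citations = [30.0, 25.0, 15.0, 10.0]
    print(round(pearson(title_length, citations), 3))
```

A rank correlation such as Spearman's would be a reasonable alternative if citation counts are heavily skewed.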
Significance
Internet search data may offer new possibilities to improve forecasts of collective behavior, if we can identify which parts of these gigantic search datasets are relevant. We introduce an automated method that uses data from Google and Wikipedia to identify relevant topics in search data before large events. Using stock market moves as a case study, our method successfully identifies historical links between searches related to business and politics and subsequent stock market moves. We find that the predictive value of these search terms has recently diminished, potentially reflecting increasing incorporation of Internet data into automated trading strategies. We suggest that extensions of these analyses could help draw links between search data and a range of other collective actions.
Being able to infer the number of people in a specific area is of extreme importance for the avoidance of crowd disasters and to facilitate emergency evacuations. Here, using a football stadium and an airport as case studies, we present evidence of a strong relationship between the number of people in restricted areas and activity recorded by mobile phone providers and the online service Twitter. Our findings suggest that data generated through our interactions with mobile phone networks and the Internet may allow us to gain valuable measurements of the current state of society.
Seasonal influenza outbreaks and pandemics of new strains of the influenza virus affect humans around the globe. However, traditional systems for measuring the spread of flu infections deliver results with a delay of one or two weeks. Recent research suggests that data on queries made to the search engine Google can be used to address this problem, providing real-time estimates of levels of influenza-like illness in a population. Others, however, have argued that equally good estimates of current flu levels can be forecast using historic flu measurements. Here, we build dynamic ‘nowcasting’ models; in other words, forecasting models that estimate current levels of influenza before the release of official data one week later. We find that when using Google Flu Trends data in combination with historic flu levels, the mean absolute error (MAE) of in-sample ‘nowcasts’ can be significantly reduced by 14.4%, compared with a baseline model that uses historic data on flu levels only. We further demonstrate that the MAE of out-of-sample nowcasts can also be significantly reduced, by between 16.0% and 52.7%, depending on the length of the sliding training interval. We conclude that, using adaptive models, Google Flu Trends data can indeed be used to improve real-time influenza monitoring, even when official reports of flu infections are available with only one week's delay.
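The percentage reductions quoted above compare the mean absolute error of two sets of nowcasts against the same observed flu levels. This is a minimal sketch of that arithmetic. It assumes the reduction is expressed relative to the baseline model's MAE. The function names and toy series are illustrative, and the sketch does not reproduce the significance tests or the sliding-window model fitting.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between nowcasts and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need non-empty series of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def percent_reduction(mae_model: float, mae_baseline: float) -> float:
    """Reduction in MAE relative to the baseline, in percent (assumed convention)."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


if __name__ == "__main__":
    observed = [2.0, 3.0, 4.0]            # toy flu levels, not real data
    baseline_nowcast = [3.0, 2.0, 5.0]    # e.g. from historic flu levels only
    augmented_nowcast = [2.5, 3.5, 4.0]   # e.g. historic levels plus search data
    mae_b = mean_absolute_error(baseline_nowcast, observed)
    mae_m = mean_absolute_error(augmented_nowcast, observed)
    print(f"{percent_reduction(mae_m, mae_b):.1f}% lower MAE")
```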