Spatial variations in the distribution and composition of populations inform urban development, health-risk analyses, disaster relief, and more. Despite the broad relevance and importance of such data, acquiring local census estimates in a timely and accurate manner is challenging because population counts can change rapidly, are often politically charged, and suffer from logistical and administrative challenges. These limitations necessitate the development of alternative or complementary approaches to population mapping. In this paper we develop an explicit connection between telecommunications data and the underlying population distribution of Milan, Italy. We go on to test the scale invariance of this connection and use telecommunications data in conjunction with high-resolution census data to create easily updated and potentially real-time population estimates in time and space.
Subnational conflict research increasingly utilizes georeferenced event datasets to understand contentious politics and violence. Yet, how exactly locations are mapped to particular geographies, especially from unstructured text sources such as newspaper reports and archival records, remains opaque and few best practices exist for guiding researchers through the subtle but consequential decisions made during geolocation. We begin to address this gap by developing a systematic approach to georeferencing that articulates the strategies available, empirically diagnoses problems of bias created by both the data generating process and researcher-controlled tasks, and provides new generalizable tools for simultaneously optimizing both the recovery and accuracy of coordinates. We then empirically evaluate our process and tools against new micro-level data on the Mau Mau rebellion (colonial Kenya 1952–60), drawn from 20,000 pages of recently declassified British military intelligence reports. By leveraging a subset of these data that includes map codes alongside natural language location descriptions, we demonstrate how inappropriately georeferencing data can have important downstream consequences in terms of systematically biasing coefficients or altering statistical significance and how our tools can help alleviate these problems.
Military intelligence is underutilized in the study of civil war violence. Declassified records are hard to acquire and difficult to explore with the standard econometrics toolbox. I investigate a contemporary government database of civilians targeted during the Vietnam War. The data are detailed, with up to 45 attributes recorded for 73,712 individual civilian suspects. I employ an unsupervised machine learning approach of cleaning, variable selection, dimensionality reduction, and clustering. I find support for a simplifying typology of civilian targeting that distinguishes different kinds of suspects and different kinds of targeting methods. The typology is robust, successfully clustering both government actors and rebel departments into groups that mirror their known functions. The exercise highlights methods for dealing with high-dimensional found conflict data. It also illustrates how aggregating measures of political violence masks a complex underlying empirical data generating process as well as a complex institutional reporting process.
One of the main ways we try to understand the COVID-19 pandemic is through time-series cross-section counts of cases and deaths. Observational studies based on these kinds of data have concrete and well-known methodological issues that suggest significant caution for both consumers and producers of COVID-19 knowledge. We briefly enumerate some of these issues in the areas of measurement, inference, and interpretation.