2019
DOI: 10.3386/w25825

Automated Linking of Historical Data

Abstract: The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (…

Citations: cited by 82 publications (133 citation statements)
References: references 33 publications (11 reference statements)
“…Our match rates of around 20 percent from the 1860 Census to either 1870 or 1900 is standard for Census-based linking in the nineteenth century, due to factors like the widespread use of first initials, rather than complete names, on Census manuscripts, and the old-fashioned handwriting that can lead to transcription errors in the digitization process (see, for example, Bleakley and Ferrie, 2016; Salisbury, 2017; Eli, Salisbury and Shertzer, 2018). Abramitzky, et al (2019) document that, even in more recent Census files (=1940), the maximum match rate is around 50 percent, particularly due to the prevalence of common names that cannot be distinguished within year of birth/state of birth cells.…”
Section: A Census Linking
mentioning
confidence: 99%
“…We present results using a more conservative matching strategy that requires individuals to be unique by name and state of birth within a five-year age band. This conservative procedure is roughly as successful at reducing the "false positive" rate as are a series of more computationally-intensive matching approaches (Bailey et al, 2017; Abramitzky et al, 2019).…”
Section: A Census Linking
mentioning
confidence: 99%
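
The uniqueness rule in the statement above can be illustrated with a minimal sketch. This is not the citing authors' code or the `abematch` command. It assumes that "five-year age band" means plus or minus two years around a record's birth year, and that names and birthplaces are compared exactly after lowercasing. `Person`, `conservative_links`, and `half_width` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Person(NamedTuple):
    record_id: str   # assumed unique within each census
    first: str
    last: str
    birthplace: str
    birth_year: int


def _index(records):
    """Group records by (first name, last name, birthplace), case-insensitively."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.first.lower(), r.last.lower(), r.birthplace.lower())].append(r)
    return groups


def _unique_in_band(record, group, half_width):
    """True if no other record in the group was born within half_width years."""
    return all(
        other.record_id == record.record_id
        or abs(other.birth_year - record.birth_year) > half_width
        for other in group
    )


def conservative_links(census_a, census_b, half_width=2):
    """Link a record only if it is unique within the band in BOTH censuses.

    half_width=2 gives a five-year band (assumption: birth year +/- 2).
    """
    index_a, index_b = _index(census_a), _index(census_b)
    links = []
    for key, group_a in index_a.items():
        group_b = index_b.get(key, [])
        for a in group_a:
            if not _unique_in_band(a, group_a, half_width):
                continue  # ambiguous in the first census
            candidates = [
                b for b in group_b
                if abs(b.birth_year - a.birth_year) <= half_width
            ]
            if len(candidates) != 1:
                continue  # no candidate, or several candidates
            b = candidates[0]
            if _unique_in_band(b, group_b, half_width):
                links.append((a.record_id, b.record_id))
    return links


# Invented toy records, not real census data.
census_a = [
    Person("a1", "John", "Smith", "Ohio", 1850),
    Person("a2", "John", "Smith", "Ohio", 1851),
    Person("a3", "Eli", "Ward", "Ohio", 1848),
]
census_b = [
    Person("b1", "John", "Smith", "Ohio", 1850),
    Person("b3", "Eli", "Ward", "Ohio", 1849),
]
links = conservative_links(census_a, census_b)
print(links)  # [('a3', 'b3')]
```

In this toy data the two John Smiths fall inside each other's band, so neither is linked even though one has an exact counterpart. That trade of lower match rates for fewer false positives is the behavior the quoted statement describes.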
“…However, we only retain in our linked sample individuals between the ages of 3 and 18, inclusive, in childhood census years.8 Our results are not sensitive to this restriction. 9 See Abramitzky et al (2018) for a description of the small differences between the matching method used in this paper and the original method described by Abramitzky, Boustan, and Eriksson (2012). Specifically, we use the abematch command provided at: https://ranabr.people.stanford.edu/matching-codes.…”
mentioning
confidence: 99%
“…Not limited to arrest, this issue is applicable to any context in which the goal of the linking process is to determine the presence of an outcome and there is no prior prediction for how many records should match, or that changes relative to the prior prediction are the quantity of interest. These points distinguish this linking case from other kinds of longitudinal record linking cases (Abramitzky et al, 2019;Bailey et al, 2017;Feigenbaum, 2016) but, to our knowledge, this distinction has not been discussed in the prior literature on administrative data linking.…”
Section: Current Setup
mentioning
confidence: 80%