2020
DOI: 10.1111/rssa.12630

Linkage-Data Linear Regression

Abstract: Data linkage is increasingly being used to combine data from different sources, with the aim of identifying and bringing together records from separate files that correspond to the same entities. Usually, data linkage is not a trivial procedure, and linkage errors (false links and missed links) are unavoidable. In these cases, standard statistical techniques may produce misleading inference. In this paper, we propose a method for secondary linear regression analysis, where the linked data have to be prepared by some…

Citations: cited by 12 publications (20 citation statements)
References: 40 publications
“…Another interesting problem for further research is how best to incorporate “calibrating” population summary information into the robust estimation procedures that we describe in this article. The pseudo‐OLS method of Zhang and Tuoto (2021) is based on the fact that under independence of different population units, the covariance between $\mathbf{x}_i$ and $y_i^*$ is $\lambda$ times the covariance between $\mathbf{x}_i$ and $y_i$, where $\lambda$ is an overall probability of correct linkage. However, it is unlikely that linkage errors will be homogeneous and extension of this approach to heterogeneous linkage errors, particularly those that vary between blocks, and clustered data seems worthwhile.…”
Section: Discussion (mentioning)
confidence: 99%
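The covariance relation quoted above can be checked numerically. The following is a minimal simulation sketch, not the pseudo‐OLS estimator of Zhang and Tuoto itself. It assumes a simple linear model and a homogeneous linkage mechanism in which each record keeps its own outcome with probability `lam` and otherwise receives the outcome of a uniformly drawn unit; the function name, the model, and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_linked_covariances(n=200_000, lam=0.8, beta=2.0, seed=0):
    """Compare cov(x, y) with cov(x, y*) under homogeneous linkage error.

    Assumed (hypothetical) setup: y = 1 + beta * x + noise, and each record
    is linked to its own y with probability `lam`, otherwise to the y of a
    uniformly drawn unit (a simplification that ignores one-to-one linkage).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + beta * x + rng.normal(size=n)

    # Decide which records are correctly linked.
    correct = rng.random(n) < lam
    # Records that are not correctly linked get a random donor's outcome.
    donor = rng.integers(0, n, size=n)
    linked_index = np.where(correct, np.arange(n), donor)
    y_star = y[linked_index]

    cov_true = np.cov(x, y)[0, 1]
    cov_linked = np.cov(x, y_star)[0, 1]
    return cov_true, cov_linked


cov_true, cov_linked = simulate_linked_covariances()
ratio = cov_linked / cov_true
print(f"cov(x, y*) / cov(x, y) = {ratio:.3f}")
```

With independent units, the falsely linked outcomes are uncorrelated with `x`, so the printed ratio should be close to the assumed `lam`. The same reasoning explains why a naive regression slope on linked data is attenuated toward zero.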
“…Figure 1 is an illustration of the role of blocks in data linkage, using fictitious individual and income data (data set X) and consumption data (data set Y). This figure is taken from Zhang and Tuoto (2021) and has been slightly modified to insert gender as a blocking variable. In Figure 1, we have five correct links (solid arrows) and two false links (dashed arrows).…”
Section: Regression Using Linked Data (mentioning)
confidence: 99%
“…Because the possible matches require manual review which is sometimes not available, Grannis et al. (2003) propose to establish only a single threshold to avoid human review. Although the matching scores and the posterior probabilities produce the same ordering for record pairs (Larsen & Rubin, 2001), the posterior probabilities are preferable in our case because they may be useful for further analyses (Lahiri & Larsen, 2005; Kim & Chambers, 2012; Hof & Zwinderman, 2012; Zhang & Tuoto, 2020).…”
Section: Probabilistic Record Linkage (mentioning)
confidence: 99%
“…A setting specifically considered in Chambers and da Silva (2020) and Kim and Chambers (2012) is termed exchangeable linkage error (ELE), in which the off‐diagonal elements of $Q_b$ are constant (and hence all diagonal elements are equal to a complementary constant). Novel insights into the ELE setting are recently presented in Zhang and Tuoto (2020).…”
Section: Linear Regression With Linked Datasets (mentioning)
confidence: 99%
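The structure of the ELE setting described above can be sketched in a few lines. This is an illustration under an assumed parametrisation, not code from any of the cited papers: within a block of size `block_size`, every record is taken to be correctly linked with probability `lam`, and the remaining probability is spread equally over the other records in the block, so that each row of the matrix sums to one. The function name `ele_matrix` and the example values are hypothetical.

```python
import numpy as np


def ele_matrix(block_size, lam):
    """Build an exchangeable-linkage-error matrix for one block.

    Assumed parametrisation: diagonal entries equal `lam` (probability of a
    correct link); off-diagonal entries share the remaining 1 - lam equally.
    """
    if block_size < 2:
        raise ValueError("a block needs at least two records")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be a probability")
    off_diagonal = (1.0 - lam) / (block_size - 1)
    q = np.full((block_size, block_size), off_diagonal)
    np.fill_diagonal(q, lam)
    return q


q_b = ele_matrix(block_size=4, lam=0.85)
print(q_b)
```

Because the off‐diagonal entries are constant, the diagonal is fixed by the requirement that each row sums to one, which is the "complementary constant" mentioned in the quoted statement.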