2014
DOI: 10.1002/sim.6230

Evaluating latent class models with conditional dependence in record linkage

Abstract: Record linkage methods commonly use a traditional latent class model to classify record pairs from different sources as true matches or non-matches. This approach was first formally described by Fellegi and Sunter and assumes that the agreement in fields is independent conditional on the latent class. Consequences of violating the conditional independence assumption include bias in parameter estimates from the model. We sought to further characterize the impact of conditional dependence on the overall misclass…
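
As a minimal sketch of the conditional independence assumption described in the abstract, the snippet below computes the posterior probability that a record pair is a true match when field agreements are treated as independent within each latent class. The function name `match_posterior`, the parameter names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import prod


def match_posterior(agree, m, u, p_match):
    """Posterior probability that a record pair is a true match.

    agree   : list of 0/1 agreement indicators, one per field
    m, u    : per-field agreement probabilities among matches / non-matches
    p_match : assumed prior proportion of true matches among all pairs
    """
    # Conditional independence: within each latent class the joint
    # likelihood of the agreement pattern is a product over fields.
    lik_m = prod(mk if a else 1 - mk for a, mk in zip(agree, m))
    lik_u = prod(uk if a else 1 - uk for a, uk in zip(agree, u))
    num = p_match * lik_m
    return num / (num + (1 - p_match) * lik_u)


# Hypothetical parameters for three fields (illustrative only).
m = [0.95, 0.90, 0.85]
u = [0.05, 0.10, 0.20]

p_all_agree = match_posterior([1, 1, 1], m, u, p_match=0.01)
p_all_disagree = match_posterior([0, 0, 0], m, u, p_match=0.01)
print(p_all_agree, p_all_disagree)
```

If agreements on two fields are in fact correlated within a class, the product form above misstates the likelihood, which is the kind of violation the paper evaluates.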

Citations: cited by 13 publications (10 citation statements)
References: 32 publications (65 reference statements)

“…The conditional independence assumption can be relaxed by extending the model to add correlated errors, as and when there are extra df available for fitting, or possibly by assuming correlation values to be known, 28‐30 by making alternative assumptions, 31 or by adopting Bayesian methods 32‐34 . In fact, under the reasonable assumption of exchangeability of the repeated observations (ie, whenever their order is not relevant), it should be sufficient to add just one extra parameter to represent the correlation between any given pair of tests on an individual.…”
Section: Discussion (mentioning)
confidence: 99%
“…This analysis used four manually curated gold-standard analytic datasets, which contained a random selection of true-positive and true-negative matches [17–23]. Through the manual curation process, we know the true match status for each record, which is a significant advantage over databases where the true matches are unknown.…”
Section: Datasets and Use Cases (mentioning)
confidence: 99%
“…In Table 3, we summarize the parameter estimates from the aforementioned manually reviewed random sample, from F-S model, a log-linear model with interactions between telephone number and zip code in both classes with different coefficients 20 (LLy), a log-linear model with all 42 ($2\binom{7}{2}$) class-specific pairwise two-way interactions among the seven fields (LL-all), and a log-linear model with two-way interactions that met the nominal significance level of 0.05 (LL-auto). The AUCs of all methods are similar, all above 99%.…”
Section: De-duplication of a Single File of Patient Records (mentioning)
confidence: 99%
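
The count of 42 class-specific pairwise interactions quoted above follows from 21 unordered pairs among seven fields, doubled for the two latent classes. A small sketch that enumerates them, assuming placeholder field names (the citing paper names only telephone number and zip code here, so `field_1` … `field_7` are hypothetical):

```python
from itertools import combinations

# Placeholder names for the seven linkage fields (hypothetical).
fields = [f"field_{i}" for i in range(1, 8)]
classes = ["match", "non-match"]

# All unordered pairs of fields: C(7, 2) = 21.
pairs = list(combinations(fields, 2))

# One interaction term per pair within each latent class: 2 * 21 = 42.
terms = [(c, a, b) for c in classes for a, b in pairs]

print(len(pairs), len(terms))
```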
“…In addition, estimation of the GRE models is computationally intensive and relies heavily on the choice of starting values. 7 Winkler used three-way interactions, which requires at least 10 fields. 8 Xu and Craig 5 also employed a correlation plot which can be useful in exploring and visualizing the correlations among fields, which can guide the modeling and serve as a simple diagnostic tool for identifying the conditional dependence.…”
Section: Introduction (mentioning)
confidence: 99%