2013
DOI: 10.1186/1472-6947-13-64

The effect of data cleaning on record linkage quality

Abstract: Background: Within the field of record linkage, numerous data cleaning and standardisation techniques are employed to ensure the highest quality of links. While these facilities are common in record linkage software packages and are regularly deployed across record linkage units, little work has been published demonstrating the impact of data cleaning on linkage quality. Methods: A range of cleaning techniques was applied to both a synthetically generated dataset and a large administrative dataset previously l…

Citations: cited by 54 publications (37 citation statements)
References: 22 publications (28 reference statements)
“…We also assume appropriate data cleaning occurs before the protocol is run. Various data-cleaning techniques, such as phonetic encoding algorithms, have been proposed in the literature [58]. …”
Section: A Secure Deduplication Protocol (citation type: mentioning)
confidence: 99%
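
The statement above names phonetic encoding as one data-cleaning technique without saying which algorithm is meant. As a minimal sketch, assuming American Soundex as a representative choice (the function name `soundex` is illustrative and not taken from the cited work), differently spelled names can be mapped to a shared code like this:

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name (illustrative sketch)."""
    # Consonant groups and their Soundex digits; vowels, h, w and y carry no digit.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate letters with the same code.
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset prev to "", so a repeated code after a vowel is kept.
        prev = code
    return (result + "000")[:4]


# Spelling variants collapse to one code, which is what makes them comparable.
print(soundex("Smith"), soundex("Smyth"))
print(soundex("Robert"), soundex("Rupert"))
```

Whether such an encoding improves linkage quality is exactly the kind of question the cited paper examines, so this is a demonstration of the mechanism only, not a recommendation.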
“…These are always prone to errors, for example by typing or optical character recognition errors or by changes in values such as name changes after marriage or changes of addresses. Slightly different ways of spelling names, for example the inclusion or exclusion of academic or generational titles or other name suffixes, usually do not strongly affect record linkage procedures with unencrypted identifiers (for an example, see Randall et al 2013). However, if the identifiers are encrypted with HMACs, even small variations will result in completely different encodings after encryption.…”
Section: Linking With Encrypted Quasi-identifiers (citation type: mentioning)
confidence: 99%
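
The point about HMAC-encrypted identifiers can be made concrete with a short sketch. It assumes HMAC-SHA256 from the Python standard library, a made-up demonstration key, and a hypothetical `clean_name` routine; none of these details come from the quoted paper.

```python
import hashlib
import hmac

# Hypothetical shared secret for demonstration only.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical list of titles and suffixes to drop during cleaning.
DROPPED_TOKENS = {"dr", "prof", "jr", "sr", "ii", "iii"}


def encode_identifier(value: str, key: bytes = SECRET_KEY) -> str:
    """Return the keyed HMAC-SHA256 hex digest of an identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def clean_name(value: str) -> str:
    """Lowercase, strip simple punctuation, and drop titles and suffixes."""
    tokens = value.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(t for t in tokens if t not in DROPPED_TOKENS)


raw_a = "Dr. Jane Smith"
raw_b = "jane smith jr"

# Without cleaning, the small differences yield unrelated digests.
print(encode_identifier(raw_a) == encode_identifier(raw_b))  # False

# After the same cleaning on both sides, the digests agree exactly.
print(encode_identifier(clean_name(raw_a)) == encode_identifier(clean_name(raw_b)))  # True
```

Because exact digest equality is the only comparison available after encoding, any standardisation has to happen before the HMAC is applied.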
“…A recent study by Randall et al. tested a range of data‐cleaning techniques in real and simulated data and showed that the overall linkage quality was mostly unaffected …”
Section: Discussion (citation type: mentioning)
confidence: 99%