2020
DOI: 10.1186/s12911-020-01285-w

CIDACS-RL: a novel indexing search and scoring-based record linkage system for huge datasets with high accuracy and scalability

Abstract: Background: Record linkage is the process of identifying and combining records about the same individual from two or more different datasets. While there are many open source and commercial data linkage tools, the volume and complexity of currently available datasets for linkage pose a huge challenge; hence, designing an efficient linkage tool with reasonable accuracy and scalability is required. Methods: We developed CIDACS-RL (Centre…

Cited by 45 publications (61 citation statements)
References 44 publications
“…Since there is no unique identifier in the Brazilian Information System, we linked SINASC live birth records with deaths registered in SIM using the name of the mother, maternal date of birth or age (when date of birth was missing), and the municipality of residence of the mother as matching variables. The linkage was performed with CIDACS-RL-Record Linkage [17], a novel record-linkage tool developed to link large-scale administrative Brazilian datasets. Linkage procedures were conducted at the centre in a strict data protection environment and according to ethical and legal rules [18].…”
Section: Methods (mentioning)
confidence: 99%
“…The linkage process between the 100 Million Brazilian Cohort baseline dataset and BFP payroll dataset was deterministic, based on the NIS (Social Identification Number or “Número de Identificação Social”) number, a unique identifier similar to a social security number. The linkage between the 100 Million Brazilian Cohort baseline dataset, SIM, and SINASC was performed by similarity matching using CIDACS-RL, an open-source linkage algorithm from the Center for Data and Knowledge Integration for Health (CIDACS) that generates a similarity score on the basis of several identifiers [29]; the linked records were verified through manual analysis of a sample of 2,000 randomly selected pairs from all possible paired records (Table A in S2 Text).…”
Section: Methods (mentioning)
confidence: 99%
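
The statement above describes similarity matching that produces a score from several identifiers. As a rough illustration only, the sketch below computes a weighted mean of per-field string similarities with Python's standard `difflib`. The field names, the weights, and the choice of `SequenceMatcher` are assumptions made for this example; they are not taken from CIDACS-RL and do not reproduce its scoring.

```python
from difflib import SequenceMatcher

# Hypothetical field weights, chosen for illustration only.
WEIGHTS = {"mother_name": 5, "birth_date": 3, "municipality": 2}


def field_similarity(a, b):
    """Return a similarity in [0, 1]; a missing value contributes 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted mean of the per-field similarities of two records."""
    total = sum(weights.values())
    weighted = sum(
        w * field_similarity(rec_a.get(field), rec_b.get(field))
        for field, w in weights.items()
    )
    return weighted / total


if __name__ == "__main__":
    a = {"mother_name": "Maria da Silva", "birth_date": "1985-03-12", "municipality": "Salvador"}
    b = {"mother_name": "Maria da Sliva", "birth_date": "1985-03-12", "municipality": "Salvador"}
    print(round(similarity_score(a, b), 3))
```

In practice, pairs scoring above a chosen threshold would be treated as candidate links and a sample checked manually, as the quoted study did with 2,000 pairs.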
“…A recent Brazilian study also applied a blocking strategy with Apache Lucene to reduce the number of comparisons in the subsequent steps. 14 …”
Section: Discussion (mentioning)
confidence: 99%
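
The blocking idea mentioned in the statement above, restricting comparisons to records that share some key, can be sketched as follows. This minimal example uses a plain in-memory dictionary as the index rather than Apache Lucene, and the blocking key (municipality plus the first letter of the mother's name) is a hypothetical choice for illustration.

```python
from collections import defaultdict


def blocking_key(record):
    # Hypothetical key: municipality plus first letter of the mother's name.
    return (record["municipality"], record["mother_name"][:1].upper())


def candidate_pairs(dataset_a, dataset_b):
    """Yield only the pairs whose records fall in the same block."""
    index = defaultdict(list)
    for rec_b in dataset_b:
        index[blocking_key(rec_b)].append(rec_b)
    for rec_a in dataset_a:
        # index.get avoids creating empty blocks for unseen keys.
        for rec_b in index.get(blocking_key(rec_a), []):
            yield rec_a, rec_b


if __name__ == "__main__":
    births = [
        {"mother_name": "Maria da Silva", "municipality": "Salvador"},
        {"mother_name": "Ana Souza", "municipality": "Recife"},
    ]
    deaths = [
        {"mother_name": "Maria da Sliva", "municipality": "Salvador"},
        {"mother_name": "Marta Lima", "municipality": "Salvador"},
        {"mother_name": "Ana Souza", "municipality": "Fortaleza"},
    ]
    pairs = list(candidate_pairs(births, deaths))
    print(len(pairs), "candidate pairs instead of", len(births) * len(deaths))
```

Only the pairs that survive blocking would then go on to a more expensive similarity comparison.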