2017
DOI: 10.1016/j.is.2017.06.006

A novel ensemble learning approach to unsupervised record linkage

Abstract: Record linkage is the process of identifying records that refer to the same real-world entity. Many existing approaches to record linkage apply supervised machine learning techniques to generate a classification model that classifies a pair of records as either match or non-match. The main requirement of such an approach is a labelled training dataset. In many real-world applications no labelled dataset is available, hence manual labelling is required to create a sufficiently sized training dataset for a supervise…

Citations: cited by 34 publications (21 citation statements)
References: 31 publications
“…Those examples can be generated manually one-by-one, or by leveraging tools like Snorkel. [94] generates an ensemble of automatic self-learning models that use different similarity measures. To enhance the automatic self-learning process, it incorporates attribute weighting into the automatic seed selection for each of the self-learning models.…”
Section: Supervised Learning Adaptive Matching
Citation type: mentioning
confidence: 99%
“…Table X shows a comparison between the best F1 result obtained by the proposed framework and the best F1 result of other approaches. Junk et al [75] applies different classifiers over four textual datasets not related to the pharmaceutical domain, obtaining a F1 measure of 0,96. The work of Kim & Giles [76] is based on a financial dataset and obtain a F1 measure of 0,9774 with Random Forest in the best scenario.…”
Section: Analysis Of Means Plot Include the Upper Decision Limit (Udl
Citation type: mentioning
confidence: 99%
“…The main motivation of this research was the necessity of great pharmaceutical manufacturers to analyse a huge number of products related to their worldwide activities, considering that the same product can be registered several times by different systems using different attributes. The task of finding the records and match the products cannot be done by a human in a reasonable way, because the number of records to be matched is extremely high.…”
[Table residue from the citing paper: [75] 0.96; Kim & Giles (2016) [76] 0.9744; Proposed SVM 0.85]
Section: Conclusion and Future Lines
Citation type: mentioning
confidence: 99%
“…In more recent work the authors proposed to address the problem of unsupervised record linkage using graphical models [14] and multi view ensemble self-learning [15]. Discussion.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
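
The first citation statement above describes an ensemble of self-learning models that use different similarity measures, with attribute weighting feeding the automatic seed selection. The following is only a rough sketch of that general idea, not the paper's method: the two similarity functions, the fixed attribute weights, the `hi`/`lo` thresholds, and the toy records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased, whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def weighted_score(rec_a, rec_b, weights, sim):
    """Weighted mean of per-attribute similarities (weights are assumed, not learned)."""
    total = sum(weights.values())
    return sum(w * sim(rec_a[attr], rec_b[attr]) for attr, w in weights.items()) / total


def select_seeds(pairs, weights, sim, hi=0.9, lo=0.3):
    """Label only the most confident pairs; all other pairs stay unlabelled."""
    matches, non_matches = [], []
    for a, b in pairs:
        score = weighted_score(a, b, weights, sim)
        if score >= hi:
            matches.append((a, b))
        elif score <= lo:
            non_matches.append((a, b))
    return matches, non_matches


# Hypothetical toy records and weights, for illustration only.
records = [
    {"name": "John Smith", "city": "Sydney"},
    {"name": "John Smith", "city": "Sydney"},
    {"name": "Maria Garcia", "city": "Madrid"},
]
pairs = [(records[0], records[1]), (records[0], records[2])]
weights = {"name": 0.7, "city": 0.3}

# One seed set per similarity measure: each would train its own self-learning model.
seeds_by_measure = {
    sim.__name__: select_seeds(pairs, weights, sim)
    for sim in (edit_ratio, token_jaccard)
}
```

In a fuller version, each seed set would train a separate classifier that is then iteratively retrained on its own confident predictions, and the per-measure classifiers would be combined, for example by voting. Those steps are left out here because the citation statements do not describe them in detail.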