2014
DOI: 10.1145/2629687
Reach for gold

Abstract: Duplicates in a database are one of the prime causes of poor data quality and are at the same time among the most difficult data quality problems to alleviate. To detect and remove such duplicates, many commercial and academic products and methods have been developed. The evaluation of such systems is usually in need of pre-classified results. Such gold standards are often expensive to come by (much manual classification is necessary), not representative (too small or too synthetic), and proprietary and thus p…

Cited by 8 publications (1 citation statement); references 22 publications.
“…2 Other than these three types, a few studies have used synthetic labeled data (e.g., Ferreira, Gonçalves, Almeida, Laender, & Veloso, 2012; Milojević, 2013). Another noticeable labeling approach is to use the intersection set of disambiguation results by multiple algorithms (Vogel, Heise, Draisbach, Lange, & Naumann, 2014). 3 This does not imply that only Han et al. (2004)'s data contain flaws. No other labeled data than Han et al. (2004)'s have received such intensive scrutiny for errors.…”

mentioning, confidence: 99%