Duplicate Record Detection: A Survey

Elmagarmid, Ahmed K.; Ipeirotis, Panagiotis G.; Verykios, Vassilios S.

doi:10.1109/tkde.2007.250581

Cited by 1,336 publications

(1,079 citation statements)

References 73 publications

Citation statements classified as mentioning: 1,027


“…Surveys [8,9] review the various approaches, including named-attribute computations [5], schema mapping [2,17] and duplicate detection in hierarchical data [10], all of which inform the construction of profile linkage techniques.…”

Section: Record Linkage and Entity Resolution (mentioning; confidence: 99%)

Online Social Network Profile Linkage

Zhang, Kan, Liu, et al. 2014

Information Retrieval Technology


Abstract. Piecing together social signals from people in different online social networks is key for downstream analytics. However, users may have different usernames in different social networks, which makes the linkage task difficult. To this end, we explore a probabilistic approach that uses domain-specific prior knowledge to address the problem of online social network user profile linkage. At scale, linkage approaches based on naïve pairwise comparison, which has quadratic complexity, become prohibitively expensive. Our proposed threshold-based canopying framework, named OPL, reduces these pairwise comparisons and guarantees a theoretical upper bound of linear complexity with respect to the dataset size. We evaluate our approaches on real-world, large-scale datasets obtained from Twitter and LinkedIn. Our probabilistic classifier, which integrates prior knowledge into Naïve Bayes, achieves over 85% F1-measure for pairwise linkage, comparable to state-of-the-art approaches.
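The abstract does not describe how OPL forms its canopies. As a rough illustration of the general idea only, the Python sketch below uses the classic two-threshold canopy technique: a cheap similarity (here an assumed Jaccard similarity over character bigrams of usernames) groups profiles into possibly overlapping canopies, and the expensive classifier would then be run only on pairs that share a canopy. The function names, the bigram similarity and the `loose`/`tight` thresholds are hypothetical choices, not OPL's. This simple version can still do quadratic work in the worst case, so it does not reproduce the linear bound claimed for OPL.

```python
from itertools import combinations


def bigrams(s: str) -> set[str]:
    """Character bigrams of a lower-cased string (an assumed cheap similarity key)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def canopies(names: list[str], loose: float, tight: float) -> list[set[int]]:
    """Group item indices into possibly overlapping canopies.

    loose: items at least this similar to the center join its canopy.
    tight: items at least this similar are no longer candidate centers.
    """
    assert 0.0 <= loose <= tight <= 1.0
    grams = [bigrams(n) for n in names]
    remaining = list(range(len(names)))  # items that may still become centers
    result = []
    while remaining:
        center = remaining[0]
        sims = {i: jaccard(grams[center], grams[i]) for i in range(len(names))}
        result.append({i for i, sim in sims.items() if sim >= loose})
        # The center has similarity 1.0 to itself, so it always leaves the pool.
        remaining = [i for i in remaining if sims[i] < tight]
    return result


def candidate_pairs(names: list[str], loose: float = 0.3,
                    tight: float = 0.6) -> set[tuple[int, int]]:
    """Pairs of indices that share at least one canopy (the only pairs to classify)."""
    pairs: set[tuple[int, int]] = set()
    for canopy in canopies(names, loose, tight):
        pairs.update(combinations(sorted(canopy), 2))
    return pairs
```

For example, `candidate_pairs(["john_smith", "johnsmith", "alice.w", "alicew"])` keeps only the pairs `(0, 1)` and `(2, 3)` with the default thresholds, instead of all six possible pairs. Raising `loose` shrinks canopies (fewer comparisons, more missed matches); lowering it does the opposite.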


“…Entity Linkage is the process that decides whether two descriptions refer to the same real world entity (see [12] for an overview). Actually, state-of-the-art methods from this area have also been reused and adapted in implementing entity search.…”

Section: Entity Search (mentioning; confidence: 99%)

From Web Data to Entities and Back

Miklós, Bonvin, Bouquet, et al. 2010

Notes on Numerical Fluid Mechanics and Multidisciplinary Design


Abstract. We present the Entity Name System (ENS), an enabling infrastructure that can host descriptions of named entities and provide unique identifiers for them at large scale. In this way, it opens new perspectives for realizing entity-oriented, rather than keyword-oriented, Web information systems. We describe the architecture and the functionality of the ENS, along with tools that together contribute to realizing the Web of entities.


“…PowerMap uses the Watson⁵ semantic search engine as a gateway to the SW. In addition, PowerMap can also query its own repositories and offers the capability to index and add new online ontologies⁶. In the third step, the Triple Similarity Service (TSS) matches the QTs to ontological expressions.…”

Section: Motivating Scenario: Question Answering On the Semantic Web (mentioning; confidence: 99%)

“…Basic similarity metrics based on string comparison were developed in the database community (e.g., [16,3]). These metrics are used as a basis for the majority of algorithms, which compare values of attributes of different data instances and aggregate them to make a decision about two instances referring to the same entity (see [6] for a survey). The main distinction of our work is that, in the PowerAqua scenario, the fusion of answers is done in real time.…”

Section: Related Work (mentioning; confidence: 99%)

Merging and Ranking Answers in the Semantic Web: The Wisdom of Crowds

et al. 2009


Abstract. In this paper we propose algorithms for combining and ranking answers from distributed heterogeneous data sources in the context of a multi-ontology Question Answering task. Our proposal includes a merging algorithm that aggregates, combines and filters ontology-based search results, and three different ranking algorithms that sort the final answers according to different criteria such as popularity, confidence and semantic interpretation of results. An experimental evaluation on a large-scale corpus indicates improvements in the quality of the search results with respect to a scenario where the merging and ranking algorithms were not applied. These collective methods for merging and ranking make it possible to answer questions that are distributed across ontologies while, at the same time, filtering irrelevant answers, fusing similar answers together, and eliciting the most accurate answer(s) to a question.
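The related-work excerpt quoted above summarizes the usual duplicate-detection pattern: compare attribute values with a string similarity metric, then aggregate the per-attribute scores into one decision about whether two instances refer to the same entity. A minimal Python sketch of that pattern follows. It assumes the standard-library `difflib.SequenceMatcher` ratio as the string metric and a weighted average with a fixed threshold as the aggregation rule; the record fields, weights and threshold are hypothetical and in practice would be tuned or learned.

```python
from difflib import SequenceMatcher


def attr_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict[str, str], r2: dict[str, str],
                weights: dict[str, float]) -> float:
    """Weighted average of per-attribute similarities (weights must be positive)."""
    total = sum(weights.values())
    return sum(w * attr_similarity(r1.get(field, ""), r2.get(field, ""))
               for field, w in weights.items()) / total


def is_duplicate(r1: dict[str, str], r2: dict[str, str],
                 weights: dict[str, float], threshold: float = 0.8) -> bool:
    """Treat two records as the same entity if the aggregated score reaches the threshold."""
    return match_score(r1, r2, weights) >= threshold


if __name__ == "__main__":
    weights = {"name": 0.6, "city": 0.4}  # hypothetical attribute weights
    a = {"name": "Jon Smith", "city": "New York"}
    b = {"name": "John Smith", "city": "New York"}
    c = {"name": "Alice Wong", "city": "Boston"}
    print(is_duplicate(a, b, weights))  # True
    print(is_duplicate(a, c, weights))  # False
```

A weighted average is only the simplest aggregation rule; the same per-attribute scores could instead feed a probabilistic or learned classifier.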
