2011
DOI: 10.1007/s10115-011-0397-1

Scalable clustering methods for the name disambiguation problem

Abstract: When non-unique values are used as identifiers of entities, homonyms can cause confusion. In particular, when (parts of) the "names" of entities are used as their identifiers, the problem is often referred to as the name disambiguation problem, where the goal is to sort out entities that are erroneously conflated due to name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish "Masao Obama" from "Norio Obama"). In this paper, in particular, we study the scalability issue of the name disambigu…

Cited by 17 publications (8 citation statements)
References: 32 publications
“…Two multi-level scalable algorithms for AND are proposed by On et al. (2012). First is a multi-level graph partitioning (MGP) algorithm, and the second is a multi-level graph partitioning and merging (MGPM) algorithm.…”
Section: Selective Author Name Disambiguation Techniques (mentioning)
confidence: 99%
“…When the number of clusters is known beforehand, the person name disambiguation task can be formulated as a classification problem. For recent work, see [16,18].…”
Section: Related Work (mentioning)
confidence: 99%
“…In addition, supervised systems need a significant amount of annotated corpora in order to estimate the model parameters. On et al. [20] describe a work to disambiguate person names in digital libraries by using clustering methods based on graph partitioning algorithms that was trained with a subset of DBLP records and a set of web pages previously annotated. The cost of annotating data turns into a limitation when we approach a new language, a new domain, or a different set of classes.…”
Section: Related Work (mentioning)
confidence: 99%