2010
DOI: 10.1002/asi.21363

An unsupervised heuristic‐based hierarchical method for name disambiguation in bibliographic citations

Abstract: Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). D…

Cited by 92 publications (134 citation statements)
References 31 publications
“…• K [8]: Combines the average similarity of clusters in R with respect to those in S and the average similarity of clusters in S with respect to those in R.…”
Section: Existing Measures (mentioning)
confidence: 99%
“…The K measure [8,2] sums the similarities of all cluster pairs and is defined as the geometric mean of the Average Cluster Purity (ACP) and the Average Author Purity (AAP). (Here, Author can be thought of as a cluster in the gold standard.)…”
Section: A2 Cluster-level Comparison (mentioning)
confidence: 99%
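
The statement above describes K as the geometric mean of ACP and AAP. The following is a minimal Python sketch of that computation. It assumes the commonly used definitions ACP = (1/N) Σ_i Σ_j n_ij² / n_i and AAP = (1/N) Σ_j Σ_i n_ij² / n_j, where n_ij counts the citations shared by predicted cluster i and gold-standard author j; these formulas are not quoted on this page, and the function name `k_metric` is illustrative only.

```python
from collections import Counter
from math import sqrt


def k_metric(predicted, gold):
    """Return (ACP, AAP, K) for two equal-length label lists.

    predicted[i] is the cluster assigned to citation i and gold[i] is its
    true author. The ACP/AAP formulas used here are an assumption (see the
    paragraph above); only "K = geometric mean of ACP and AAP" is taken
    from the quoted statement.
    """
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    if not gold:
        raise ValueError("label lists must not be empty")

    n = len(gold)
    overlap = Counter(zip(predicted, gold))  # n_ij
    cluster_size = Counter(predicted)        # n_i
    author_size = Counter(gold)              # n_j

    # Purity of predicted clusters with respect to the gold authors.
    acp = sum(c * c / cluster_size[p] for (p, _), c in overlap.items()) / n
    # Purity of gold authors with respect to the predicted clusters.
    aap = sum(c * c / author_size[g] for (_, g), c in overlap.items()) / n

    return acp, aap, sqrt(acp * aap)


# Four citations: the predicted clustering splits author "a" across two clusters.
acp, aap, k = k_metric([0, 0, 1, 1], ["a", "a", "a", "b"])
```

Under these definitions a clustering that matches the gold standard exactly scores 1.0 on all three values; fragmenting an author lowers AAP and mixing authors lowers ACP.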
“…They use a mix of techniques. While some use similarity functions [2,7,12,18,21,27,30], others use learning techniques [1,14,16,28,32,35], heuristics [17,19,20,24], classifiers [9,10,34] and clustering methods [11,31].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…The reasons for developing the two-stage clustering are twofold: First, coauthors generally provide stronger evidence than other features, based on which the generated cluster usually comprises of papers of the same author, but the papers of an author may distribute among multiple clusters ( [3]); Second, the venue and title features are relatively weak evidence, based on which we can further merge clusters from the same author.…”
Section: Overview of the Clustering Procedures (mentioning)
confidence: 99%
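
The two-stage idea in the statement above (merge on strong coauthor evidence first, then use the weaker title and venue evidence to merge the resulting clusters) can be illustrated with a short Python sketch. This is a hypothetical illustration, not the citing paper's algorithm: the Jaccard word-overlap measure, the 0.3 threshold, the dictionary layout, and the name `two_stage_clusters` are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_stage_clusters(citations, threshold=0.3):
    """Group citations of one ambiguous name; returns lists of indices.

    Each citation is a dict with "coauthors" (a set of names that excludes
    the ambiguous author), "title" and "venue" (strings). The similarity
    measure and threshold are illustrative assumptions.
    """
    n = len(citations)
    parent = list(range(n))  # union-find forest over citation indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Stage 1: strong evidence. Merge citations that share a coauthor.
    for i, j in combinations(range(n), 2):
        if citations[i]["coauthors"] & citations[j]["coauthors"]:
            union(i, j)

    # Stage 2: weak evidence. Merge clusters whose pooled title and venue
    # words overlap enough; repeat until no pair qualifies.
    merged = True
    while merged:
        merged = False
        bags = {}
        for i, c in enumerate(citations):
            words = (c["title"] + " " + c["venue"]).lower().split()
            bags.setdefault(find(i), set()).update(words)
        for a, b in combinations(sorted(bags), 2):
            if jaccard(bags[a], bags[b]) >= threshold:
                union(a, b)
                merged = True
                break

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


example = [
    {"coauthors": {"B. Lee"}, "title": "graph clustering methods", "venue": "KDD"},
    {"coauthors": {"B. Lee"}, "title": "scalable graph mining", "venue": "KDD"},
    {"coauthors": {"C. Wu"}, "title": "graph clustering at scale", "venue": "KDD"},
    {"coauthors": {"D. Roy"}, "title": "protein folding dynamics", "venue": "Nature"},
]
clusters = two_stage_clusters(example)
```

In this toy input, citations 0 and 1 are joined in the first stage by their shared coauthor, and citation 2 joins them in the second stage through title and venue overlap, while citation 3 stays separate.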
“…For each paper, we consider 3 features: coauthors, published venue and title, by following the setting used in previous work [5,3,12]. Under this setting, our proposed method can be general and applicable to the existing bibliography databases, e.g., DBLP, since they contain information on the three features for each paper.…”
Section: Introduction (mentioning)
confidence: 99%