Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013
DOI: 10.1145/2484028.2484157

Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion

Abstract: Entity disambiguation is an important step in many information retrieval applications. This paper proposes a new approach to entity disambiguation, focusing on name disambiguation in digital libraries. In particular, pairwise similarity is first learned for publications that share the same author name string (ANS), and then a novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is proposed to adaptively cluster a set of publications that share the same ANS to individu…
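The abstract describes a two-stage pipeline: learn a pairwise similarity for publications that share an ANS, then merge those publications bottom-up into per-author clusters. The following is a minimal sketch of the second stage only, assuming the learned similarities are already available as a symmetric matrix. Average linkage and the fixed `threshold` are assumptions chosen for illustration; the paper's actual linkage and its adaptive stopping criterion are not described in the truncated abstract, and the function names `average_linkage` and `hac_with_threshold` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def average_linkage(a: List[int], b: List[int], sim: Sequence[Sequence[float]]) -> float:
    """Mean pairwise similarity between the publications in clusters a and b."""
    total = sum(sim[i][j] for i in a for j in b)
    return total / (len(a) * len(b))


def hac_with_threshold(sim: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Greedy average-linkage HAC that stops when no pair of clusters is similar enough.

    sim[i][j] is an (assumed symmetric) learned similarity between publications
    i and j that share the same author name string. The fixed `threshold` is a
    hypothetical stand-in for HACASC's adaptive stopping criterion.
    """
    # Start with every publication in its own cluster.
    clusters: List[List[int]] = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        # Find the most similar pair of current clusters.
        (x, y), best = max(
            (((p, q), average_linkage(clusters[p], clusters[q], sim))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # stop: remaining clusters are treated as distinct authors
        # Merge the best pair and drop the two originals.
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters
```

For example, with `sim = [[1.0, 0.9, 0.1, 0.2], [0.9, 1.0, 0.15, 0.1], [0.1, 0.15, 1.0, 0.8], [0.2, 0.1, 0.8, 1.0]]` and `threshold=0.5`, the sketch merges publications 0 and 1, then 2 and 3, and stops because the two remaining clusters have an average similarity of 0.1375, returning `[[0, 1], [2, 3]]`. An adaptive criterion would replace the fixed comparison `best < threshold` with a data-dependent rule.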

Cited by 41 publications (25 citation statements). References 15 publications.
“…For each name dataset, they calculate a Gram matrix representing similarities between different citations and apply K-way spectral clustering algorithm on the Gram matrix to obtain the desired clusters of the citations. In another unsupervised approach, Cen et al [5] compute pairwise similarity for publication events that share the same author name string (ANS) and then use a novel hierarchical agglomerative clustering with adaptive stopping criterion (HACASC) to partition the publications in different author clusters. Malin [17] proposes another cluster-based method that uses social network structure.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing works mostly use biographical features, such as name, address, institutional affiliation, email address, and homepage; contextual features, such as coauthor/collaborator, and research keywords; and external data such as Wikipedia [7]. From [a] methodological point of view, some of the works follow a supervised learning approach [8,10], while others use unsupervised clustering [5,9,17,25]. There exist quite a few solutions that use graphical models [3,23,26,31].…”
mentioning
confidence: 99%
“…We use LDA (Blei, Ng, and Jordan 2003), HC (Chang, Pei, and Chen 2014) and STM (Wang et al 2015) as baselines. We do not compare with non-text feature-based models (Tang et al 2012; Cen et al 2013) because our goal is to compare sense topic models on a task where the sense granularities are more varied. For STM and AutoSense, the title, publication venue and the author names are used as local contexts while the abstract is used as the global context.…”
Section: Methods (mentioning)
confidence: 99%
“…Due to its importance, the name disambiguation task has attracted substantial attention from information retrieval and data mining communities. However, the majority of existing solutions [1,3,12,15] for this task use biographical features such as name, address, institutional affiliation, email address, and homepage. Also, contextual features such as collaborator, community affiliation, and external data source such as Wikipedia are used in some works [13,15].…”
Section: Introduction (mentioning)
confidence: 99%