Proceedings of the 2005 ACM Symposium on Applied Computing, 2005
DOI: 10.1145/1066677.1066920

A hierarchical naive Bayes mixture model for name disambiguation in author citations

Abstract: Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for name disambiguation in author citations. This method partitions a collection of citations into clusters, with each cluster containing only citations authore…

Cited by 77 publications (90 citation statements)
References 16 publications
“…They use a mix of techniques. While some use similarity functions [2,7,12,18,21,27,30], others use learning techniques [1,14,16,28,32,35], heuristics [17,19,20,24], classifiers [9,10,34] and clustering methods [11,31].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…If no tuple is returned, we attempt to retrieve the coauthor by the full name (lines 6-7), which is returned in case it is found (line 9). If there is no success and if the coauthor's name contains a period and/or a semicolon (which characterizes a citation name) (line 10), the heuristic tries to find the coauthor's CV by using the citation name (lines 11-14). Since this query may retrieve several tuples, we use a similarity function to find the most similar coauthor in the database (lines 15-24).…”
Section: Heuristic Matching Algorithm (mentioning)
confidence: 99%
“…We can apply data mining classification methods, for example Bayes methods [23,18], decision trees [31] or SVM [7,11]. Unsupervised learning methods such as latent Dirichlet allocation [3] or clustering methods can also be used, if there is no training data.…”
Section: Related Work (mentioning)
confidence: 99%
“…To resolve this problem, some relational information is used to facilitate the disambiguation task. For example, Han et al. [12] try to improve disambiguation accuracy by clustering title words and venue words with similar concepts. Song et al. [13] introduce the relationships between authors and topics in citations to improve the disambiguation accuracy by extracting the word-based relationships for each topic.…”
Section: Related Work (mentioning)
confidence: 99%