2012
DOI: 10.1007/978-3-642-33460-3_42

Author Name Disambiguation Using a New Categorical Distribution Similarity

Abstract: Author name ambiguity has been a long-standing problem that impairs the accuracy of publication retrieval and bibliometric methods. Most existing disambiguation methods are built on similarity measures, e.g., the Jaccard coefficient, between the two sets of papers to be disambiguated, each set represented by a set of categorical features, e.g., coauthors and publication venues. Such measures perform poorly when the two sets are small, which is typical in author name disambiguation. In this paper, we …
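To make the small-set problem concrete, here is a minimal Python sketch of the Jaccard coefficient applied to one categorical attribute collected over two sets of papers. It is an illustration only, not the categorical distribution similarity proposed in the paper; the dict-based paper representation and the names `jaccard` and `cluster_similarity` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cluster_similarity(papers_a, papers_b, attr):
    """Jaccard coefficient between the values of one categorical attribute
    (e.g. attr="coauthors" or attr="venue") pooled over two sets of papers.
    Assumption: each paper is a dict whose attribute values are lists."""
    vals_a = {v for p in papers_a for v in p[attr]}
    vals_b = {v for p in papers_b for v in p[attr]}
    return jaccard(vals_a, vals_b)


# Two single-paper sets (hypothetical data).
set_1 = [{"coauthors": ["A", "B"], "venue": ["KDD"]}]
set_2 = [{"coauthors": ["B", "C"], "venue": ["KDD"]}]
print(cluster_similarity(set_1, set_2, "coauthors"))  # 1 shared of 3 distinct names
print(cluster_similarity(set_1, set_2, "venue"))
```

With single-paper sets, a single shared or unshared coauthor moves the score in large steps: `jaccard({"A"}, {"B"})` is 0.0 while `jaccard({"A"}, {"A", "B"})` is 0.5. This is the instability the abstract attributes to small sets.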

Cited by 24 publications (17 citation statements)
References 12 publications

“…The importance of the three attributes is decided by their impact on the name ambiguity problem. First, as a scholar usually has several related research fields and collaborates with several comparatively stable researchers in each field, coauthors usually provide stronger evidence than the other attributes (Cota, Ferreira, Nascimento, Gonçalves, & Laender, ; Fan, Wang, Pu, Zhou, & Lv, ; Kang et al., ; Li et al., ; Tang et al., ). We extract coauthors from the article set and cluster the articles by coauthorship in the first step.…”
Section: A Fast Methods Based On Multiple Clustering For Name Disambig… (mentioning)
confidence: 99%
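The first step described above (grouping articles into fragments by coauthorship) could be sketched as follows. This is an illustration under stated assumptions, not the cited authors' code: it assumes two articles belong to the same fragment whenever they share at least one coauthor, directly or through a chain of articles, and that the ambiguous name itself has already been stripped from every coauthor list. `coauthor_fragments` is a hypothetical name.

```python
def coauthor_fragments(articles):
    """Group article indices into fragments. `articles` is a list of coauthor
    lists (ambiguous name removed, by assumption). Articles sharing a coauthor,
    directly or transitively, end up in the same fragment."""
    parent = list(range(len(articles)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # coauthor name -> index of the first article listing it
    for idx, coauthors in enumerate(articles):
        for name in coauthors:
            if name in first_seen:
                union(first_seen[name], idx)
            else:
                first_seen[name] = idx

    groups = {}
    for idx in range(len(articles)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical input: article 0 and 1 share "B", article 1 and 3 share "C".
print(coauthor_fragments([["A", "B"], ["B", "C"], ["D"], ["C"], []]))
```

Union-find makes the transitive grouping cheap; the trade-off is that this single-link merging can over-merge when a frequent coauthor name is itself ambiguous, which is one reason later steps refine the fragments with other attributes.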
“…FMC consists of three steps: first, given the article list by authors with the same name, FMC groups articles into small clusters, also called fragments, by coauthorship where each cluster represents the articles of one author; second, FMC continues to cluster the fragments obtained from the previous step by correlating titles to reduce the number of fragments and increase the number of articles in the fragments; finally, the algorithm further tunes the clustering via the latent relations among venues, where the articles written by authors with the same name have been grouped together under the actual authors. Our experimental results show that FMC gets the best pairwise F1 score among four algorithms and reduces the runtime by 10 to 100 times as compared with the second‐best algorithm Categorical Sampling Likelihood Ratio (CSLR) (Li, Cong, & Miao, ).…”
Section: Introduction (mentioning)
confidence: 94%
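FMC is compared with the other algorithms by pairwise F1. Under the usual definition (assumed here, since the excerpt does not spell it out), a pair of articles is a true positive when both the predicted clustering and the ground truth place the two articles under the same author; precision and recall are then computed over pairs. A minimal Python sketch, with `pairwise_f1` as a hypothetical name:

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """Pairwise precision, recall and F1.
    predicted: dict article id -> predicted cluster label
    truth:     dict article id -> true author label (same keys as predicted)"""
    tp = fp = fn = 0
    for a, b in combinations(list(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1  # correctly placed together
        elif same_pred:
            fp += 1  # merged, but different authors
        elif same_true:
            fn += 1  # same author, but split apart
    # Convention (an assumption): 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: articles 1-2 by author "a", 3-4 by author "b".
print(pairwise_f1({1: 0, 2: 0, 3: 0, 4: 1}, {1: "a", 2: "a", 3: "b", 4: "b"}))
```

Counting pairs rather than clusters means a single large wrongly merged cluster is penalized heavily, which is why pairwise F1 is a common yardstick in name disambiguation.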
“…Several studies address the mixed citation problem [4], and others address the split citation problem [5]. Some approaches, such as [15], treat both cases.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [3] the authors propose two attributes, topic and web correlation, which measure the topic similarity between two citations and use the web to detect whether two citations are listed together on the same web page, which may be a strong indicator that they belong to the same individual. In [4,5] the authors consider three features: coauthors, published venue, and title. In the categorical approach they investigate a categorical distribution similarity method to measure the similarity of these features, and in the k-way approach they use k-way spectral clustering to group similar citations. In [6], the authors also use coauthors and publication titles, in addition to the title of the proceedings or journal, and employ two supervised learning methods, SVM and naïve Bayes, to classify two citations.…”
Section: Related Work (mentioning)
confidence: 99%
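The features named in these excerpts (coauthors, publication title, and venue title) are commonly turned into one feature vector per pair of citations before a classifier such as an SVM or naive Bayes decides whether the pair refers to the same person. The sketch below shows one possible way to build such a vector; the overlap-based feature definitions and the names `tokens`, `overlap`, and `pair_features` are assumptions, not the definitions used in [6].

```python
import re


def tokens(text):
    """Lowercased alphanumeric word tokens (no stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Jaccard overlap of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(c1, c2):
    """Hypothetical feature vector for a citation pair:
    [coauthor overlap, title-word overlap, venue-word overlap].
    A supervised classifier would be trained on such vectors labelled
    'same author' / 'different author'."""
    return [
        overlap(set(c1["coauthors"]), set(c2["coauthors"])),
        overlap(tokens(c1["title"]), tokens(c2["title"])),
        overlap(tokens(c1["venue"]), tokens(c2["venue"])),
    ]


# Hypothetical citations.
c1 = {"coauthors": ["A. Smith"],
      "title": "Name disambiguation with categorical similarity",
      "venue": "ECML PKDD"}
c2 = {"coauthors": ["A. Smith", "B. Lee"],
      "title": "Spectral clustering for name disambiguation",
      "venue": "ECML PKDD"}
print(pair_features(c1, c2))
```

Function words such as "with" and "for" are counted here; a real system would likely remove stop words before comparing titles.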
“…The problem is well known in the fields of databases and artificial intelligence. It is also known by various other names, such as the merge/purge problem [1], object distinction [2], author name ambiguity [3], and temporal record linkage [4], [5]. Researchers have proposed various approaches to tackle this problem.…”
Section: Introduction (mentioning)
confidence: 99%