The Impact of Name-Matching and Blocking on Author Disambiguation

Backes, Tobias

doi:10.1145/3269206.3271699

Cited by 19 publications

(26 citation statements)

References 19 publications


“…For this, a new disambiguation method may be evaluated under initialized versus full forename settings followed by feature importance assessment or in comparison with results disambiguated by string‐based matching. Especially, the latter suggestion supports the idea that string‐based matching results need to be baselines in evaluating author name disambiguation (Backes, ).…”

Section: Conclusion and Discussion (mentioning)

confidence: 54%

Effect of forename string on author name disambiguation

Kim

2019

Asso for Info Science & Tech


In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how much they are likely to refer to the same author. Despite such a crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonym) and some authors share the same forenames (homonym). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, they become marginal compared to those by string matching. Using a small portion of forename strings does not reduce much the performances of both heuristic and algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performances.


“…8,[17][18][19][24][25][26][27] While the others try to utilize multiple heterogeneous information for user account alignment, such as user the social relations, 6,7,9,10,[12][13][14][15]28 user interests, 4,7,10,29 user temporal distribution features. 4,6,11,28,30 To most of the studies on user account alignment, 4,8,[12][13][14][15][16][17]19,[24][25][26][27] the account name information is very important, since many users like to assign their accounts in different networks with very similar names, and the account names in most networks are very easy to be acquired. And how to properly utilize the name information in the alignment of accounts owned by the English users have already been well studied by many works, 8,16,17,19,24,26,27 among them: Vosecky et al …”

Section: Related Work (mentioning)

confidence: 99%

“…4,6,11,28,30 To most of the studies on user account alignment, 4,8,[12][13][14][15][16][17]19,[24][25][26][27] the account name information is very important, since many users like to assign their accounts in different networks with very similar names, and the account names in most networks are very easy to be acquired. And how to properly utilize the name information in the alignment of accounts owned by the English users have already been well studied by many works, 8,16,17,19,24,26,27 among them: Vosecky et al 31 propose a method that based on web profile matching to connect users between Facebook and StudiVZ. In their study, they compare three kinds of name matching algorithms, and select the best one for profile matching.…”

Section: Related Work (mentioning)

confidence: 99%

“…Although different ways have been explored to apply the name information matching to the cross-network alignment of user accounts, however, most of them just focus on connecting the accounts of users who mainly use English and create English names (in this article, these users are referred as English users). 4,8,[17][18][19]23,24 Since Chinese users' behavioral models are quite different from English users' when creating the account names, the matching of Chinese name information may encounter some new problems. For example, a Chinese user may use Chinese letters to create his/her account name(s) on Sina Weibo, but use English letters to create his/her Twitter name(s).…”

mentioning

confidence: 99%


A multiview approach based on naming behavioral modeling for aligning Chinese user accounts across multiple networks

Zhu, Wang, Liu, et al.

2020

Concurrency and Computation


Hundreds of millions of Chinese people have become social network users in recent years, and aligning the accounts of common Chinese users across multiple social networks is valuable to many inter-network applications, for example, cross-network recommendation and cross-network link prediction. Many methods have explored the proper ways of utilizing account name information into aligning the common English users' accounts. However, how to properly utilize the account name information when aligning the Chinese user accounts remains to be detailedly studied. In this article, we first discuss the available naming behavioral models as well as the related features for different types of Chinese account name matchings. Second, we propose the framework of Multi-View Cross-Network User Alignment (MCUA) method, which uses a multi-view framework to creatively integrate different models to deal with different types of Chinese account name matchings, and can consider all of the studied features when aligning the Chinese user accounts. Finally, we conduct experiments to prove that MCUA can outperform many existing methods on aligning Chinese user accounts between Sina Weibo and Twitter. Besides, we also study the best learning models and the top-k valuable features of different types of name matchings for MCUA over our experimental datasets.

Keywords: account name, aligning Chinese user accounts, multiview framework, multiple social networks

1 Introduction

Online social networks are highly developed in recent years, [1] and hundreds of millions of Chinese people have become social network users.* Different social networks may provide different services, so it is natural for individuals to use multiple social networks for different purposes at the same time. [2] For example, a Chinese student may use Renren to share funny photos with his classmates, use Sina Weibo to follow the latest events, and use Twitter to connect with international friends. However, the accounts owned by the same user in different social sites are mostly isolated without any correspondence connections to each other. [3] Aligning the accounts of common users across different social networks is of great value to many concrete real-world internetwork applications. [3-6] For example, we can recommend new friends or new topics to a new Twitter user according to the social relationship information…

* Calculated by China Internet Network Information Center (CNNIC), the Statistics Report on China Internet Developing Situation (2015).


“…Cast in this light, many existing clustering models are not very suitable for the author name disambiguation problem. Meanwhile, cost-effective blocking technique [1] and lightweight rule-based methods [2,22] are worthy of research as they have been proven to achieve convincing precision in this problem.…”

Section: Introduction (mentioning)

confidence: 99%

Strong Baselines for Author Name Disambiguation with and Without Neural Networks

Zhang, Liu, et al.

2020

Advances in Knowledge Discovery and Data Mining


Author name disambiguation (AND) is one of the most vital problems in scientometrics, which has become a great challenge with the rapid growth of academic digital libraries. Existing approaches for this task substantially rely on complex clustering-like architectures, and they usually assume the number of clusters is known beforehand or predict the number by applying another model, which involve increasingly complex and time-consuming architectures. In this paper, we combine simple neural networks with two sets of heuristic rules to explore strong baselines for the author name disambiguation problem without any priori knowledge or estimation about cluster size, which frees the model from unnecessary complexity. On a popular benchmark dataset AMiner, our solution significantly outperforms several state-of-the-art methods both in performance and efficiency, and it still achieves comparable performance with many complex models when only using a group of rules. Experimental results also indicate that gains from sophisticated deep learning techniques are quite modest in the author name disambiguation problem.

