A Graph Combination With Edge Pruning‐Based Approach for Author Name Disambiguation

Pooja, Km; Mondal, Samrat; Chandra, Joydeep

doi:10.1002/asi.24212

Cited by 14 publications

(3 citation statements)

References 39 publications

“…In Km et al (2020), the authors present a graph-based approach where two graphs are combined together, a person-person graph obtained by connecting papers with shared coauthors, and a document-document graph which models similarity between publications' content. The document-document graph is obtained by first modeling abstract keywords with TF-IDF vectors and by drawing an edge between two nodes of the graphs when their similarity is higher than a selected threshold; subsequently, this graph is pruned by removing connections between papers whose shared referenced works are below a certain threshold.…”

Section: Graph-based Approaches (mentioning)

confidence: 99%
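The construction summarized in the statement above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper record fields (`coauthors`, `keywords`, `references`), the thresholds `sim_threshold` and `min_shared_refs`, and the toy data are hypothetical. TF-IDF weights and cosine similarity are computed by hand with a smoothed IDF rather than taken from a library.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        tf = Counter(tokens)
        # Smoothed IDF so that terms present in every document keep a nonzero weight.
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_combined_graph(papers, sim_threshold=0.3, min_shared_refs=1):
    """Union of a coauthor-based layer and a pruned content-similarity layer (hypothetical thresholds)."""
    ids = [p["id"] for p in papers]
    vecs = tfidf_vectors([p["keywords"] for p in papers])
    coauthor_edges, doc_edges = set(), set()
    for i, j in combinations(range(len(papers)), 2):
        a, b = papers[i], papers[j]
        edge = (ids[i], ids[j])
        # Person-person layer: link papers that share at least one coauthor.
        if a["coauthors"] & b["coauthors"]:
            coauthor_edges.add(edge)
        # Document-document layer: keep a similar pair only if it also shares enough references.
        if cosine(vecs[i], vecs[j]) > sim_threshold:
            if len(a["references"] & b["references"]) >= min_shared_refs:
                doc_edges.add(edge)
    return coauthor_edges | doc_edges


# Hypothetical toy records for one ambiguous author name.
papers = [
    {"id": "p1", "coauthors": {"A. Rao"}, "keywords": "graph clustering author disambiguation", "references": {"r1", "r2"}},
    {"id": "p2", "coauthors": {"A. Rao"}, "keywords": "protein folding simulation", "references": {"r9"}},
    {"id": "p3", "coauthors": {"L. Chen"}, "keywords": "graph clustering author disambiguation", "references": {"r1"}},
    {"id": "p4", "coauthors": {"M. Diaz"}, "keywords": "graph clustering author disambiguation", "references": {"r7"}},
]
edges = build_combined_graph(papers)
print(sorted(edges))
```

Here `p1` and `p4` have identical keyword vectors, but their document edge is pruned because they share no references, while `p1` and `p2` stay linked through their common coauthor.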

A knowledge graph embeddings based approach for author name disambiguation using literals

et al. 2022

Scholarly data is growing continuously, containing information about the articles from a plethora of venues including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges such as exploration of scholarly articles, ambiguous authors, etc. This study more specifically targets the problem of Author Name Disambiguation (AND) on Scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and finally, (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855) respectively.
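The three-stage pipeline in this abstract (embeddings, blocking, hierarchical agglomerative clustering) can be sketched as follows. This is an assumption-laden illustration, not LAND's code: the two-dimensional vectors stand in for the multimodal KGEs, which are not computed here, and the blocking key (last name plus first initial), the average-linkage criterion, and the distance `threshold` are hypothetical choices.

```python
import math
from itertools import combinations


def blocking_key(name):
    """Hypothetical blocking key: last name plus first initial, lowercased."""
    parts = name.replace(".", " ").split()
    return (parts[-1] + "_" + parts[0][0]).lower()


def average_linkage_hac(points, threshold):
    """Merge clusters bottom-up until the closest pair is farther apart than threshold."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1, c2):
        # Average-linkage distance: mean Euclidean distance over all cross-cluster pairs.
        return sum(math.dist(points[a], points[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def disambiguate(mentions, embeddings, threshold=1.0):
    """mentions: list of (mention_id, author_name); embeddings: mention_id -> vector."""
    blocks = {}
    for mention_id, name in mentions:
        blocks.setdefault(blocking_key(name), []).append(mention_id)
    result = []
    # HAC runs separately inside each block, never across blocks.
    for ids in blocks.values():
        groups = average_linkage_hac([embeddings[m] for m in ids], threshold)
        result.extend(sorted(ids[k] for k in g) for g in groups)
    return sorted(result)


# Hypothetical mentions and stand-in embeddings.
mentions = [("m1", "J. Smith"), ("m2", "John Smith"), ("m3", "J. Smith"), ("m4", "A. Lee")]
embeddings = {"m1": (0.0, 0.0), "m2": (0.2, 0.1), "m3": (5.0, 5.0), "m4": (0.1, 0.0)}
clusters = disambiguate(mentions, embeddings)
print(clusters)
```

In this toy input, `m4` lies close to `m1` in embedding space but is never compared with it, because blocking places it in a different block; blocking restricts the quadratic pairwise comparisons of HAC to mentions that share a key.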

“…Some errors discussed in the literature during the last years range from problems in transcribing large document collections [1] to namesake alias, homonymy or polysemy (when the same name corresponds to multiple authors), and name variability or synonymy (when an author appears under different names) [8]. Other common issues reported include missing identifiers, lack of standardized schemas, and inconsistencies in data representation [1,9].…”

Section: Related Work and Background (mentioning)

confidence: 99%

AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing

Correia, Guimarães, Paulino, et al. 2021

2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD)

Despite decades of research and development in named entity resolution, dealing with name ambiguity is still a challenging issue for many bibliometric-enhanced information retrieval (IR) tasks. As new bibliographic datasets are created as a result of the upward growth of publication records worldwide, more problems arise when considering the effects of errors resulting from missing data fields, duplicate entities, misspellings, extra characters, etc. As these concerns tend to be large-scale, both the general consistency and the quality of electronic data are largely affected. This paper presents an approach to handle these name ambiguity problems through the use of crowdsourcing as a complementary means to traditional unsupervised approaches. To this end, we present "AuthCrowd", a crowdsourcing system with the ability to decompose named entity disambiguation and entity matching tasks. Experimental results on a real-world dataset of publicly available papers published in peer-reviewed venues demonstrate the potential of our proposed approach for improving author name disambiguation. The findings further highlight the importance of adopting hybrid crowd-algorithm collaboration strategies, especially for handling complexity and quantifying bias when working with large amounts of data.

“…GHOST [6] utilizes a coauthorship graph to compute the similarity between node pairs, utilizing the attribute of coauthor only could achieve the same performance as previous complicated approach. A combined graph encompassing the author-author graph and document-document graph is put forward by Pooja [22]. Each connected component of the combined graph represents a distinct cluster.…”

Section: Related Work (mentioning)

confidence: 99%
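One way to read the statement that each connected component of the combined graph is a distinct cluster is the following minimal sketch: it takes the union of a coauthor edge set and a document edge set and extracts components with union-find. The node names and edge sets are hypothetical toy data, and union-find is just one choice; a breadth-first search would work equally well.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    # Sort for a deterministic output; each inner list is one presumed author.
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical papers under one ambiguous name, with two edge layers.
nodes = ["p1", "p2", "p3", "p4", "p5", "p6"]
coauthor_edges = {("p1", "p2")}
document_edges = {("p2", "p3"), ("p4", "p5")}
clusters = connected_components(nodes, coauthor_edges | document_edges)
print(clusters)
```

A paper with no surviving edge, such as `p6` here, ends up as a singleton cluster.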

A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory

2020

Entropy

Name ambiguity, due to the fact that many people share an identical name, often deteriorates the performance of information integration, document retrieval and web search. In academic data analysis, author name ambiguity usually decreases the analysis performance. To solve this problem, an author name disambiguation task is designed to divide documents related to an author name reference into several parts and each part is associated with a real-life person. Existing methods usually use either attributes of documents or relationships between documents and co-authors. However, methods of feature extraction using attributes cause inflexibility of models while solutions based on relationship graph network ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning which incorporates attributes and relationships. Experiments on a public real dataset demonstrate the effectiveness of our model and experimental results demonstrate that our solution is superior to several state-of-the-art graph-based methods. We also increase the interpretability of our method through information theory and show that the analysis could be helpful for model selection and training progress.
