Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM 2017)
DOI: 10.1145/3132847.3132873

Name Disambiguation in Anonymized Graphs using Network Embedding

Abstract: In the real world, our DNA is unique, but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval and web search, and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed, which aims to partition the documents associated with a name reference such that each partition contains document…

Cited by 104 publications (97 citation statements)
References 25 publications
“…Since there are no implementation details, we speculate that this might be because the merging rules are too loose. By incorporating both rules and neural networks to model co-authorships, affiliations, and titles explicitly, our PNP model outperforms all baselines in terms of F1 score (+3.54% over Zhang et al. [25], +11.75% over Zhang and Al Hasan [24], +11.23% over Louppe et al. [13], and +39.74% over Fan et al. [3], relatively). The bottom half of Table 1 presents some incremental results of our method.…”
Section: Comparison Methods (mentioning)
Confidence: 98%
“…- Louppe et al. [13]: It trains a pairwise distance function based on carefully designed similarity features, and uses a semi-supervised Hierarchical Agglomerative Clustering (HAC) algorithm to determine clusters. - Zhang and Al Hasan [24]: It constructs graphs for each author name based on co-authorship and document similarity. Embeddings are learned for each name, and the final results are also obtained by HAC.…”
Section: Comparison Methods (mentioning)
Confidence: 99%
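The HAC step described in that excerpt can be sketched with SciPy. The pairwise distance matrix below is a toy stand-in for the output of a trained similarity function, and the cut threshold of 0.5 is a hypothetical value standing in for one tuned on labeled data (the semi-supervised part):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy pairwise distances for 4 documents sharing one name reference;
# in practice these would come from a learned pairwise distance function.
D = np.array([
    [0.0, 0.10, 0.90, 0.80],
    [0.10, 0.0, 0.85, 0.90],
    [0.90, 0.85, 0.0, 0.20],
    [0.80, 0.90, 0.20, 0.0],
])

# Average-linkage HAC on the condensed distance matrix; cutting the
# dendrogram at 0.5 yields one cluster (person) per partition.
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)  # documents {0,1} and {2,3} land in two separate clusters
```

The threshold controls how aggressively documents are merged; too loose a cut is exactly the failure mode the quoted passage speculates about.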
“…With the development of unsupervised feature learning techniques [3], deep learning methods have proved successful in natural language processing tasks through neural language models [10,33,36]. These models have been used to capture the semantic and syntactic structure of human language [8], and even logical analogies [20], by embedding words as vectors.…”
Section: Related Work (mentioning)
Confidence: 99%
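The analogy property mentioned above can be illustrated with hand-built toy vectors. The numbers below are invented to encode two feature directions (roughly, royalty and gender); real models learn such directions from large corpora:

```python
import numpy as np

# Hypothetical 3-d word vectors, chosen by hand for illustration only.
vec = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.3]),
    "woman": np.array([0.1, -0.8, 0.3]),
}

def nearest(target, candidates):
    # Rank candidates by cosine similarity to the target vector.
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda w: cos(vec[w], target))

# The classic analogy: king - man + woman should land near queen.
query = vec["king"] - vec["man"] + vec["woman"]
print(nearest(query, ["queen", "man", "woman"]))  # queen
```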
“…Additionally, as the node embedding representations are often learned in a task-agnostic fashion, they are generalizable to a number of downstream learning tasks such as node classification [33], community detection [44], link prediction [15], and visualization [37]. On top of that, it also has broader impacts in advancing many real-world applications, ranging from recommendation [45], polypharmacy side effects prediction [53] to name disambiguation [49]. The basic idea of network embedding is to represent each node by a lowdimensional vector in which the relativity information among nodes in the original network is maximally transcribed.…”
Section: Introduction (mentioning)
Confidence: 99%
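The basic idea stated in that excerpt — map each node to a low-dimensional vector that preserves proximity in the original network — can be sketched minimally with a truncated eigendecomposition of a toy adjacency matrix. This is a spectral stand-in for illustration; DeepWalk/node2vec-style methods, like the embedding in this paper, instead train skip-gram models on random walks, but the goal is the same:

```python
import numpy as np

# Toy undirected graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Keep the two eigenvectors of A with the largest eigenvalues as 2-d
# node vectors, scaled by their eigenvalues.
d = 2
vals, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
emb = vecs[:, -d:] * vals[-d:]

# Nodes inside the same triangle end up closer than nodes across triangles.
dist = lambda a, b: float(np.linalg.norm(emb[a] - emb[b]))
print(dist(0, 1) < dist(0, 5))  # True
```

Downstream tasks such as node classification, link prediction, or name disambiguation then operate on these vectors instead of the raw graph.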