Cross-lingual Lexical Sememe Prediction

Qi, Fanchao; Lin, Yankai; Sun, Maosong; Zhu, Hao; Xie, Rong; Liu, Zhiyuan

doi:10.18653/v1/d18-1033

Cited by 22 publications

(16 citation statements)

References 34 publications


“…Sememe prediction is a well-defined task (Jin et al., 2018; Qi et al., 2018), aimed at selecting appropriate sememes for unannotated words or phrases from the set of all the sememes. Existing works model sememe prediction as a multi-label classification problem, where sememes are regarded as the labels of words and phrases.…”

Section: Training For MWE Sememe Prediction (mentioning)

confidence: 99%

“…We use the above-mentioned test set for evaluation. As for the evaluation protocol, we adopt mean average precision (MAP) and F1 score following previous sememe prediction works (Qi et al., 2018). Since our SC models and baseline methods yield a score for each sememe in the whole sememe set, we pick the sememes with scores higher than δ to compute F1 score, where δ is a hyper-parameter and also tuned to the best on the validation set.…”

Section: Evaluation Dataset and Protocol (mentioning)

confidence: 99%
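
The evaluation protocol quoted above (threshold the per-sememe scores at δ for F1, rank all sememes by score for MAP) can be illustrated with a minimal sketch. The function names, the per-word computation of F1, and the toy scores are assumptions made for illustration only; the quoted statement does not say how scores are averaged across words, and this is not the authors' evaluation code.

```python
from typing import Dict, List, Set


def f1_at_threshold(scores: Dict[str, float], gold: Set[str], delta: float) -> float:
    """F1 of the sememes whose score exceeds delta, against the gold set."""
    predicted = {s for s, score in scores.items() if score > delta}
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_precision(scores: Dict[str, float], gold: Set[str]) -> float:
    """Average precision of the full score-ranked sememe list for one word."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    hits, total = 0, 0.0
    for rank, sememe in enumerate(ranked, start=1):
        if sememe in gold:
            hits += 1
            total += hits / rank  # precision at this hit's rank
    return total / len(gold) if gold else 0.0


def mean_average_precision(all_scores: List[Dict[str, float]],
                           all_gold: List[Set[str]]) -> float:
    """Mean of the per-word average precisions (assumed macro average)."""
    aps = [average_precision(s, g) for s, g in zip(all_scores, all_gold)]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical scores for a single word; the values are made up.
toy_scores = {"human": 0.9, "male": 0.7, "spouse": 0.4, "family": 0.2}
toy_gold = {"human", "spouse"}
toy_f1 = f1_at_threshold(toy_scores, toy_gold, delta=0.5)
toy_map = mean_average_precision([toy_scores], [toy_gold])
```

In this sketch δ would be chosen by sweeping candidate values on a validation set and keeping the one with the best F1, which matches the tuning the quoted statement describes.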


Modeling Semantic Compositionality with Sememe Knowledge

Qi, Huang, Yang et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite


Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC while few works consider external knowledge in models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from a famous sememe knowledge base HowNet and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve significant performance boost as compared to the baseline methods without considering sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained on https://github.com/thunlp/Sememe-SC.


“…Some work tries to expand HowNet by predicting sememes for new words (Xie et al 2017; Jin et al 2018). To the best of our knowledge, only Qi et al (2018) make an attempt to build a sememe KB for another language by cross-lingual lexical sememe prediction (CLSP). They learn bilingual word embeddings in a unified semantic space, and then predict sememes for target words according to their meaning-similar words in the sememe-annotated language.…”

Section: Related Work (mentioning)

confidence: 99%
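
The idea summarized in the statement above, predicting sememes for a target-language word from similar words in the sememe-annotated language, can be sketched as a plain nearest-neighbour vote in a shared embedding space. This is a simplified stand-in and not the model of Qi et al. (2018); the function names, the value of `k`, the word labels, and the two-dimensional vectors are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_sememes(target_vec: List[float],
                    source_vecs: Dict[str, List[float]],
                    source_sememes: Dict[str, List[str]],
                    k: int = 2) -> List[Tuple[str, float]]:
    """Score each sememe by the summed similarity of the k nearest annotated words."""
    sims = {w: cosine(target_vec, vec) for w, vec in source_vecs.items()}
    neighbours = sorted(sims, key=lambda w: sims[w], reverse=True)[:k]
    scores: Dict[str, float] = defaultdict(float)
    for word in neighbours:
        for sememe in source_sememes[word]:
            scores[sememe] += sims[word]
    # Highest-scoring sememes first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical annotated source-language words in a shared bilingual space.
source_vecs = {
    "zh_husband": [1.0, 0.1],
    "zh_wife": [0.9, 0.3],
    "zh_apple": [0.0, 1.0],
}
source_sememes = {
    "zh_husband": ["human", "family", "male", "spouse"],
    "zh_wife": ["human", "family", "female", "spouse"],
    "zh_apple": ["fruit"],
}
# Hypothetical embedding of an unannotated target-language word.
ranked = predict_sememes([1.0, 0.2], source_vecs, source_sememes, k=2)
```

The ranked list produced here is the kind of per-sememe scoring that a threshold or top-n cut-off would then turn into a final sememe set.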

“…However, building a sememe KB for a new language from scratch is time-consuming and labor-intensive - the construction of HowNet takes several linguistic experts more than two decades. To tackle this challenge, Qi et al (2018) present the task of cross-lingual lexical sememe prediction (CLSP), aiming to facilitate the construction of a new language's sememe KB by predicting sememes for words in that language. However, CLSP can pre-…”

[Figure 2: Annotating sememes for the BabelNet synset whose ID is bn:00045106n. Synset words shown: en: husband, hubby; fr: mari, époux, marié; de: Ehemann, Gemahl, Gatte. Annotated sememes: human, family, male, spouse.]

Section: Introduction (mentioning)

confidence: 99%

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

Chang, Sun et al. 2020

AAAI

Self Cite


A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It manually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained on https://github.com/thunlp/BabelNet-Sememe-Prediction.


“…The method can alleviate the problem of large errors in the word vectors for words with fewer frequencies in the corpus. Based on the complementarity of different languages, Qi, F., et al [24] establishes the association between semantics and cross-lingual words in the low-dimensional semantic space, and thus improves the ability of semantics prediction. Although the above work is very innovative, the employed knowledge is not very closed with sememes, and there is still a gap between the predicted results and the sememes that should be assigned.…”

Section: Related Work (mentioning)

confidence: 99%

Incorporating Synonym for Lexical Sememe Prediction: An Attention-Based Model

Kang, Yao et al. 2020

Applied Sciences


A sememe is the smallest semantic unit for describing real-world concepts, and sememe knowledge improves the interpretability and performance of Natural Language Processing (NLP). To maintain the accuracy of sememe descriptions, the knowledge base needs to be continuously updated, which is time-consuming and labor-intensive. Sememe prediction can assign sememes to unlabeled words and is valuable for automatically building and/or updating sememe knowledge bases (KBs). Existing methods are overdependent on the quality of the word embedding vectors, so accurate sememe prediction remains a challenge. To address this problem, this study proposes a novel model to improve the performance of sememe prediction by introducing synonyms. The model scores candidate sememes from synonyms by combining distances of words in embedding vector space and derives an attention-based strategy to dynamically balance two kinds of knowledge from the synonymous word set and the word embedding vector. A series of experiments are performed, and the results show that the proposed model has made a significant improvement in the sememe prediction accuracy. The model provides a methodological reference for commonsense KB updating and embedding of commonsense knowledge.
