Incorporating Chinese Characters of Words for Lexical Sememe Prediction

Jin, Huiming; Zhu, Hao; Liu, Zhiyuan; Xie, Rong; Sun, Maosong; Lin, Feng; Lin, Leyu

doi:10.18653/v1/p18-1227

Cited by 26 publications

(34 citation statements)

References 34 publications


“…Sememe prediction is a well-defined task (Jin et al., 2018; Qi et al., 2018), aimed at selecting appropriate sememes for unannotated words or phrases from the set of all the sememes. Existing works model sememe prediction as a multi-label classification problem, where sememes are regarded as the labels of words and phrases.…”

Section: Training for MWE Sememe Prediction (mentioning)

confidence: 99%


Modeling Semantic Compositionality with Sememe Knowledge

Qi, Huang, Yang et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite


Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC while few works consider external knowledge in models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from a famous sememe knowledge base HowNet and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve significant performance boost as compared to the baseline methods without considering sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained on https://github.com/thunlp/Sememe-SC.


“…In HowNet, there are 118,346 Chinese words annotated with 2,138 sememes in total. Following previous work (Jin et al., 2018), we filter out the low-frequency sememes, which are considered unimportant. The final number of sememes we use is 1,335.…”

Section: Dataset (mentioning)

confidence: 99%
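The low-frequency filtering described in the statement above can be sketched roughly as follows. This is a minimal illustration only: the function name `filter_rare_sememes`, the `min_count` cutoff, and the toy annotations are all assumptions, since the quoted text does not state the actual frequency threshold or data format.

```python
from collections import Counter


def filter_rare_sememes(annotations, min_count):
    """Drop sememes that annotate fewer than `min_count` words.

    `annotations` maps each word to a list of sememe labels.
    `min_count` is a hypothetical cutoff; the quoted statement does not give one.
    """
    # Count how many distinct words each sememe annotates.
    counts = Counter(s for sememes in annotations.values() for s in set(sememes))
    kept = {s for s, c in counts.items() if c >= min_count}
    # Keep only the surviving sememes in each word's annotation.
    filtered = {
        word: [s for s in sememes if s in kept]
        for word, sememes in annotations.items()
    }
    return filtered, kept


# Toy data, not taken from HowNet.
toy = {
    "apple": ["fruit", "edible"],
    "pear": ["fruit", "edible"],
    "stone": ["inanimate"],
}
filtered, kept = filter_rare_sememes(toy, min_count=2)
```

With the toy data, the sememe that annotates only one word is dropped and the other two are kept.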

Modeling Semantic Compositionality with Sememe Knowledge

Qi, Huang, Yang et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite


“…Previous methods of sememe prediction for words usually compute an association score for each sememe and select the sememes with scores higher than a threshold to form the predicted sememe set (Xie et al., 2017; Jin et al., 2018). Following this formulation, we have…”

Section: SPBS Task Formalization (mentioning)

confidence: 99%
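The score-and-threshold formulation in the statement above can be illustrated with a short sketch. It assumes the association scores have already been computed by some model; the function name `predict_sememes`, the example scores, and the threshold value are hypothetical and not taken from any of the cited papers.

```python
def predict_sememes(scores, threshold):
    """Return the sememes whose association score exceeds `threshold`.

    `scores` maps each candidate sememe to a precomputed association score.
    How the scores are produced is outside the scope of this sketch.
    """
    return {sememe for sememe, score in scores.items() if score > threshold}


# Hypothetical scores for a single word.
example_scores = {"fruit": 0.91, "edible": 0.74, "tool": 0.12, "human": 0.05}
predicted = predict_sememes(example_scores, threshold=0.5)
```

Because each sememe is accepted or rejected independently, the size of the predicted set varies from word to word, which matches the multi-label view of the task.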

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

Chang, Sun et al. 2020

AAAI

Self Cite


A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It manually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained on https://github.com/thunlp/BabelNet-Sememe-Prediction.


“…apply matrix factorization to predict sememes for words. Jin et al. (2018) improve their work by incorporating character-level information. Our work extends the previous works and tries to combine word-sense-sememe hierarchy with the sequential model.…”

Section: Sememe (mentioning)

confidence: 99%

Language Modeling with Sparse Product of Sememe Experts

Yan, Zhu et al. 2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Self Cite


Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, named Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given textual context. Afterwards, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics, and offers us more powerful tools to fine-tune language models and improve the interpretability as well as the robustness of language models. Experiments on language modeling and the downstream application of headline generation demonstrate the significant effectiveness of SDLM. Source code and data used in the experiments can be accessed at https://github.com/thunlp/SDLM-pytorch.
