Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1100

Multi-Granularity Chinese Word Embedding

Abstract: This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existing word embedding methods. In this work, we propose multi-granularity embedding (MGE) for Chinese words. The key idea i…
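To make the decomposition described in the abstract concrete: a Chinese word splits into characters, and each character maps to a radical. Below is a minimal sketch; the sample word, its translation, and the character-to-radical lookup are illustrative assumptions (real systems derive the lookup from a character-radical index).

```python
# Toy illustration of the three granularities the abstract describes:
# word -> characters -> radicals. The radical lookup below is a
# hand-written assumption; production systems build it from a
# character-radical index.
radical_of = {"智": "日", "能": "月"}

word = "智能"                                   # "intelligence"
characters = list(word)                         # ["智", "能"]
radicals = [radical_of[c] for c in characters]  # ["日", "月"]
print(word, characters, radicals)
```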

Cited by 90 publications (82 citation statements)
References 13 publications

“…Multi-Granularity Representation: Multi-granularity representation, which is proposed to make full use of subunit composition at different levels of granularity, has been explored in various NLP tasks, such as paraphrase identification (Yin and Schütze, 2015), Chinese word embedding learning (Yin et al., 2016), universal sentence encoding and machine translation (Nguyen and Joty, 2018; Li et al., 2019b). The major difference between our work and Nguyen and Joty (2018); Li et al. (2019b) lies in that we successfully introduce syntactic information into our multi-granularity representation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Based on CBOW and CWE, Yin et al. (2016) proposed MGE, which predicts the target word from its radical embeddings together with the CWE-style modified word embeddings of its context, as shown in Fig. 5(b).…” (see the sketch below)
Section: Multi-Granularity Embedding (MGE) (mentioning)
confidence: 99%
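To make the excerpt above concrete, here is a minimal NumPy sketch of the CBOW-style prediction step it describes: context words are composed with their character embeddings as in CWE, and the target word's radical embeddings are averaged into the hidden vector. The vocabularies, radical assignments, and averaging weights are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Toy embedding tables; MGE trains these jointly (e.g., with negative
# sampling over a large corpus). Entries here are random placeholders.
dim = 50
rng = np.random.default_rng(0)
word_emb = {w: rng.normal(size=dim) for w in ["智能", "手机", "市场"]}
char_emb = {c: rng.normal(size=dim) for c in "智能手机市场"}
rad_emb = {r: rng.normal(size=dim) for r in ["日", "月", "手", "木"]}  # assumed radicals

def cwe_compose(word):
    """CWE-style composition: average the word vector with the mean of
    its character vectors."""
    chars = np.mean([char_emb[c] for c in word], axis=0)
    return 0.5 * (word_emb[word] + chars)

def mge_hidden(context_words, target_radicals):
    """MGE hidden vector: mean of the composed context vectors, averaged
    with the mean of the target word's radical vectors (the 0.5 weights
    are an assumption of this sketch)."""
    ctx = np.mean([cwe_compose(w) for w in context_words], axis=0)
    rad = np.mean([rad_emb[r] for r in target_radicals], axis=0)
    return 0.5 * (ctx + rad)

# Score the target "手机" given its context plus its (assumed) radicals;
# training would feed this dot product into softmax / negative sampling.
h = mge_hidden(["智能", "市场"], ["手", "木"])
score = h @ word_emb["手机"]
```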
“…The glyph features improved CWE on WordSim-240 and SimLex-999, but not WordSim-296. As for the MGE results, we were not able to reproduce the performance reported in (Yin et al., 2016). We list possible reasons below: we did not separate non-compositional words during training (character and radical embeddings are not used for these words), and we created the character-radical index from a different data source.…”
Section: Word Similarity (mentioning)
confidence: 99%
“…We use THULAC (Sun et al., 2016b) for Chinese word segmentation and POS tagging. We identify all entity names for CWE (Chen et al., 2015) and MGE (Yin et al., 2016), as they do not use character information for non-compositional words. Our model (JWE) does not use such a non-compositional word list.…” (see the THULAC usage sketch below)
Section: Experimental Settings: Training Corpus (mentioning)
confidence: 99%
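Since the excerpt relies on THULAC for preprocessing, here is a minimal usage sketch of its Python package; the sample sentence is illustrative, and the default models ship with the package.

```python
# pip install thulac -- Chinese word segmentation + POS tagging,
# as used in the experimental settings quoted above.
import thulac

thu = thulac.thulac()  # default mode: segmentation with POS tags
# cut(..., text=True) returns a space-separated "word_tag" string.
print(thu.cut("多粒度中文词向量", text=True))
```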