2019
DOI: 10.1109/access.2019.2908014

Learning Chinese Word Embeddings With Words and Subcharacter N-Grams

Abstract: Co-occurrence information between words is the basis of training word embeddings. In addition, Chinese characters are composed of subcharacters, and words made up of the same characters or subcharacters usually have similar semantics, but this internal substructure information is usually neglected in popular models. In this paper, we propose a novel method for learning Chinese word embeddings, which makes full use of external co-occurrence context information and internal substructure information. We represent each wo…
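The abstract describes combining word-level co-occurrence with character- and subcharacter-level substructure. The sketch below shows one plausible way such features could be collected for a word before embedding lookup; the SUBCHARS table, the word_features helper, and the n-gram range are illustrative assumptions, not the paper's actual decomposition or hyperparameters.

```python
# Minimal sketch: collecting word, character, subcharacter, and
# subcharacter n-gram features for a Chinese word.
# SUBCHARS and the n-gram range are illustrative assumptions,
# not the decomposition or settings used in the paper.

SUBCHARS = {
    "海": ["氵", "每"],  # "sea": water radical + phonetic component
    "洋": ["氵", "羊"],  # "ocean": water radical + phonetic component
}

def ngrams(seq, n_min=2, n_max=3):
    """Contiguous n-grams of length n_min..n_max over a sequence."""
    return ["".join(seq[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]

def word_features(word):
    """The word itself, its characters, their subcharacters,
    and n-grams over the subcharacter sequence."""
    chars = list(word)
    subchars = [s for c in chars for s in SUBCHARS.get(c, [c])]
    return [word] + chars + subchars + ngrams(subchars)

print(word_features("海洋"))
# ['海洋', '海', '洋', '氵', '每', '氵', '羊', '氵每', '每氵', '氵羊', '氵每氵', '每氵羊']
```

In a fastText-style training loop, the vectors of all such features would be summed to form the word representation that enters the skip-gram objective.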

Cited by 10 publications (6 citation statements)
References 12 publications
“…A stroke and adjacent-stroke vector representation was constructed as a subcharacter embedding and combined with the character embedding to continuously enhance the word embedding [42]. Cj2vec replaced the strokes with Cangjie codes and trained the model in the same way as cw2vec [26]. Since Chinese characters carry both semantic and phonetic information, multiple character embedding models, including Pinyin (Fig.…”
Section: Semantic and Morphological Methods
confidence: 99%
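For context, the stroke-based features this statement refers to can be sketched as below. The STROKES table is a toy assumption; cw2vec itself uses a full stroke dictionary with the five stroke classes mapped to the digits 1–5 and n-gram windows of roughly 3 to 12.

```python
# Illustrative cw2vec-style stroke n-grams. STROKES is a toy assumption;
# the real model uses a complete stroke dictionary (stroke classes mapped
# to digits 1-5) and wider n-gram windows.

STROKES = {
    "大": "134",  # horizontal, downward-left, downward-right
    "人": "34",   # downward-left, downward-right
}

def stroke_ngrams(word, n_min=3, n_max=5):
    """Slide windows of length n_min..n_max over the word's stroke string."""
    s = "".join(STROKES.get(c, "") for c in word)
    return [s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)]

print(stroke_ngrams("大人"))
# ['134', '343', '434', '1343', '3434', '13434']
```

Cj2vec's variant would simply swap the stroke string for each character's Cangjie code string, leaving the n-gram machinery unchanged.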
“…Meanwhile, Chinese is a logographic language that retains morphological information and intuitive semantic elements; we can even guess the meaning of a Chinese character from its glyph. This has motivated many excellent works that jointly utilize semantic and morphological subwords to improve word embeddings [17], [20], [25], [26]. Some of the abovementioned methods capture morphological information through stroke n-gram bags or components, but we argue that these methods obtain few morphological features.…”
Section: Introduction
confidence: 97%
“…Due to its success in modelling English documents, word embedding has been applied to Chinese text. Benefiting from the internal structural information of Chinese characters, many studies have tried to enhance the quality of Chinese word embeddings with radicals [30][31][32], subword components [33,34], glyph features [35], strokes [36], and pronunciation [37]. To limit the scope of this paper, we chose Skip-gram because, after comparing the word embedding models trained on the two corpora used in this experiment, we found Skip-gram to have the best performance on average.…”
Section: The Model Architectures for Word Embedding
confidence: 99%
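The Skip-gram baseline this statement settles on is straightforward to reproduce with gensim. The corpus, dimensionality, and window below are illustrative placeholders; the cited study's exact corpora and hyperparameters are not given here.

```python
# Minimal skip-gram baseline with gensim (illustrative setup only).
from gensim.models import Word2Vec

# Toy pre-tokenized corpus; real experiments would use segmented Chinese text.
sentences = [
    ["深度", "学习", "词", "向量"],
    ["中文", "词", "向量", "训练"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    min_count=1,      # keep rare toy tokens
    sg=1,             # 1 = skip-gram (0 would be CBOW)
)

print(model.wv["词"].shape)  # (100,)
```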
“…First, the large-scale general-domain corpus documents undergo data preprocessing, and a general-domain StructBert pre-trained model is trained. The StructBert model is then fine-tuned for the power dispatching domain on a preprocessed power dispatching corpus [4].…”
Section: Construction of Power Dispatch Knowledge Base System (1) Ber...
confidence: 99%
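The quoted pipeline (general-domain pre-training followed by domain fine-tuning) can be sketched with Hugging Face Transformers. The checkpoint path, the toy corpus, and the masked-language-modelling objective are all assumptions for illustration; the citing paper's exact fine-tuning setup is not specified here.

```python
# Hedged sketch of domain-adaptive fine-tuning with Hugging Face Transformers.
# "path/to/structbert-base" is a placeholder checkpoint path, and masked
# language modelling stands in for the paper's (unspecified) objective.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("path/to/structbert-base")
model = AutoModelForMaskedLM.from_pretrained("path/to/structbert-base")

# Toy in-domain samples; a real run would stream the power dispatching corpus.
texts = ["变电站例行巡检记录", "线路负荷调度指令"]
dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="structbert-dispatch",
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```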