2019
DOI: 10.1109/access.2019.2908014

Learning Chinese Word Embeddings With Words and Subcharacter N-Grams

Abstract: Co-occurrence information between words is the basis of training word embeddings. In addition, Chinese characters are composed of subcharacters, and words made up of the same characters or subcharacters usually have similar semantics, but this internal substructure information is usually neglected in popular models. In this paper, we propose a novel method for learning Chinese word embeddings, which makes full use of external co-occurrence context information and internal substructure information. We represent each wo…
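The abstract describes combining word-level co-occurrence with character- and subcharacter-level substructure. The sketch below shows one plausible way such features could be collected for a word before embedding lookup; the SUBCHARS table, the word_features helper, and the n-gram range are illustrative assumptions, not the paper's actual decomposition or hyperparameters.

```python
# Minimal sketch: collecting word, character, subcharacter, and
# subcharacter n-gram features for a Chinese word.
# SUBCHARS and the n-gram range are illustrative assumptions,
# not the decomposition or settings used in the paper.

SUBCHARS = {
    "海": ["氵", "每"],  # "sea": water radical + phonetic component
    "洋": ["氵", "羊"],  # "ocean": water radical + phonetic component
}

def ngrams(seq, n_min=2, n_max=3):
    """Contiguous n-grams of length n_min..n_max over a sequence."""
    return ["".join(seq[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]

def word_features(word):
    """The word itself, its characters, their subcharacters,
    and n-grams over the subcharacter sequence."""
    chars = list(word)
    subchars = [s for c in chars for s in SUBCHARS.get(c, [c])]
    return [word] + chars + subchars + ngrams(subchars)

print(word_features("海洋"))
# ['海洋', '海', '洋', '氵', '每', '氵', '羊', '氵每', '每氵', '氵羊', '氵每氵', '每氵羊']
```

In a fastText-style training loop, the vectors of all such features would be summed to form the word representation that enters the skip-gram objective.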

Cited by 10 publications (6 citation statements)
References 12 publications
“…A stroke and adjacent-stroke vector representation was constructed as a subcharacter embedding and combined with the character embedding to continuously enhance the word embedding [42]. Cj2vec replaced the strokes with Cangjie codes and trained the model in the same way as cw2vec [26]. Since Chinese characters carry both semantic and phonetic information, multiple character embedding models, including Pinyin (Fig.…”
Section: Semantic and Morphological Methods
confidence: 99%
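For context, the stroke-based features this statement refers to can be sketched as below. The STROKES table is a toy assumption; cw2vec itself uses a full stroke dictionary with the five stroke classes mapped to the digits 1–5 and n-gram windows of roughly 3 to 12.

```python
# Illustrative cw2vec-style stroke n-grams. STROKES is a toy assumption;
# the real model uses a complete stroke dictionary (stroke classes mapped
# to digits 1-5) and wider n-gram windows.

STROKES = {
    "大": "134",  # horizontal, downward-left, downward-right
    "人": "34",   # downward-left, downward-right
}

def stroke_ngrams(word, n_min=3, n_max=5):
    """Slide windows of length n_min..n_max over the word's stroke string."""
    s = "".join(STROKES.get(c, "") for c in word)
    return [s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)]

print(stroke_ngrams("大人"))
# ['134', '343', '434', '1343', '3434', '13434']
```

Cj2vec's variant would simply swap the stroke string for each character's Cangjie code string, leaving the n-gram machinery unchanged.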
“…Meanwhile, Chinese is a logographic language that retains morphological information and intuitive semantic elements; we can even guess the meaning of a Chinese character from its glyph. This has motivated many excellent works that jointly utilize semantic and morphological subwords to improve word embeddings [17], [20], [25], [26]. Some of the abovementioned methods capture morphological information through stroke n-gram bags or components, but we argue that these methods obtain few morphological features.…”
Section: Introduction
confidence: 97%
“…Due to its success in modelling English documents, word embedding has been applied to Chinese text. Benefiting from the internal structural information of Chinese characters, many studies have tried to enhance the quality of Chinese word embeddings with radicals [30][31][32], subword components [33,34], glyph features [35], strokes [36], and pronunciation [37]. To limit the scope of this paper, we chose Skip-gram because, after comparing the word embedding models trained on the two corpora used in this experiment, we found Skip-gram to have the best performance on average.…”
Section: The Model Architectures for Word Embedding
confidence: 99%
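The Skip-gram baseline this statement settles on is straightforward to reproduce with gensim. The corpus, dimensionality, and window below are illustrative placeholders; the cited study's exact corpora and hyperparameters are not given here.

```python
# Minimal skip-gram baseline with gensim (illustrative setup only).
from gensim.models import Word2Vec

# Toy pre-tokenized corpus; real experiments would use segmented Chinese text.
sentences = [
    ["深度", "学习", "词", "向量"],
    ["中文", "词", "向量", "训练"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    min_count=1,      # keep rare toy tokens
    sg=1,             # 1 = skip-gram (0 would be CBOW)
)

print(model.wv["词"].shape)  # (100,)
```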
“…First, the large-scale general-domain corpus documents undergo data preprocessing, and a general-domain StructBert pre-trained model is trained. The StructBert model is then fine-tuned for the power dispatching domain on a preprocessed power dispatching corpus [4].…”
Section: Construction of Power Dispatch Knowledge Base System (1) Ber...
confidence: 99%
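The quoted pipeline (general-domain pre-training followed by domain fine-tuning) can be sketched with Hugging Face Transformers. The checkpoint path, the toy corpus, and the masked-language-modelling objective are all assumptions for illustration; the citing paper's exact fine-tuning setup is not specified here.

```python
# Hedged sketch of domain-adaptive fine-tuning with Hugging Face Transformers.
# "path/to/structbert-base" is a placeholder checkpoint path, and masked
# language modelling stands in for the paper's (unspecified) objective.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("path/to/structbert-base")
model = AutoModelForMaskedLM.from_pretrained("path/to/structbert-base")

# Toy in-domain samples; a real run would stream the power dispatching corpus.
texts = ["变电站例行巡检记录", "线路负荷调度指令"]
dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="structbert-dispatch",
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```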