Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-2098
Radical Embedding: Delving Deeper to Chinese Radicals

Abstract: Languages using Chinese characters are mostly processed at word level. Inspired by recent success of deep learning, we delve deeper to character and radical levels for Chinese language processing. We propose a new deep learning technique, called "radical embedding", with justifications based on Chinese linguistics, and validate its feasibility and utility through a set of three experiments: two in-house standard experiments on short-text categorization (STC) and Chinese word segmentation (CWS), and one in-fiel…

Cited by 65 publications (59 citation statements); references 7 publications (8 reference statements).
“…In alphabetic languages, sub-word units are easy to identify, whereas in logographic languages, a similar effect can be achieved only if sub-character level information is taken into consideration. 1 Having noticed this significant difference between these two writing systems, Shi et al (2015), Liu et al (2017), Peng et al (2017), and Cao et al (2017) used stroke-level information for logographic languages when constructing word embeddings; Toyama et al (2017) used visual information for strokes and Japanese Kanji radicals in a text classification task. 2 Some studies have performed NMT tasks using various sub-word "equivalents".…”
Section: Introduction
confidence: 99%
“…After that, Chen et al (2015) argued that the semantic meaning of a word was also related to the meanings of its composing characters, and the word embeddings could be enhanced with the help of the context characters. After that, Shi et al (2015) made a tentative exploration about radicals, and demonstrated the utility of radicals in some conditions. Furthermore, some methods were proposed to use radical information to strengthen Chinese word embedding (Yin et al 2016;Yu et al 2017), but the scope of their research on radicals was still limited to the embedding problem.…”
Section: Related Work
confidence: 99%
“…Based on the linguistic features of Chinese, recent methods have used the character information to improve Chinese word embeddings. These methods can be categorized into two kinds: 1) One kind of methods learn word embeddings with its constituent character (Chen et al, 2015), radical 2 (Shi et al, 2015;Yin et al, 2016; or strokes 3 (Cao et al, 2018). However, these methods usually use simple operations, such as averaging and n-gram, to model the inherent compositionality within a word, which is not enough to handle the complicated linguistic compositionality.…”
Section: Introduction
confidence: 99%
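The composition scheme criticized in the last statement — forming a word vector by simple averaging over its sub-character units — can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the radical inventory, decomposition table, and dimensionality are all toy placeholders (a real system would use the Kangxi radical set and a character-decomposition dictionary).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy radical inventory and per-character decomposition table
# (illustrative stand-ins, not real Chinese decompositions).
radicals = ["R1", "R2", "R3", "R4"]
char_to_radicals = {"c1": ["R1", "R2"], "c2": ["R3"], "c3": ["R2", "R4"]}

dim = 8
radical_vecs = {r: rng.standard_normal(dim) for r in radicals}

def char_embedding(char):
    """Compose a character vector by averaging its radical vectors."""
    vecs = [radical_vecs[r] for r in char_to_radicals[char]]
    return np.mean(vecs, axis=0)

def word_embedding(chars):
    """Compose a word vector by averaging its character vectors."""
    return np.mean([char_embedding(c) for c in chars], axis=0)

w = word_embedding(["c1", "c3"])
print(w.shape)  # (8,)
```

The averaging here is exactly the "simple operation" the statement refers to: it captures which sub-units occur in a word but ignores their order and interaction, which is why later work replaced it with richer compositional models.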