2020
DOI: 10.1109/taslp.2019.2955246

Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks

Abstract: Logographs (Chinese characters) have recursive structures (i.e., hierarchies of sub-units within a logograph) that contain phonological and semantic information; the developmental psychology literature suggests that native speakers leverage these structures to learn how to read. Exploiting these structures could potentially lead to better embeddings that benefit many downstream tasks. We propose building hierarchical logograph (character) embeddings from logograph recursive structures using treeLSTM, a recursive…
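To make the abstract's method concrete, the sketch below composes a character embedding bottom-up with a binary TreeLSTM over a character's decomposition tree. It is a minimal illustration in the spirit of the paper, not the authors' implementation; the class names, the binary-tree restriction, and the sub-unit IDs are all assumptions.

```python
# Hedged sketch of a binary TreeLSTM that composes a character embedding
# from its decomposition tree. `Node`, `BinaryTreeLSTM`, and the sub-unit
# IDs are illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn

class Node:
    """A node in a character decomposition tree (leaves hold sub-units)."""
    def __init__(self, subunit_id=None, left=None, right=None):
        self.subunit_id = subunit_id   # set on leaves only
        self.left, self.right = left, right

class BinaryTreeLSTM(nn.Module):
    def __init__(self, num_subunits, dim):
        super().__init__()
        self.embed = nn.Embedding(num_subunits, dim)
        # Each child projection emits all five gates at once:
        # input (i), per-child forgets (f_l, f_r), output (o), update (u).
        self.proj_l = nn.Linear(dim, 5 * dim)
        self.proj_r = nn.Linear(dim, 5 * dim)
        self.dim = dim

    def forward(self, node):
        if node.left is None:                      # leaf: sub-unit embedding
            h = self.embed(torch.tensor([node.subunit_id]))
            return h, torch.zeros(1, self.dim)
        h_l, c_l = self.forward(node.left)         # recurse into children
        h_r, c_r = self.forward(node.right)
        i, f_l, f_r, o, u = (self.proj_l(h_l) + self.proj_r(h_r)).chunk(5, dim=-1)
        c = (torch.sigmoid(i) * torch.tanh(u)
             + torch.sigmoid(f_l) * c_l + torch.sigmoid(f_r) * c_r)
        h = torch.sigmoid(o) * torch.tanh(c)       # embedding of this sub-tree
        return h, c

# The root's hidden state serves as the character embedding
# (sub-unit IDs 3 and 7 are hypothetical).
model = BinaryTreeLSTM(num_subunits=500, dim=64)
h_root, _ = model(Node(left=Node(3), right=Node(7)))
```

Phonological and semantic signals enter through the sub-unit embeddings at the leaves and are mixed by the gating at each internal node.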

Cited by 17 publications (9 citation statements)
References 49 publications (70 reference statements)

Citation statements:
“…where u_i denotes the i-th word in the target-language sequence and c_i denotes the background (context) vector of word i. Since the background vectors of the LSTM model with the embedded attention mechanism are a set of multiple vectors rather than a single fixed vector [19], each word in the target-language sequence can find a unique background vector corresponding to it.…”
Section: Models in IS
mentioning
confidence: 99%
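As a concrete reading of the excerpt above, the sketch below computes one background (context) vector c_i per target word as an attention-weighted sum of encoder states, rather than one fixed vector for the whole sequence. Dot-product scoring is an illustrative assumption, not necessarily the cited model's scoring function.

```python
# Hedged sketch of the excerpt's point: with attention, each target word i
# gets its own background (context) vector c_i, an attention-weighted sum
# of encoder states, instead of one fixed vector for the whole sequence.
import torch

def context_vectors(decoder_states, encoder_states):
    """decoder_states: (T_tgt, d); encoder_states: (T_src, d)."""
    scores = decoder_states @ encoder_states.T   # (T_tgt, T_src) alignment scores
    alpha = torch.softmax(scores, dim=-1)        # attention weights per target word
    return alpha @ encoder_states                # (T_tgt, d): one c_i per u_i

c = context_vectors(torch.randn(5, 64), torch.randn(8, 64))
assert c.shape == (5, 64)                        # a distinct c_i for each target word
```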
“…First, we use the file 1 on the structures of Han Ideographs and refer to Ke and Hagiwara 2 to obtain all the Chinese character trees. Then, we use a depth-first algorithm to convert each character tree into a sequence (Nguyen et al., 2019). Note that there are two types of tokens in the input sequence.…”
Section: Pre-training Model Architecture
mentioning
confidence: 99%
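A minimal sketch of the depth-first linearization the excerpt describes, assuming (plausibly, but not confirmed by the excerpt) that the two token types are ideographic description characters at internal nodes and sub-units at leaves:

```python
# Hedged sketch: depth-first (pre-order) conversion of a character
# decomposition tree into a token sequence. The dict-based tree and the
# IDC-vs-sub-unit token split are assumptions for illustration.
def tree_to_sequence(node):
    """Linearize a character tree depth-first, root before children."""
    seq = [node["label"]]                     # IDC (internal) or sub-unit (leaf)
    for child in node.get("children", []):
        seq.extend(tree_to_sequence(child))
    return seq

# Hypothetical tree for 村: left-right composition (⿰) of 木 and 寸.
tree = {"label": "⿰", "children": [{"label": "木"}, {"label": "寸"}]}
print(tree_to_sequence(tree))                 # ['⿰', '木', '寸']
```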
“…Such an encoding has been applied to semantic or sentiment classification [26][27][28] and to named-entity recognition [29], and has also been used in neural machine translation [32,33]. Moreover, the encoding of subword-unit-based tree structures using tree-RNNs [30,31] has also been studied to obtain better word representations.…”
Section: Tree-RNN
mentioning
confidence: 99%