Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8605

Choosing between Long and Short Word Forms in Mandarin

Abstract: Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (laohu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG in Chinese. Following on from earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to word length choice, using both a behavioural and a corpus-based approach. Thus, we hypothesise that, in Chinese, short forms are likelier in supportive than…

Citations: cited by 1 publication (2 citation statements)
References: 17 publications
“…Previous work (Guo, 1938; Duanmu, 2013; Duanmu and Dong, 2016; Huang and Duanmu, 2013) show that as high as 90% Chinese word has long and short forms, which is a key issue in Chinese lexical choice. Li et al. (2019) investigated the problem of long and short form choice through human and corpus-based approaches, whose results support the statistical significant correlation between word length and the predictability of its context. Most previous work investigate the distribution and preference of long and short form based on corpus.…”
Section: Related Work (mentioning)
confidence: 75%
“…According to semantic relation of the two morphemes of long forms, the long and short forms can be categorized into 7 groups (Li et al., 2019). The X-XX category refers to reduplicated long and short forms such as 妈妈-妈 (mama-ma, mother) or 仅仅-仅 (jinjin-jin, only).…”
Section: Post-hoc Analysis (mentioning)
confidence: 99%