D. S. Korshunov scite author profile

D. S. Korshunov

3 Publications

0 Citation Statements Received

0 Citation Statements Given

Affiliations

Military University

Publications

Frequency of co-occurrence of Chinese characters as an indicator of lexicality (when selecting the vocabulary of Chinese military discourse)

Korshunov

2020

Philology at MGIMO

Teaching a foreign language in a non-linguistic college or university should be professionally oriented, which raises the question of how to select the relevant vocabulary of the professional discourse under study. Modern text corpora are too general in subject matter and time span, so a specially compiled collection of texts can serve the purpose of selecting the vocabulary. In the case of Chinese, the task is complicated by the lack of word segmentation in such texts. Given that most words in Chinese are written with two characters, it is assumed that one method applicable in this situation is a comprehensive frequency analysis of two-character text sequences, that is, character bigrams. The analysis of frequent bigrams showed that 70% of the most frequent bigrams are lexical units representative of the discourse, including 11% that are out-of-vocabulary. The remaining bigrams pertain to syntactic constructions, including structurally incomplete ones, and to fragments of longer lexical units. Thus, a high frequency of character co-occurrence can, with a rather high probability (p > 0.7), be considered an indicator of lexicality when identifying representative vocabulary in an unsegmented thematic collection of Chinese texts.
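The frequency analysis of character bigrams described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's procedure: the function names `char_bigram_counts` and `top_bigrams`, the restriction to the basic CJK Unified Ideographs range, and the two toy sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: only characters in the basic CJK Unified Ideographs block count;
# punctuation, digits and Latin letters act as boundaries between runs.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_bigram_counts(texts):
    """Count overlapping two-character sequences in unsegmented Chinese text."""
    counts = Counter()
    for text in texts:
        for run in HAN_RUN.findall(text):
            # Slide a window of width 2 over each uninterrupted run of characters.
            counts.update(run[i:i + 2] for i in range(len(run) - 1))
    return counts


def top_bigrams(texts, n=10):
    """Return the n most frequent character bigrams with their counts."""
    return char_bigram_counts(texts).most_common(n)


# Toy input invented for illustration (not taken from the study's collection).
sample = ["海军舰艇编队进行演习。", "空军和海军举行联合演习。"]
counts = char_bigram_counts(sample)
```

In this toy input the bigrams 海军 and 演习 each occur twice, while 军舰 is counted once even though it straddles the boundary between 海军 and 舰艇. Such boundary-crossing sequences are one kind of non-lexical bigram that a purely frequency-based selection has to filter out.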

Distinctive Features of Association Measures Applied to Chinese Character Bigram Extraction Tasks

Korshunov

2022

When studying professional discourse, a researcher now has the opportunity to create collections of texts and apply linguistic analysis software tools to them. However, for Chinese discourse the automatic word segmentation of texts is not fully reliable. One way to extract lexical units from Chinese texts is to apply statistical association measures for collocations to Chinese character bigrams. The purpose of this work is to conduct a comparative analysis of seven statistical measures for collocations as a means of extracting two-syllable lexical units (binomes) from unsegmented Chinese character text. The subject of the analysis is the lexical, grammatical and frequency characteristics of the bigrams that receive the highest values of each statistical measure. Comparing them makes it possible to draw conclusions about the features of the measures and, in particular, about which measures best suit which linguistic tasks. The linguistic material of the study was a collection of 560 military-related news texts in Chinese, totalling more than 720 thousand characters. The results show that the measures considered can be divided into three groups according to the characteristics of the bigrams that receive the highest values. The first group includes the measures MI, MS and logDice, which give priority to rare bigrams whose components have limited combinability, such as the Chinese two-syllable single-morpheme words known as “lianmianzi”. These measures do not extract terms well, but can be used to search for phraseologically related components. The measures of the second group, t-score and log-likelihood, are frequency-oriented and similar to frequency analysis, but they cope with non-lexical bigrams better; log-likelihood also somewhat lowers the rank of numerals and pronouns and is best at picking out the typical vocabulary of professional discourse.
The third group includes the measures MI3 and MI.log-f, which balance the opposite tendencies of the first two groups. The MI3 measure is considered the most universal one; it could be used to compare different corpora or collections of texts. It is concluded that applying statistical association measures to Chinese character bigrams is possible and appropriate, provided that the specifics of each measure are matched to the research task.
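The seven measures named in this abstract could be computed for a single character bigram along the following lines. The abstract does not give the formulas, so this sketch assumes the definitions commonly used in corpus tools (MS is read as minimum sensitivity, log-likelihood as Dunning's statistic over a 2x2 contingency table, and MI.log-f as MI multiplied by the natural log of the bigram frequency plus one). The function names and the input counts are hypothetical.

```python
import math


def log_likelihood(f_xy, f_x, f_y, n):
    """Log-likelihood ratio over the 2x2 table (assumed: Dunning's G2)."""
    observed = [f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy]
    rows = [f_x, f_x, n - f_x, n - f_x]
    cols = [f_y, n - f_y, f_y, n - f_y]
    g2 = 0.0
    for o, r, c in zip(observed, rows, cols):
        if o > 0:  # empty cells contribute nothing to the sum
            g2 += o * math.log(o * n / (r * c))
    return 2 * g2


def association_scores(f_xy, f_x, f_y, n):
    """Score one bigram xy.

    f_xy: bigram frequency; f_x: frequency of x in first position;
    f_y: frequency of y in second position; n: total number of bigrams.
    """
    expected = f_x * f_y / n  # frequency expected if x and y were independent
    mi = math.log2(f_xy / expected)
    return {
        "MI": mi,
        "MS": min(f_xy / f_x, f_xy / f_y),
        "logDice": 14 + math.log2(2 * f_xy / (f_x + f_y)),
        "t-score": (f_xy - expected) / math.sqrt(f_xy),
        "log-likelihood": log_likelihood(f_xy, f_x, f_y, n),
        "MI3": math.log2(f_xy ** 3 / expected),
        "MI.log-f": mi * math.log(f_xy + 1),
    }


# Hypothetical counts: a fairly frequent bigram and a rare, tightly bound one.
frequent = association_scores(f_xy=8, f_x=16, f_y=32, n=1024)
rare = association_scores(f_xy=2, f_x=2, f_y=2, n=1024)
```

With these made-up counts the rare bigram receives the higher MI while the frequent one receives the higher t-score, which is consistent with the grouping described in the abstract: MI-type measures favour rare pairs with restricted combinability, and t-score behaves more like a frequency ranking.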

Frequency of Suffixal Morphemes as a Characteristic of Chinese News Military Discourse

Korshunov

2022

Вестник МГЛУ. Гуманитарные науки
