2019
DOI: 10.1515/cllt-2016-0062

A study on Chinese register characteristics based on regression analysis and text clustering

Abstract: This paper reports an innovative Chinese register study based on regression analysis for sentence length distribution and text clustering. Although end of sentence is not conventionally marked in Chinese, we resolve this issue by assuming that segments between periods, question marks, and exclamation marks are sentences, which can be further divided into simple sentences and compound sentences. We also assume that segments between punctuation marks that express pauses in utterances form sentences (i.e., clause…

Cited by 9 publications (9 citation statements)
References 13 publications

“…This result suggests that robust identification technology of similar and related languages must also take into consideration other dimensions of textual variations such as gender, genre, or register. In fact, in a series of study we did take register into consideration (Hou et al 2017, Hou, Huang and Liu 2019;. In addition to show that fitted power function model between linguistic units and their constituents is an effective tool for classification of similar languages, we also showed that single feature works better with binary classification.…”
Section: Results (mentioning)
confidence: 84%
“…Altmann (2005, 2007) demonstrated an effective way to model discrete phenomenon using continuous models and vice versa. Hou, Huang, and Liu (2019) showed that the sentence/clause length in Chinese texts can also be fitted by Formula (1) in variations of the Chinese language based on data from Mainland China.…”
Section: Language As Complex Self-adaptive System (mentioning)
confidence: 99%
“…According to the approach of many Chinese treebanks (e.g., Chen et al 1996 for Sinica TreeBank, Huang and Chen 2017) and the analysis of sentence length distribution in quantitative linguistics (Hou, Huang, and Liu 2017), all segments between commas, semicolons, colons, periods, exclamation marks, and question marks that express pauses in utterances are marked as sentences. Actually, the sentences that are identified by this definition are clauses (Hou et al 2017) and conform to the definitions that rely on pauses and intonation changes in the utterances.…”
Section: Results (mentioning)
confidence: 99%
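
The punctuation-based segmentation described in the abstract and in the last citation statement can be sketched as follows. This is only an illustration, not the authors' implementation: the function names `split_segments` and `segment_lengths`, the exact punctuation sets, and the choice to measure length in characters are all assumptions made here (the study may count words or other units).

```python
import re

# Assumed delimiter sets (hypothetical; not taken from the paper).
# Sentence boundaries: periods, question marks, exclamation marks.
SENTENCE_MARKS = "。？！?!"
# Clause boundaries: the sentence marks plus commas, semicolons, colons.
CLAUSE_MARKS = SENTENCE_MARKS + "，；：,;:"


def split_segments(text, marks):
    """Return the non-empty segments of text lying between any of the marks."""
    pattern = "[" + re.escape(marks) + "]+"
    return [seg.strip() for seg in re.split(pattern, text) if seg.strip()]


def segment_lengths(text, marks):
    """Length of each segment, counted in characters (an assumption)."""
    return [len(seg) for seg in split_segments(text, marks)]


if __name__ == "__main__":
    sample = "今天天气很好，我们去公园吧。你来吗？"
    print(split_segments(sample, SENTENCE_MARKS))
    print(split_segments(sample, CLAUSE_MARKS))
    print(segment_lengths(sample, CLAUSE_MARKS))
```

Under the sentence-level marks the sample yields two segments; under the clause-level marks it yields three. The resulting length lists are the kind of input a sentence-length or clause-length distribution would be fitted to.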