Abstract: The paper presents a corpus-driven method for detecting recent grammatical change in contemporary Czech newspapers. It is based on a large, homogeneous body of material (825 million tokens from a single newspaper) covering a 23-year time span. The task is operationalised as finding the most relevant frequency changes manifested by selected subsets of the Czech tagset. The results show changing proportions of parts of speech, nominal cases, etc., which indicate a shift towards more "verbal" language associated with the increasing informality of the newspaper register.
Keywords: modern diachrony, language change, Czech, newspaper register, corpus composition
INTRODUCTION
The paper aims to investigate recent grammatical change that can be observed in contemporary Czech newspapers. It presents an automatic, corpus-driven method for detecting the morphological features that show the most considerable diachronic shift. Finally, the results and the limitations of such an approach are discussed. The method has the following characteristics:
• it is based on large and homogeneous data;
• morphological categories (rather than the often-studied individual word forms) are investigated systematically and in a corpus-driven manner; this has been operationalised as finding the most relevant frequency changes manifested by selected subsets of the currently used Czech tagset;
• the frequency differences are evaluated using the Mann-Kendall test and the Theil-Sen estimator.
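The detection procedure outlined above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the restriction to a single tag position, the tie-corrected normal approximation used for the Mann-Kendall test, and the ranking criterion (significant trends ordered by absolute Theil-Sen slope) are all hypothetical. Position 0 of a Czech positional tag is assumed to encode part of speech and position 4 nominal case. The toy counts are invented for illustration and are not corpus data.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def tag_value_proportions(tags_by_year, position, ignore="-"):
    """Per-year relative frequencies of the values found at one tag position.

    Hypothetical helper. tags_by_year maps year -> list of positional tags;
    values equal to `ignore` ("not applicable") are left out of the subset.
    Returns (sorted years, {value: [proportion for each year]}).
    """
    years = sorted(tags_by_year)
    counts = {
        y: Counter(t[position] for t in tags_by_year[y] if t[position] != ignore)
        for y in years
    }
    values = sorted(set().union(*counts.values()))
    series = {
        v: [counts[y][v] / sum(counts[y].values()) for y in years] for v in values
    }
    return years, series


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p)."""
    n = len(series)
    # S = sum over all pairs i < j of sign(x_j - x_i)
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(series, 2))
    # Variance of S, corrected for groups of tied values
    ties = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the normal approximation
    return s, z, p


def theil_sen_slope(xs, ys):
    """Theil-Sen estimator: the median of all pairwise slopes."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


def rank_changes(years, series, alpha=0.05):
    """Keep significant trends, largest absolute slope first (assumed criterion)."""
    results = []
    for value, props in series.items():
        _, _, p = mann_kendall(props)
        results.append((value, theil_sen_slope(years, props), p))
    significant = [r for r in results if r[2] < alpha]
    return sorted(significant, key=lambda r: abs(r[1]), reverse=True)


# Invented toy data: verbs gain share, nouns lose it, adjectives stay constant.
VERB, NOUN, ADJ = "VB-S---3P-AA---", "NNFS1-----A----", "AAFS1----1A----"
toy = {
    2000 + k: [VERB] * (100 + 10 * k) + [NOUN] * (300 - 10 * k) + [ADJ] * 100
    for k in range(6)
}
years, series = tag_value_proportions(toy, position=0)  # position 0: part of speech
for value, slope, p in rank_changes(years, series):
    print(f"{value}: Theil-Sen slope {slope:+.4f} per year, Mann-Kendall p = {p:.4f}")
```

The Mann-Kendall test only tells whether a monotonic trend is present, while the Theil-Sen slope quantifies its size robustly, which is why the two are combined here. Ranking by p-value instead, or correcting for multiple comparisons across tag values, would be equally plausible choices.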
DATA
It is often emphasised that research aiming to discover recent language change should be based on large and homogeneous data covering many data points [9], [14]. This requirement determined the selection of SYN v4 as the base corpus [11]. With its 4.3