This paper investigates the contribution of author/idiolect vs. register/type-of-text – as the most salient factors influencing the final shape of a text – towards explaining the variation observed in Czech texts. Since it is almost impossible to explore the effect of these factors on authentic data, we used elicited letters collected in a fully crossed experimental design (a representative sample of 200 authors × four elicitation scenarios serving as a proxy for register variation). The variation encompassed by the elicited texts is analyzed through the lens of a general-purpose multi-dimensional model of Czech. Using triangulation via three established statistical methods and one devised for the purpose of this study, we find that register matters a great deal, explaining 1.5 times as much variation overall as idiolect. This should be taken into account when designing research in sociolinguistics or variation studies in general.
The paper introduces a new section separated from journalistic texts in Czech corpora, namely interviews. This genre is highly specific; among the texts that can be found in newspapers and magazines, it is probably the closest to spoken language. In two case studies, we present possible applications of the interviews subcorpus in linguistic research. The first one deals with the role of paralinguistic behaviour, especially laughter, in written interviews vs. spoken dialogues. The second one investigates the specifics of the demonstrative ten in the function of a nominal attribute, again in both written and spoken data.
ORATOR v2 is a new 1.5M-word corpus of Czech monologues delivered to a live audience in semi-formal to formal settings. It was designed to chart the space of naturally occurring monologues which can be obtained for corpus processing. As such, it aims for diversity but does not attempt any balancing of subcategories, recognizing that some types of data are inherently easier to obtain in high volume than others. The transcription guidelines and annotation tools employed are the same as those used for other recent spoken corpora published by the CNC, which facilitates interesting comparisons between various types of spoken Czech. The present paper sketches out three case studies, comparing ORATOR to the informal conversations of ORTOFON v2 in terms of the frequencies of demonstratives and hesitations, as well as lexical richness.