This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior art on quantitative register analysis of Czech led to several distinctive methodological decisions, namely in corpus design, feature selection and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions for their potential relevance to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.
This study uses corpus-linguistic methods to examine the relationship between language usage patterns and divergence in text interpretation. Our target of analysis is a set of texts (Czechoslovak presidential New Year’s addresses from 1975 to 1989), which contemporary readers consider repetitious and devoid of content. These texts were statistically contrasted with corpora from two different periods: one from the totalitarian period and the other from the contemporary (post-totalitarian) period. The comparison was based on the Difference Index, the most recent effect-size estimator, which was used to enhance the interpretation of keyword analysis outcomes. The two analyses yield significantly different results: the data from the analysis using the contemporary corpus were commensurate with contemporary readers’ impressions; those from the analysis using the totalitarian corpus fluctuated in tandem with (and sometimes in anticipation of) political and social changes during the 15-year period and suggested an interpretation of the texts by a reader more familiar with totalitarian texts.
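The abstract does not spell out how the Difference Index is computed. A minimal sketch follows, assuming the commonly cited definition of the index as the normalised difference between a word's relative frequencies in the target and reference corpora, scaled to the range from -100 to 100. The function name, parameter names and the frequencies in the usage note are illustrative assumptions, not data from the study.

```python
def difference_index(freq_target, size_target, freq_ref, size_ref):
    """Difference Index for one word (assumed definition, see lead-in).

    freq_*  : absolute frequency of the word in each corpus
    size_*  : total number of tokens in each corpus
    Returns a value between -100 (reference only) and 100 (target only).
    """
    rel_target = freq_target / size_target
    rel_ref = freq_ref / size_ref
    total = rel_target + rel_ref
    if total == 0:
        # The word occurs in neither corpus, so the index is undefined.
        raise ValueError("word is absent from both corpora")
    return 100 * (rel_target - rel_ref) / total
```

With made-up counts, `difference_index(30, 1000, 10, 1000)` gives a value of about 50, meaning the word is relatively more prominent in the target texts; a word found only in the target corpus scores 100. Because the reference corpus enters the formula directly, the same target text can receive very different scores against a totalitarian-period and a contemporary reference corpus.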
This paper describes how corpus-assisted discourse analysis based on keyword identification and interpretation can benefit from employing Market Basket Analysis (MBA) after keyword extraction. MBA is a data mining technique, originally used in marketing, that can reveal consistent associations between items in a shopping cart, but also between keywords in a corpus of many texts. By identifying recurring associations between keywords, we can compensate for the lack of wider context, which is a major issue impeding the interpretation of isolated keywords (especially when analysing large data sets). To showcase the advantages of MBA in ‘re-contextualising’ keywords within the discourse, we conducted a pilot study on the topic of migration, contrasting anti-system and centre-right Czech Internet media. The results show that MBA is useful in identifying the dominant strategy of anti-system news portals: to weave in a confounding ideological undercurrent and connect the concept of migrants to a multitude of other topics (i.e., flooding the discourse).
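The core idea can be illustrated with a minimal sketch that treats each text as a "basket" of its keywords and scores keyword pairs with the standard association-rule measures of support, confidence and lift. This is not the authors' implementation: the function name, the support threshold and the toy keyword sets are assumptions chosen for illustration only, and a real study would typically rely on an established frequent-itemset algorithm rather than plain pair counting.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return rules (antecedent, consequent, support, confidence, lift)
    for all keyword pairs whose support reaches min_support."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # one vote per keyword per text
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of texts containing both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]      # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # above 1: positive association
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical keyword sets, one per text (not data from the study).
texts = [
    {"migrant", "border"},
    {"migrant", "border", "crisis"},
    {"migrant", "crisis"},
    {"economy"},
]
for rule in pair_rules(texts):
    print(rule)
```

A lift above 1 indicates that two keywords co-occur in the same texts more often than their individual frequencies would predict, which is the kind of recurring association that helps place an isolated keyword back into its discourse context.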