This paper presents a methodology for exploring systematic co‐variation of vowels using Principal Component Analysis (PCA). As a case study, we examine and build on Brand et al.'s (2021) study of systematic co‐variation amongst the monophthongs of New Zealand English (NZE) across speakers born over a 118‐year time period. We present PCA as a methodology, with information aimed at readers who may themselves want to use it in a related context. We consider tests for the appropriateness of PCA, how to select Principal Components, and how to interpret them once they have been found. At each stage, we provide code in the R programming language to enable readers to both follow our analysis and apply the same methods to their own data.
The availability of large digital archives of historical newspaper content has transformed the historical sciences. However, the scale of these archives can limit the direct application of advanced text processing methods. Even when it is computationally feasible to apply sophisticated language processing to an entire digital archive, the results are unlikely to be useful if the material of interest makes up only a small fraction of that archive. Methods for generating smaller specialized corpora from large archives are required to solve this problem. This article presents such a method for historical newspaper archives digitized using the METS/ALTO XML standard (Veridian Software, n.d.). The method is an ‘iterative bootstrapping’ approach in which candidate corpora are evaluated using text mining techniques, items are manually labelled, and Naïve Bayes text classifiers are trained and applied in order to produce new candidate corpora. The method is illustrated by a case study that investigates philosophical content, broadly construed, in pre-1900 English-language New Zealand newspapers. Extensive code is provided in Supplementary Materials.
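The iterative bootstrapping loop described in this abstract can be illustrated with a minimal sketch. It is not the article's own code, which is in the Supplementary Materials and is not reproduced here. The sketch assumes Python, a hand-written multinomial Naïve Bayes classifier with add-one smoothing, a toy six-item archive, and a hypothetical `label_fn` callback that stands in for manual labelling. All function names and data are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def train_nb(labelled):
    """Fit a multinomial Naive Bayes model from (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labelled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_odds(model, text):
    """Log-odds that `text` is relevant, with add-one smoothing throughout."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] - scores[False]


def classify_all(archive, labelled):
    """Train on the current labels and return indices classified as relevant."""
    model = train_nb([(archive[i], y) for i, y in labelled.items()])
    return [i for i in range(len(archive)) if log_odds(model, archive[i]) > 0]


def bootstrap(archive, seed_labels, label_fn, rounds=3, batch=2):
    """Alternate classification and (simulated) manual labelling."""
    labelled = dict(seed_labels)  # item index -> True/False
    for _ in range(rounds):
        candidates = classify_all(archive, labelled)
        unlabelled = [i for i in candidates if i not in labelled]
        if not unlabelled:
            break  # nothing new to label, so the candidate corpus is stable
        for i in unlabelled[:batch]:
            labelled[i] = label_fn(i)  # stand-in for a human annotator
    return classify_all(archive, labelled)


# Toy archive: invented snippets, not real newspaper text.
archive = [
    "lecture on moral philosophy and ethics",
    "wool prices rise at auction",
    "debate on free will and philosophy",
    "shipping news auction of wool bales",
    "ethics of free will discussed in lecture",
    "cattle prices at the market",
]
truth = [True, False, True, False, True, False]  # simulated manual labels

corpus = bootstrap(archive, {0: True, 1: False}, lambda i: truth[i])
print(corpus)
```

In this sketch each round retrains the classifier on every label collected so far, so the candidate corpus is regenerated rather than patched. In practice the evaluation of each candidate corpus with text mining techniques, which the abstract mentions, would sit between classification and labelling; that step is omitted here.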