Most text analysis and retrieval work to date has focused on the topic of a text, that is, what it is about. However, a text also carries much useful information in its style, or how it is written. This includes information about its author, its purpose, the feelings it is meant to evoke, and more. This article develops a new type of lexical feature for stylistic text classification, based on taxonomies of the semantic functions of selected words and phrases. We demonstrate the usefulness of such features for several stylistic text classification tasks: determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/negative evaluation), and the rhetorical character of scientific journal articles. We further show how functional features help reveal stylistic differences among different kinds of texts.
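To make the idea of a functional lexical feature concrete, here is a minimal Python sketch. It assumes a tiny hypothetical taxonomy that maps a few single-word conjunctions to the three broad Systemic Functional Linguistics conjunction types (elaboration, extension, enhancement). The word lists, the `CONJUNCTION_TAXONOMY` table, and the `functional_features` function are illustrative inventions, not the taxonomies used in the article. Each category's count is divided by the document's token count so that texts of different lengths are comparable.

```python
import re
from collections import Counter

# Hypothetical, heavily abridged conjunction taxonomy (word -> functional category).
# Real taxonomies are far larger and also include multi-word phrases.
CONJUNCTION_TAXONOMY = {
    "namely": "elaboration",
    "specifically": "elaboration",
    "and": "extension",
    "but": "extension",
    "or": "extension",
    "then": "enhancement",
    "because": "enhancement",
    "so": "enhancement",
}


def functional_features(text: str) -> dict[str, float]:
    """Relative frequency (per token) of each taxonomy category in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count category occurrences rather than individual words.
    counts = Counter(
        CONJUNCTION_TAXONOMY[t] for t in tokens if t in CONJUNCTION_TAXONOMY
    )
    total = len(tokens) or 1  # avoid division by zero on empty input
    categories = sorted(set(CONJUNCTION_TAXONOMY.values()))
    return {c: counts[c] / total for c in categories}


if __name__ == "__main__":
    print(functional_features("I stayed because it rained, and then I read."))
```

Because features are grouped by function rather than by individual word, two texts that use different words for the same function (for example "because" and "so") contribute to the same feature.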
A key focus of current science education reforms is the development of inquiry-based learning materials. However, such materials cannot be properly developed without an understanding of how working scientists actually do science. Until now, research on scientific reasoning has focused on cognitive studies of individual scientific fields. The question remains whether scientists in different fields rely on fundamentally different methodologies. Although many philosophers and historians of science assert that there is no single monolithic scientific method, this claim has never been tested empirically. We therefore approach the problem by analyzing patterns of language used by scientists in their published work. Our results demonstrate systematic variation in language use between types of science that are thought to differ in their characteristic methodologies. The features of language use we found correspond closely to a proposed distinction between Experimental Sciences (e.g., chemistry) and Historical Sciences (e.g., paleontology); thus, different underlying rhetorical and conceptual mechanisms likely operate in scientific reasoning and communication in different contexts.
Recently, philosophers of science have argued that the epistemological requirements of different scientific fields necessarily lead to differences in scientific method. In this paper, we examine possible variation in how language is used in peer-reviewed journal articles from various fields, to see whether features of such variation may help elucidate and support claims of methodological variation among the sciences. We hypothesize that significant methodological differences will be reflected in related differences in scientists' language style. This paper reports a corpus-based study of peer-reviewed articles from twelve separate journals in six fields of experimental and historical sciences. Machine learning methods were applied to compare the discourse styles of articles in different fields, based on easily extracted linguistic features of the text. Features included function word frequencies, as often used in computational stylistics, as well as lexical features based on systemic functional linguistics, which affords rich resources for comparative textual analysis. We found that the style of writing in the historical sciences is indeed readily distinguishable from that of the experimental sciences. Furthermore, the most significant linguistic features of these distinctive styles are directly related to the methodological differences that philosophers of science posit between historical and experimental sciences, lending empirical weight to their contentions.
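The classification setup can be sketched as follows. This is a minimal illustration assuming a short hypothetical list of function words and, in place of whichever machine learning method the study actually used (the abstract does not name it), a simple nearest-centroid classifier. Each article becomes a vector of relative function-word frequencies, and a new article receives the field label whose average training vector is closest in Euclidean distance. `FUNCTION_WORDS`, `fw_vector`, `train_centroids`, and `classify` are hypothetical names.

```python
import math
import re
from collections import Counter

# Hypothetical small function-word list; stylometric studies typically use hundreds.
FUNCTION_WORDS = ["the", "of", "and", "a", "in", "to", "is", "was", "that", "may", "we", "this"]


def fw_vector(text):
    """Relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / n for w in FUNCTION_WORDS]


def train_centroids(docs):
    """docs: list of (text, label) pairs. Returns label -> mean feature vector."""
    sums, sizes = {}, Counter()
    for text, label in docs:
        acc = sums.setdefault(label, [0.0] * len(FUNCTION_WORDS))
        for i, v in enumerate(fw_vector(text)):
            acc[i] += v
        sizes[label] += 1
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}


def classify(text, centroids):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    vec = fw_vector(text)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


if __name__ == "__main__":
    # Toy, invented training sentences; not data from the study.
    docs = [
        ("we measured the reaction and the rate was fast", "experimental"),
        ("this may indicate that the fossil was deposited in a flood", "historical"),
    ]
    centroids = train_centroids(docs)
    print(classify("this may suggest that the layer was eroded", centroids))
```

A real study would use many more features and articles, a stronger learner, and held-out evaluation such as cross-validation; nearest-centroid is chosen here only because it fits in a few lines of standard-library Python.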
This paper presents a method for the stylistic segmentation of text documents. Our technique maps how the value of a stylistic feature changes over the course of a text. We use the linguistic features of conjunction and modality, as defined by taxonomies from Systemic Functional Linguistics. This segmentation has applications in automated summarization, particularly of large documents.
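One way to realize this kind of feature mapping is sketched below. It assumes a hypothetical list of modal markers as the tracked feature, fixed non-overlapping windows of tokens, and a simple jump threshold for proposing segment boundaries. The `MODALS` set, the window `size`, and the `threshold` are illustrative parameters, not values from the paper, which may track features and choose boundaries differently.

```python
import re

# Hypothetical, abridged list of modal markers (SFL modality is much richer).
MODALS = {"may", "might", "could", "must", "should", "can", "will", "would", "perhaps", "possibly"}


def feature_profile(tokens, size):
    """Density of modal markers in consecutive non-overlapping windows of `size` tokens."""
    profile = []
    for start in range(0, len(tokens), size):
        window = tokens[start:start + size]  # last window may be shorter
        profile.append(sum(t in MODALS for t in window) / len(window))
    return profile


def segment(text, size=20, threshold=0.1):
    """Return token offsets where modal density jumps by at least `threshold`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    profile = feature_profile(tokens, size)
    return [
        i * size
        for i in range(1, len(profile))
        if abs(profile[i] - profile[i - 1]) >= threshold
    ]


if __name__ == "__main__":
    text = "a b c d a b c d may might could must may might could must"
    print(segment(text, size=4, threshold=0.5))
```

Larger windows smooth out noise but blur the exact boundary position; a real system would likely combine several features (for example conjunction and modality together) and smooth the profile before thresholding.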