Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanit 2017
DOI: 10.18653/v1/w17-2209

Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns

Abstract: We present a data-driven approach to investigate intra-textual variation by combining entropy and surprisal. With this approach we detect linguistic variation based on phrasal lexico-grammatical patterns across sections of research articles. Entropy is used to detect patterns typical of specific sections. Surprisal is used to differentiate between more and less informationally-loaded patterns as well as types of information (topical vs. stylistic). While we here focus on research articles in biology/genetics, …
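As a rough illustration of the two measures named in the abstract, the sketch below computes surprisal as the negative log2 probability of a pattern and relative entropy (KLD, the measure the citing statements attribute to this line of work) between two sections. It is a minimal sketch under stated assumptions, not the authors' implementation: the unigram model, the add-one smoothing, the helper names, and the toy token lists are all invented for illustration, whereas the paper works with phrasal lexico-grammatical patterns.

```python
import math
from collections import Counter


def distribution(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary.

    Hypothetical helper: alpha=1.0 (add-one smoothing) is an assumption made
    so that every item has non-zero probability in both sections.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def surprisal(p):
    """Surprisal in bits of an event with probability p: -log2(p)."""
    return -math.log2(p)


def kld_contributions(p, q):
    """Per-item contribution to D(P || Q) = sum_w p(w) * log2(p(w) / q(w))."""
    return {w: p[w] * math.log2(p[w] / q[w]) for w in p}


# Toy, invented token lists standing in for two sections of an article.
methods = "cells were incubated at cells were washed with".split()
results = "we found that expression was we found that levels".split()
vocab = set(methods) | set(results)

p = distribution(methods, vocab)  # section under study
q = distribution(results, vocab)  # comparison section

contrib = kld_contributions(p, q)
kld = sum(contrib.values())

# Items with a large positive contribution are typical of the first section.
for word, value in sorted(contrib.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: contribution={value:.3f}, surprisal={surprisal(p[word]):.2f} bits")
print(f"D(methods || results) = {kld:.3f} bits")
```

In this toy setup, items that contribute most to the divergence are the ones distinctive of a section, and their surprisal within that section gives a crude indication of how much information they carry there.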

Cited by 9 publications (6 citation statements)
References 18 publications
“…For example, Hughes et al (2012) measure stylistic influence in the evolution of literature, Klingenstein et al (2014) analyze language use in criminal trials, Bochkarev et al (2014) use KLD comparing word distributions within and across languages, Pechenick et al (2015) analyze cultural and linguistic evolution, and Fankhauser et al (2014) demonstrate the applicability of KLD for corpus comparison at large. In our own work, we have used KLD to analyze the linguistic development of English scientific writing over time (Degaetano-Ortlieb and Teich, 2016; Degaetano-Ortlieb and Strötgen, 2018; Degaetano-Ortlieb et al, 2019b), to investigate intra-textual variation across sections of research papers from genetics (Degaetano-Ortlieb and Teich, 2017), to analyze scientifization effects in literary studies (Degaetano-Ortlieb and Piper, 2019), to detect typical features of history texts (Degaetano-Ortlieb et al, 2019c), and to investigate gender- and class-specific changes in court proceedings of the Old Bailey Court (Degaetano-Ortlieb, 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…Halliday (1988) examined the historical development of conventionalized language use in the sciences, demonstrating qualitatively that science developed a rhetorical style using already existing rhetorical elements of English in a way most relevant to the experimental style of the physical sciences. Extensive corpus work has looked at the physical sciences to better understand the linguistic processes used to create and frame understanding in scientific work (Argamon et al, 2005) and to exploit the repetitive nature of such documents to find phrases that are more information-laden than their more conventionalized counterparts (Degaetano-Ortlieb and Teich, 2017).…”
Section: Prior Work (mentioning)
confidence: 99%
“…It was initially introduced in thermodynamics by Clausius [2], developed by Boltzmann and Gibbs through the 19th century [3] and generalized by Shannon in the 20th century [4] to the point that it can be applied in a broad range of areas. It has been applied to biology [5][6][7][8][9], economics [10][11][12], engineering [13][14][15], linguistics [16][17][18] and cosmology, at the center of one of the greatest open problems in science [10][11][12]. Given this general use in different fields of knowledge, it is important to think about what the measure of entropy actually represents in each different context and the possible equivalence between them.…”
Section: Introduction (mentioning)
confidence: 99%