Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change 2019
DOI: 10.18653/v1/w19-4711

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Abstract: In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We used perplexities derived from RNNs as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give insight into the differences in language between time periods, which are at least partly due to language change. …
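The pipeline the abstract describes (language-model perplexities used as a distance measure, followed by K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation: a character bigram model with add-one smoothing stands in for the RNN, each document is represented by the row of perplexities its model assigns to every document, and the toy strings, the `START` marker, and all function names are hypothetical.

```python
import math
from collections import Counter

START = "^"  # hypothetical start-of-text marker, assumed absent from the documents


def train_bigram(text):
    """Count character bigrams and their left contexts (a stand-in for an RNN LM)."""
    padded = START + text
    return Counter(zip(padded, padded[1:])), Counter(padded[:-1])


def perplexity(model, text, vocab_size):
    """Perplexity of `text` under an add-one-smoothed character bigram model."""
    bigrams, contexts = model
    padded = START + text
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        log_prob += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return math.exp(-log_prob / len(text))


def perplexity_rows(docs):
    """Row i holds the perplexity of every document under the model trained on doc i."""
    vocab_size = len(set("".join(docs)))
    models = [train_bigram(d) for d in docs]
    return [[perplexity(m, d, vocab_size) for d in docs] for m in models]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Plain Lloyd's K-Means with deterministic farthest-point initialisation."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for charters from two periods (invented strings, not real data).
docs = ["abab abab abab", "baba baba abab", "xyxy xyxy xyxy", "yxyx yxyx xyxy"]
labels = kmeans(perplexity_rows(docs), k=2)
```

Documents whose models find each other predictable end up with similar perplexity rows and therefore in the same cluster. Whether such clusters reflect time periods, as the abstract argues, depends on the corpus and on the language model actually used.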

Cited by 2 publications (4 citation statements)
References: 9 publications
“…Automatic periodization within a language is a related task, and for this aim, Degaetano-Ortlieb and Teich (2018) use relative entropy. For a similar aim, combination of perplexity and Recurrent Neural Networks (RNN) has been used for identifying temporal trends in a corpus of medieval charters (Boldsen, Agirrezabal, and Paggio, 2019).…”
Section: Corpus-driven Methodologies
Citation type: mentioning
confidence: 99%
“…Early studies employ features by manual work to recognize temporal expressions within documents (Dalli, 2006; Kanhabua and Nørvåg, 2016; Niculae et al, 2014), which suffer from the problem of generalization and coverage rate. Traditional machine learning methods focus on statistical features and learning models, such as Naïve Bayes (Boldsen and Wahlberg, 2021), SVM (Garcia-Fernandez et al, 2011) and Random Forests (Ciobanu et al, 2013). Recent studies turn to deep learning methods, and the experiments show their superior performances compared to traditional machine learning ones (Kulkarni et al, 2018; Liebeskind and Liebeskind, 2020; Yu and Huangfu, 2019; Ren et al, 2022).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…torical documents for the construction of digital libraries (Baledent et al, 2020). Such task is also called historical text dating (Boldsen and Wahlberg, 2021), diachronic text evaluation (Popescu and Strapparava, 2015), or period classification (Tian and Kübler, 2021). Compared to other dating tasks, historical text dating is more challenging as explicit temporal mentions (e.g., time expressions) that help to determine the written date of a document usually do not appear in it (Toner and Han, 2019).…”
Section: Introduction
Citation type: mentioning
confidence: 99%