Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.51
Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

Abstract: The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large. However, these methods often require extensive filtering of the vocabulary to perform well, and, as we show in this work, result in unstable, and hence less reliable…
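The alignment-based pipeline the abstract describes — train embeddings per corpus, align the two spaces, rank words by cosine distance — can be sketched roughly as follows. The orthogonal-Procrustes alignment and the toy random embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal matrix W minimizing ||XW - Y||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def usage_change_scores(X, Y):
    """Per-word cosine distance after aligning X's space onto Y's."""
    Xa = X @ procrustes_align(X, Y)
    cos = np.sum(Xa * Y, axis=1) / (
        np.linalg.norm(Xa, axis=1) * np.linalg.norm(Y, axis=1))
    return 1.0 - cos

# Toy setup: corpus B's space is a rotation of corpus A's, except that
# word 0 gets a fresh random vector, i.e. its usage "changed".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                  # corpus A embeddings
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal map
Y = X @ Q                                       # corpus B = rotated copy
Y[0] = rng.normal(size=50)                      # the changed word
scores = usage_change_scores(X, Y)
print(scores.argmax())  # word 0 should rank as most changed
```

This is exactly the baseline whose instability the paper critiques: the ranking depends on the learned spaces and the alignment, both of which vary across random initializations.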

Cited by 37 publications (37 citation statements). References 33 publications.
“…More broadly, prior work by Wendlandt et al. (2018), Antoniak and Mimno (2018), and Gonen et al. (2020), among others, has also shown embedding stability to be a concern in models trained on larger corpora than those used in this work. However, the role of random embedding effects on previous qualitative studies using word embeddings (e.g., Kulkarni et al. (2015), Hamilton et al. (2016)) has not been evaluated.…”
Section: Confidence Estimation in Embedding Analysis (mentioning; confidence: 86%)
“…This method also avoids the conceptual difficulties and low replicability of comparing embedding spaces numerically (e.g. by cosine distances) (Gonen et al., 2020). However, even nearest neighborhoods are often unstable, and vary dramatically across runs of the same embedding algorithm on the same corpus (Wendlandt et al., 2018; Antoniak and Mimno, 2018).…”
Section: Identifying Stable Embeddings for Analysis (mentioning; confidence: 99%)
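The neighborhood-based comparison this excerpt contrasts with numerical space comparison can be sketched as follows: score each word by how much its top-k nearest-neighbor set differs between the two corpora. The toy vectors, the choice of k, and the overlap score are illustrative assumptions rather than the cited paper's exact procedure.

```python
import numpy as np

def topk_neighbors(E, i, k):
    """Indices of word i's k nearest neighbors by cosine similarity."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = En @ En[i]
    sims[i] = -np.inf                  # a word is not its own neighbor
    return set(np.argsort(-sims)[:k])

def neighborhood_change(E1, E2, k=2):
    """Score each word by how many of its top-k neighbors it loses
    between the two spaces; higher means more usage change."""
    return np.array([k - len(topk_neighbors(E1, i, k) & topk_neighbors(E2, i, k))
                     for i in range(len(E1))])

# Toy 2-D vocabulary of five words in two clusters; in corpus B,
# word 0 migrates from the x-axis cluster to the y-axis cluster.
E1 = np.array([[1.0, 0.0], [0.95, 0.05], [0.9, 0.1],
               [0.0, 1.0], [0.05, 0.95]])
E2 = E1.copy()
E2[0] = [0.05, 1.0]
scores = neighborhood_change(E1, E2, k=2)
print(scores.argmax())  # word 0 should score highest
```

Because the score depends only on each word's local neighborhood rather than on a global alignment of the two spaces, it sidesteps the replicability issues of cosine-distance comparisons noted above — though, as the excerpt adds, neighborhoods themselves can also be unstable across training runs.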
“…This approach builds on recent advancements in the study of language change using distributed temporal word embeddings (Bianchi et al., 2020; Di Carlo et al., 2019; Hilpert & Perek, 2015; Kim et al., 2014; Kulkarni et al., 2015; Perek, 2014; Sagi et al., 2011; Yao et al., 2018). Finally, we implemented and evaluated three different ways of capturing language change, which target both item‐ and neighborhood‐level patterns of change in usage patterns (Gonen et al., 2020; Hamilton et al., 2016a), offering a more thorough and insightful characterization of the phenomenon at hand.…”
Section: Discussion (mentioning; confidence: 99%)
“…We use three different methods to quantify language change (Gonen et al., 2020; Hamilton et al., 2016a). In order to ensure comparability across words, we only computed measures of language change for words that appeared at least 25 times in each of the six slices into which the corpus was divided, such that reliable lexical representations could be learned.…”
Section: Methods (mentioning; confidence: 99%)