2021
DOI: 10.1038/s41597-021-01047-x

DUKweb, diachronic word representations from the UK Web Archive corpus

Abstract: Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific c…

Cited by 3 publications (2 citation statements)
References 30 publications

“…However, training these models is stochastic, which results in arbitrary coordinate spaces. Model alignment is an essential step in allowing word2vec models to be compared [38,39]. Before alignment, each model has its own unique coordinate space (Figures 1A), and each word is represented within that space (Figure 1B).…”
Section: Results (mentioning)
confidence: 99%
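
The alignment step this statement describes can be illustrated with a minimal sketch. It assumes orthogonal Procrustes, a common choice for aligning embedding spaces; the quoted text does not say which method the citing paper used. The function name `procrustes_align` and the synthetic matrices are hypothetical, standing in for two word2vec models restricted to a shared vocabulary in the same row order.

```python
import numpy as np

def procrustes_align(base, other):
    """Rotate `other` into the coordinate space of `base`.

    Both arguments are (n_words, dim) arrays whose rows refer to the
    same words in the same order (an assumption of this sketch).
    """
    # The SVD of the cross-covariance matrix yields the orthogonal map R
    # that minimises the Frobenius norm of (other @ R - base).
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt
    return other @ rotation, rotation

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))

# Simulate a second training run: same geometry, arbitrary rotation.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
other = base @ q

aligned, rotation = procrustes_align(base, other)
```

Because the map is constrained to be orthogonal, distances and cosine similarities within each model are unchanged; only the arbitrary orientation of the coordinate space is removed, so the same word can then be compared across models.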
“…The data sources that have been used to train diachronic word embeddings in recent years are diverse. Tsakalidis et al (2021), for example, used a corpus of websites from the UK called DUKweb, Brandl and Lassner (2019) used two newspaper corpora in English and German, and many researchers use the Google Books corpus (Boukhaled et al, 2019; Vijayarani and Geetha, 2020; Yüksel et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%