Lexicon of Changes: Towards the Evaluation of Diachronic Semantic Shift in Chinese

Chen, Jing; Chersoni, Emmanuele; Huang, Chu‐Ren

doi:10.18653/v1/2022.lchange-1.11

Cited by 4 publications

(7 citation statements)

References 15 publications

(30 reference statements)

“…The weighted mean pairwise Spearman score for inter-rater agreements is 0.691, and the Krippendorff's alpha is 0.602, which are quite high if compared to other DURel datasets (Schlechtweg et al, 2021, 2020, 2018; Erk et al, 2013; Chen et al, 2022). For more statistics, see Table 3.…”

Section: Human Annotation (mentioning)

confidence: 93%

“…The DURel framework and its extension DWUGs have been applied to constructing evaluation datasets for a variety of languages, such as English, Swedish, German, and Latin released in the SemEval 2020 (Schlechtweg et al, 2020), and later for Russian, Norwegian, Spanish, and Chinese (Rodina and Kutuzov, 2020; Kutuzov and Pivovarova, 2021; Kutuzov et al, 2022; Zamora-Reina et al, 2022; Chen et al, 2022). Since the nature of this paradigm is to measure usage differences between sentence pairs, it has also been extended to the construction of synchronic disambiguation datasets (Aksenova et al, 2022; Hätty et al, 2019) and to diatopic variation (i.e., usage differences across regional variations) (Baldissin et al, 2022).…”

Section: Related Work (mentioning)

confidence: 99%

“…The increasing number of published evaluation datasets further fostered the domain, enabling different models and hyperparameters to be quantitatively tested on the same benchmarks (Kutuzov et al, 2022; Schlechtweg et al, 2021; Aksenova et al, 2022; Chen et al, 2022; Zamora-Reina et al, 2022; Basile et al, 2019). These datasets are predominantly constructed within the framework of Diachronic Usage Relatedness (DURel), wherein changing scores are generated by calculating human ratings on semantic relatedness across a variety of usage pairs for targets (Schlechtweg et al, 2018; Rodina and Kutuzov, 2020; Chen et al, 2022). In the extended DURel framework, namely Diachronic Word Usage Graphs (DWUGs) (Schlechtweg et al, 2021, 2020), the usages could be further populated through Word Usage Graphs (WUGs) for visualization (McCarthy et al, 2016; Kutuzov et al, 2022).…”

Section: Introduction (mentioning)

confidence: 99%

ChiWUG: A Graph-based Evaluation Dataset for Chinese Lexical Semantic Change Detection

Chen, Chersoni, Schlechtweg et al. 2023

Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Recent studies suggested that language models are efficient tools for measuring lexical semantic change. In our paper, we present the compilation of the first graph-based evaluation dataset for semantic change in the context of the Chinese language, covering the periods before and after the Reform and Opening Up. Exploiting the existing framework DURel, we collect over 61,000 human semantic relatedness judgments for 40 targets. The inferred word usage graphs and semantic change scores provide a basis for visualization and evaluation of semantic change.

Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

2023

Welcome to the 4th International Workshop on Computational Approaches to Historical Language Change (LChange'23) co-located with EMNLP 2023. LChange is held on December 6th, 2023, as a hybrid event with participation possible both virtually and on-site in Singapore.

Characterizing the time-varying nature of language will have broad implications and applications in multiple fields including linguistics, artificial intelligence, digital humanities, computational cognitive and social sciences. In this workshop, we bring together the world's pioneers and experts in computational approaches to historical language change with a focus on digital text corpora. In doing so, this workshop carries out the triple goals of disseminating state-of-the-art research on diachronic modeling of language change, fostering cross-disciplinary collaborations, and exploring the fundamental theoretical and methodological challenges in this growing niche of computational linguistic research.

In response to the call, we received 28 submissions. Each of them was carefully evaluated by at least two members of the Program Committee, whom we believed to be most appropriate for each paper. Based on the reviewers' feedback we accepted 17 full and short papers as oral or poster presentations. We had two distinguished keynote presentations: the first by Gemma Boleda (Research Professor in the Department of Translation and Language Sciences of the Universitat Pompeu Fabra, Spain) who presented a talk entitled "What does semantic change have to do with Hello Kitty? Referring as the source of change", and the second by Mario Giulianelli (a postdoctoral fellow at ETH Zurich) with the talk "Neural language models for word usage representation and analysis". Finally, we invited five EMNLP'23 Findings papers to be presented as posters, which are not included in the workshop proceedings.

To further support the community, we offered five student scholarships to cover registration fees. We also offered mentoring for four young researchers on their research topic in the field of language change, either during the workshop or virtually.

We hope that you will find the workshop papers insightful and inspiring. We would like to thank the keynote speakers for their stimulating talks, the authors of all papers for their interesting contributions, and the members of the Program Committee for their insightful reviews. Our special thanks go to the emergency reviewers who stepped in to provide their expertise. We also express our gratitude to the EMNLP 2023 workshop chairs for their kind assistance during the organization process. Finally, our thanks go to our gold sponsor iguanodon.ai, as well as the research project "Towards Computational Lexical Semantic Change Detection" (Swedish Research Council, contract 2018-01184) and the research program "Change is Key!" (Riksbankens Jubileumsfond, contract M21-0021).

Lexical Semantic Change through Large Language Models: a Survey

Periti, Montanelli 2024

ACM Comput. Surv.

Lexical Semantic Change (LSC) is the task of identifying, interpreting, and assessing the possible change over time in the meanings of a target word. Traditionally, LSC has been addressed by linguists and social scientists through manual and time-consuming analyses, which have thus been limited in terms of the volume, genres, and time-frame that can be considered. In recent years, computational approaches based on Natural Language Processing have gained increasing attention to automate LSC as much as possible. Significant advancements have been made by relying on Large Language Models (LLMs), which can handle the multiple usages of the words and better capture the related semantic change. In this article, we survey the approaches based on LLMs for LSC and we propose a classification framework characterized by three dimensions: meaning representation, time-awareness, and learning modality. The framework is exploited to i) review the measures for change assessment, ii) compare the approaches on performance, and iii) discuss the current issues in terms of scalability, interpretability, and robustness. Open challenges and future research directions about the use of LLMs for LSC are finally outlined.
