UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams

Szymanski, Terrence; Lynch, Gerard

doi:10.18653/v1/s15-2148

Cited by 13 publications

(11 citation statements)

References 5 publications

“…In this case, including deeper text features, such as those encoding syntactic information, might help the system to abstract away from the lexical level. A first step in this direction is attempted by Szymanski and Lynch (2015) who employ Google Syntactic N-grams in an SVM-based system that participated to the Diachronic Text Evaluation shared task at SemEval 2015.…”

Section: Possible Improvements (mentioning)

confidence: 99%

Flattening the Curve of the COVID-19 Infodemic: These Evaluation Campaigns Can Help!

Nakov

2020

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it). This volume includes the reports of both task organisers and participants to all of the EVALITA 2020 challenges. In the 2020 edition, we coordinated the organization of 14 different tasks belonging to five research areas: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, (v) Time and Diachrony. The volume is opened by an overview of the EVALITA 2020 campaign, in which we describe the tasks, provide statistics on the participants and task organizers as well as our supporting sponsors. The abstract of the keynote speech made by Preslav Nakov titled "Flattening the Curve of the COVID-19 Infodemic: These Evaluation Campaigns Can Help!" is also included in this collection. Due to the 2020 COVID-19 pandemic, the traditional workshop was held online, where several members of the Italian NLP Community presented the results of their research. Despite the circumstances, the workshop represented an occasion for all participants from both academic institutions and private companies to disseminate their work and results and to share ideas through online sessions dedicated to each task and a general discussion during the plenary event. We carried on with the tradition of the "Best system across tasks" award. As in 2018, it represented an incentive for students, IT developers and researchers to push the boundaries of the state of the art by facing tasks in new ways, even if not winning.

“…This involves searching for specific mentions of time within the text, searching for named entities present in the text and then establishing their reference time by linking these to Wikipedia, using Google n-grams, and linguistic features indicative of language change. Finally, UCD (Szymanski and Lynch, 2015) employs SVMs for classification using a variety of informative features (e.g., POS-tag n-grams, syntactic phrases), which were optimized for the task through automatic feature selection.…”

Section: Experiments 4: Task-based Evaluation (mentioning)

confidence: 99%

“…We also trained a multiclass SVM which uses character n-gram (n ∈ {1, 2, 3}) features in addition to the model features. Szymanski and Lynch (2015) identified character n-grams as the most predictive feature for temporal text classification using SVMs.…”

Section: Supervised Classification (mentioning)

confidence: 99%
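
The character n-gram features (n ∈ {1, 2, 3}) mentioned in the statement above can be illustrated with a minimal sketch. This is not the code of either cited system: the function name `char_ngrams` and the example string are hypothetical, and the sketch covers only feature extraction, leaving out the SVM training step that both papers describe.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count every character n-gram of the requested sizes in `text`.

    Hypothetical helper for illustration; the cited systems may tokenize,
    normalize, or weight their features differently.
    """
    counts = Counter()
    for n in n_values:
        # Slide a window of width n over the text, one character at a time.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: an archaic spelling as a stand-in for a dated text snippet.
features = char_ngrams("thee")
```

In a full pipeline, count dictionaries like `features` would be converted to sparse vectors and passed to a classifier; that step is assumed here and not shown.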

A Bayesian Model of Diachronic Meaning Change

Frermann, Lapata

2016

TACL

113

157

Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

“…An example is the diachronic text evaluation challenge (Popescu and Strapparava, 2015) in SemEval 2015, where newspaper text snippets from 1700-2010 had to be classified into time intervals of different sizes. Models for diachronic text classification are trained based on the way lexical, morphological, syntactic and stylistic features change over time (Abe and Tsumoto, 2010; Garcia-Fernandez et al, 2011; Popescu and Strapparava, 2015; Štajner and Zampieri, 2013; Szymanski and Lynch, 2015; Zampieri et al, 2016; Boldsen and Paggio, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Boldsen, Agirrezabal, Paggio

2019

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.
