Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) 2015
DOI: 10.18653/v1/s15-2148

UCD: Diachronic Text Classification with Character, Word, and Syntactic N-grams

Abstract: We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subt…
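The feature extraction described in the abstract can be illustrated with a minimal sketch. It covers only the letter-level and word-level n-grams; syntactic n-grams are omitted because they require a parser. The function names, the n-gram orders, the lowercasing, and the whitespace tokenization are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count character n-grams; the orders in n_values are an assumed choice."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            # Prefix each key with its level and order so features never collide.
            counts[f"c{n}:{text[i:i + n]}"] += 1
    return counts


def word_ngrams(text, n_values=(1, 2)):
    """Count word n-grams over lowercased, whitespace-split tokens (assumed)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[f"w{n}:" + " ".join(tokens[i:i + n])] += 1
    return counts


def extract_features(text):
    """Merge both feature levels into one sparse count vector."""
    features = char_ngrams(text)
    features.update(word_ngrams(text))  # Counter.update adds counts
    return features
```

Count vectors of this kind, paired with date-interval labels, could then be passed to any multi-class classifier.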

Cited by 13 publications (11 citation statements)
References 5 publications
“…In this case, including deeper text features, such as those encoding syntactic information, might help the system to abstract away from the lexical level. A first step in this direction is attempted by Szymanski and Lynch (2015) who employ Google Syntactic N-grams in an SVM-based system that participated to the Diachronic Text Evaluation shared task at SemEval 2015.…”
Section: Possible Improvements
Citation type: mentioning
confidence: 99%
“…This involves searching for specific mentions of time within the text, searching for named entities present in the text and then establishing their reference time by linking these to Wikipedia, using Google n-grams, and linguistic features indicative of language change. Finally, UCD (Szymanski and Lynch, 2015) employs SVMs for classification using a variety of informative features (e.g., POS-tag n-grams, syntactic phrases), which were optimized for the task through automatic feature selection.…”
Section: Experiments 4: Task-based Evaluation
Citation type: mentioning
confidence: 99%
“…We also trained a multiclass SVM which uses character n-gram (n ∈ {1, 2, 3}) features in addition to the model features. Szymanski and Lynch (2015) identified character n-grams as the most predictive feature for temporal text classification using SVMs.…”
Section: Supervised Classification
Citation type: mentioning
confidence: 99%
“…An example is the diachronic text evaluation challenge (Popescu and Strapparava, 2015) in SemEval 2015, where newspaper text snippets from 1700-2010 had to be classified into time intervals of different sizes. Models for diachronic text classification are trained based on the way lexical, morphological, syntactic and stylistic features change over time (Abe and Tsumoto, 2010; Garcia-Fernandez et al, 2011; Popescu and Strapparava, 2015; Štajner and Zampieri, 2013; Szymanski and Lynch, 2015; Zampieri et al, 2016; Boldsen and Paggio, 2019).…”
Section: Introduction
Citation type: mentioning
confidence: 99%