Proceedings of the Thirteenth Workshop on Innovative Use of NLP For Building Educational Applications 2018
DOI: 10.18653/v1/w18-0520

CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting

Abstract: This paper presents the winning systems we submitted to the Complex Word Identification Shared Task 2018. We describe our best performing systems' implementations and discuss our key findings from this research. Our best-performing systems achieve an F1 score of 0.8736 on the NEWS, 0.8400 on the WIKINEWS and 0.8115 on the WIKIPEDIA test sets in the monolingual English binary classification track, and a mean absolute error of 0.0558 on the NEWS, 0.0674 on the WIKINEWS and 0.0739 on the WIKIPEDIA test sets in t…

Cited by 42 publications (58 citation statements)
References: 23 publications
“…Results: We report the results obtained with the sequence labelling (SEQ) model for the binary task and compare them to the current state-of-the-art in complex word identification, CAMB system by Gooding and Kochmar (2018), which achieved the best results across all binary and two probabilistic tracks in the CWI 2018 shared task (Yimam et al, 2018). The evaluation metric reported is the macro-averaged F1, as was used in the 2018 CWI shared task (Yimam et al, 2018).…”
Section: Results (mentioning)
confidence: 99%
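The macro-averaged F1 named in the statement above can be illustrated with a minimal sketch. It assumes binary labels (1 = complex, 0 = simple) and computes F1 per class before taking the unweighted mean. The function names and the toy labels are hypothetical, and this is not the shared task's official scoring script.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """F1 for a single class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined, so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


# Toy data (hypothetical): 1 = complex word, 0 = simple word.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the mean, a system cannot score well by predicting only the majority class.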
“…The CAMB system considers words irrespective of their context and relies on 27 features of various types, encoding lexical, syntactic, frequency-based and other types of information about individual words. The system uses Random Forests and AdaBoost for classification, but as Gooding and Kochmar (2018) report, the choice of the features, algorithm and training data depends on the genre. In addition, phrase classification is performed using a 'greedy' approach and simply labelling all phrases as complex.…”
Section: Results (mentioning)
confidence: 99%
“…For the Second CWI Shared Task (Yimam et al, 2018), participants built monolingual models using the datasets previously described, and also tested their cross-lingual capabilities on newly collected French data. In the monolingual track, the best systems for English (Gooding and Kochmar, 2018) differed significantly in terms of feature set size and the model's complexity, to the best systems for German and Spanish (Kajiwara and Komachi, 2018). The latter used Random Forests with eight features, whilst the former used AdaBoost with 5000 estimators or ensemble voting combining AdaBoost and Random Forest classifiers, with about 20 features.…”
Section: Introduction (mentioning)
confidence: 99%
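The idea of ensemble voting over two classifiers, mentioned in the statement above, can be sketched as follows. This is an illustration under stated assumptions, not the CAMB implementation: it assumes soft voting (averaging predicted probabilities), which the quoted text does not specify, and the two scoring functions are hypothetical stand-ins for trained AdaBoost and Random Forest models. The two features (word length, relative frequency) are likewise chosen only for the example.

```python
from typing import Callable, Sequence

# A model maps a feature vector to P(word is complex).
ProbModel = Callable[[Sequence[float]], float]


def soft_vote(models: Sequence[ProbModel], features: Sequence[float],
              threshold: float = 0.5) -> int:
    """Average the models' probabilities and threshold the mean (assumed scheme)."""
    mean_prob = sum(model(features) for model in models) / len(models)
    return 1 if mean_prob >= threshold else 0


# Hypothetical stand-ins for trained classifiers; features = (length, frequency).
def adaboost_like(features: Sequence[float]) -> float:
    # Longer words are scored as more likely to be complex.
    return 0.8 if features[0] > 6 else 0.3


def random_forest_like(features: Sequence[float]) -> float:
    # Rarer words are scored as more likely to be complex.
    return 0.7 if features[1] < 0.001 else 0.2


ensemble = [adaboost_like, random_forest_like]
label_short_frequent = soft_vote(ensemble, (3, 0.01))    # short, frequent word
label_long_rare = soft_vote(ensemble, (10, 0.00001))     # long, rare word
```

With trained models in place of the stand-ins, the same averaging step lets one classifier's confident prediction offset the other's uncertainty.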