Gustavo Henrique Paetzold scite author profile

We report the findings of the Complex Word Identification task of SemEval 2016. To create a dataset, we conduct a user study with 400 non-native English speakers, and find that complex words tend to be rarer, less ambiguous and shorter. A total of 42 systems were submitted from 21 distinct teams, and nine baselines were provided. The results highlight the effectiveness of Decision Trees and Ensemble methods for the task, but ultimately reveal that word frequencies remain the most reliable predictor of word complexity.

Reliable Lexical Simplification for Non-Native Speakers

Paetzold

2015

102

Lexical Simplification is the task of modifying the lexical content of complex sentences in order to make them simpler. Due to the lack of reliable resources available for the task, most existing approaches have difficulty producing simplifications that are grammatical and preserve the meaning of the original text. In order to improve on the state of the art for this task, we propose user studies with non-native speakers, which will result in new, sizeable datasets, as well as novel ways of performing Lexical Simplification. The results of our first experiments show that new types of classifiers, along with the use of additional resources such as spoken text language models, produce state-of-the-art results for the Lexical Simplification task of SemEval-2012.

A Report on the Complex Word Identification Shared Task 2018

Yimam, Biemann, Malmasi et al.

2018

We report the findings of the second Complex Word Identification (CWI) shared task, organized as part of the BEA workshop co-located with NAACL-HLT 2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks (English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set) and two tasks (binary classification and probabilistic classification). A total of 12 teams submitted their results in different task/track combinations, and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.

A Survey on Lexical Simplification

Paetzold, Specia

2017

JAIR

Lexical Simplification is the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning. This task has wide applicability both as an assistive technology for readers with cognitive impairments or disabilities, such as Dyslexia and Aphasia, and as a pre-processing tool for other Natural Language Processing tasks, such as machine translation and summarisation. The problem is commonly framed as a pipeline of four steps: the identification of complex words, the generation of substitution candidates, the selection of those candidates that fit the context, and the ranking of the selected substitutes according to their simplicity. In this survey we review the literature for each step in this typical Lexical Simplification pipeline and provide a benchmarking of existing approaches for these steps on publicly available datasets. We also provide pointers for datasets and resources available for the task.
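
The four-step pipeline described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the survey: the toy frequency table, the synonym dictionary, the bigram-based context filter, the threshold, and all function names are hypothetical. Word frequency stands in as a rough proxy for simplicity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources, invented purely for illustration.
WORD_FREQ: Dict[str, int] = {
    "the": 1000, "cat": 300, "sat": 200, "on": 900, "mat": 150,
    "perched": 5, "rested": 40, "settled": 60,
}
SYNONYMS: Dict[str, List[str]] = {"perched": ["sat", "rested", "settled"]}
ALLOWED_BIGRAMS: Set[Tuple[str, str]] = {("cat", "sat"), ("cat", "rested")}
FREQ_THRESHOLD = 50  # assumed cut-off below which a word counts as complex


def identify_complex(tokens: List[str]) -> List[int]:
    """Step 1: flag positions of tokens rarer than the threshold."""
    return [i for i, t in enumerate(tokens)
            if WORD_FREQ.get(t, 0) < FREQ_THRESHOLD]


def generate_candidates(word: str) -> List[str]:
    """Step 2: look up substitution candidates in the toy dictionary."""
    return SYNONYMS.get(word, [])


def select_candidates(tokens: List[str], index: int,
                      candidates: List[str]) -> List[str]:
    """Step 3: keep candidates that fit the context (allowed bigram
    with the previous token; no filtering at sentence start)."""
    if index == 0:
        return list(candidates)
    prev = tokens[index - 1]
    return [c for c in candidates if (prev, c) in ALLOWED_BIGRAMS]


def rank_candidates(candidates: List[str]) -> List[str]:
    """Step 4: rank by simplicity, approximated here by frequency."""
    return sorted(candidates, key=lambda c: WORD_FREQ.get(c, 0), reverse=True)


def simplify(sentence: str) -> str:
    """Run the four steps and substitute the top-ranked candidate."""
    tokens = sentence.lower().split()
    for i in identify_complex(tokens):
        selected = select_candidates(tokens, i, generate_candidates(tokens[i]))
        ranked = rank_candidates(selected)
        if ranked:
            tokens[i] = ranked[0]
    return " ".join(tokens)


print(simplify("The cat perched on the mat"))
```

In this toy setup, "perched" is the only word flagged as complex, "settled" is filtered out by the context check, and the more frequent of the remaining candidates is substituted. Real systems replace each step with learned or corpus-based components.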

SemEval-2021 Task 1: Lexical Complexity Prediction

Shardlow, Evans, Paetzold et al.

2021

This paper presents the results and main findings of SemEval-2021 Task 1: Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al., 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five-point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.
