Sian Gooding scite author profile

CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting

This paper presents the winning systems we submitted to the Complex Word Identification Shared Task 2018. We describe our best-performing systems' implementations and discuss our key findings from this research. Our best-performing systems achieve an F1 score of 0.8736 on the NEWS, 0.8400 on the WIKINEWS and 0.8115 on the WIKIPEDIA test sets in the monolingual English binary classification track, and a mean absolute error of 0.0558 on the NEWS, 0.0674 on the WIKINEWS and 0.0739 on the WIKIPEDIA test sets in the probabilistic track.
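The two figures reported above are standard evaluation metrics: F1 for the binary track and mean absolute error (MAE) for the probabilistic track. The following is a minimal sketch of how such scores are computed, not the shared task's official scorer. The function names and the toy labels are invented for illustration, and the F1 shown is for the positive ("complex") class only; the averaging variant used by the shared task is not stated in the abstract.

```python
def f1_score(gold, predicted):
    """F1 for the positive class (1 = complex) of a binary labelling."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(gold, predicted):
    """Average absolute gap between gold and predicted probabilities."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


# Toy data, invented for illustration only (not from any test set).
gold_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 1]

gold_probs = [0.0, 0.5, 1.0, 0.25]
predicted_probs = [0.25, 0.5, 0.5, 0.25]

print(f1_score(gold_labels, predicted_labels))
print(mean_absolute_error(gold_probs, predicted_probs))
```

A higher F1 is better in the binary track, while a lower MAE is better in the probabilistic track.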

Complex Word Identification as a Sequence Labelling Task

Gooding, Kochmar, 2019

Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task.

Word Complexity is in the Eye of the Beholder

Gooding, Kochmar, Yimam et al., 2021

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a "one-size-fits-all" approach. In this paper, we investigate which aspects contribute to the notion of lexical complexity in various groups of readers, focusing on native and non-native speakers of English, and how the notion of complexity changes depending on the proficiency level of a non-native reader. To facilitate reproducibility of our approach and foster further research into these aspects, we release a dataset of complex words annotated by readers with different backgrounds.

Recursive Context-Aware Lexical Simplification

Gooding, Kochmar, 2019

This paper presents a novel architecture for recursive context-aware lexical simplification, REC-LS, that is capable of (1) making use of the wider context when detecting the words in need of simplification and suggesting alternatives, and (2) taking previous simplification steps into account. We show that our system outputs lexical simplifications that are grammatically correct and semantically appropriate, and outperforms the current state-of-the-art systems in lexical simplification.

Predicting Text Readability from Scrolling Interactions

Gooding, Berzak, Mak et al., 2021

Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518-participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader impacts their reading interactions and the factors contributing to text difficulty.

Sian Gooding

CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting
