Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2021
DOI: 10.18653/v1/2021.naacl-main.351

Word Complexity is in the Eye of the Beholder

Abstract: Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a "one-size-fits-all" approach. In this paper, we investigate which aspects contribute to the notion of lexical complexity in various groups of readers, focusing on native and non-native speakers of English, and how the notion of complexity changes depending on the proficiency level of a non-native reader. To facilitate reproducibility of our approach and foster furthe…

Cited by 8 publications (14 citation statements)
References 29 publications
“…In order to facilitate personalised CWI, and therefore personalised text simplification and readability systems, audience specific complexity annotations are required (Bingel et al, 2018). Whilst it has been shown that the concept of word complexity, and thus the level of agreement, is aligned between individuals sharing a common background (Gooding and Kochmar, 2018a; Gooding et al, 2021), we argue that the best CWI model for each individual is trained with them 'in the loop'.…”
Section: Introduction (mentioning)
confidence: 80%
“…However, annotating the difficulty of words is a subjective task, and previous data collection has yielded low levels of annotator agreement (Specia et al, 2012; Paetzold and Specia, 2016c). As a way of mitigating individual differences, such datasets typically present a homogeneous view on word complexity, by merging annotations across readers (Gooding et al, 2021). Further attempts to improve annotator agreement have included offering bonus incentives for annotators who select words matching other annotations (Yimam et al, 2017), as well as providing guidelines for annotators to mark words that they assume would be complex for audiences such as children and those with learning difficulties.…”
Section: Introduction (mentioning)
confidence: 99%
“…This finding correlates well with existing research on readability. Linguistic features, such as word length, word frequency, and word familiarity, and other structural features have proven to be highly relevant and reliable predictors of textual complexity and difficulty [39].…”
Section: Discussion (mentioning)
confidence: 99%
“…Text simplification has many subtleties, as what would be a valid simplification for one reader may not be appropriate for another (Xu et al, 2015). For instance, it has been shown that the factors contributing to word complexity vary depending on the first language and proficiency level of a reader (Gooding et al, 2021b). The subjective nature of text simplification means that system evaluation is difficult.…”
Section: Datasets and Evaluation (mentioning)
confidence: 99%
“…In automatic text simplification, the aim is to transform text using the aforementioned operations, to allow individuals with differing comprehension levels access. This requires a fundamental understanding of what factors contribute to text complexity for differing audiences (Gooding et al, 2021b).…”
Section: Introduction (mentioning)
confidence: 99%