Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) 2014
DOI: 10.3115/v1/w14-1214

An Analysis of Crowdsourced Text Simplifications

Abstract: We present a study on the text simplification operations undertaken collaboratively by Simple English Wikipedia contributors. The aim is to understand whether a complex-simple parallel corpus involving this version of Wikipedia is appropriate as data source to induce simplification rules, and whether we can automatically categorise the different operations performed by humans. A subset of the corpus was first manually analysed to identify its transformation operations. We then built machine learning models to …

Citations: cited by 16 publications (12 citation statements)
References: 11 publications (8 reference statements)

“…First, the Simple English Wikipedia data has dominated simplification research since 2010 (Zhu et al, 2010;Siddharthan, 2014), and is used together with Standard English Wikipedia to create parallel text to train MT-based simplification systems. However, recent studies (Xu et al, 2015;Amancio and Specia, 2014;Hwang et al, 2015;Štajner et al, 2015) showed that the parallel Wikipedia simplification corpus contains a large proportion of inadequate (not much simpler) or inaccurate (not aligned or only partially aligned) simplifications. It is one of the leading reasons that existing simplification systems struggle to generate simplifying paraphrases and leave the input sentences unchanged (Wubben et al, 2012).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…For text simplification, given a sentence to be simplified, we can solicit human simplifications from the crowdsourcing platform. However, the quality of the resulting simplifications is often of widely varying quality (Amancio and Specia, 2014); the workers are not experts and it can be difficult to give the workers the appropriate context, e.g., the target audience, etc.…”
Section: Improving Human Simplification (citation type: mentioning)
confidence: 99%
“…In this paper, we examine a crowdsourcing approach to produce simplifications more efficiently and of higher quality using non-experts. Crowdsourcing has been suggested previously as a possible source of text simplifications (Amancio and Specia, 2014;Lasecki et al, 2015), however, no work has addressed quality control or how to deal with varying simplicity targets. The top part of Table 1 shows an example sentence to be simplified with two non-expert simplifications obtained through a crowdsourcing platform.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Coster and Kauchak (2011b) used word alignments on C&K-1 and found rewordings (65%), deletions (47%), reorders (34%), merges (31%), and splits (27%). Amancio and Specia (2014) extracted 143 instances also from C&K-1, and manually annotated the simplification transformations performed: sentence splitting, paraphrasing (either single word or whole sentence), drop of information, sentence reordering, information insertion, and misalignment. They found that the most common operations were paraphrasing (39.8%) and drop of information (26.76%).…”
Section: Suitability For Simplification (citation type: mentioning)
confidence: 99%