Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.91

Small but Mighty: New Benchmarks for Split and Rephrase

Abstract: Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we colle…
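
To make the abstract's point concrete, here is a minimal sketch of the kind of rule-based splitter that can exploit surface cues. The cue set (breaking at ", which" and ", and") and the function name `rule_based_split` are hypothetical assumptions chosen for illustration. This is not the rule set used in the paper, and the example sentences are invented.

```python
import re

# Hypothetical surface cues (an illustrative assumption, not the paper's rules).
_WHICH = re.compile(r",\s+which\s+")  # ", which" opening a relative clause
_AND = re.compile(r",\s+and\s+")      # ", and" joining two clauses
_SEP = "\n"                           # internal sentence-boundary marker


def rule_based_split(sentence: str) -> list[str]:
    """Split a complex sentence into simpler ones using fixed surface cues."""
    body = sentence.strip().rstrip(".")
    # Rule 1: turn ", which ..." into a new sentence whose subject is "It".
    body = _WHICH.sub(_SEP + "It ", body)
    # Rule 2: break at ", and", assuming the second clause has its own subject.
    body = _AND.sub(_SEP, body)
    pieces = [p.strip() for p in body.split(_SEP) if p.strip()]
    # Capitalise each piece and restore the sentence-final period.
    return [p[0].upper() + p[1:] + "." for p in pieces]


if __name__ == "__main__":
    print(rule_based_split(
        "The Acharya Institute of Technology is in Bangalore, which is in India."))
    print(rule_based_split(
        "Alan Bean was born in Wheeler, Texas, and he served as a test pilot."))
```

Such a splitter needs no training data. That is why it can score well on a benchmark whose complex sentences were generated with predictable connectives. On naturally written text, the same rules would misfire, for example on a serial-comma list such as "salt, pepper, and vinegar".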

Cited by 4 publications (8 citation statements)
References 13 publications (18 reference statements)

“…When compared with WIKISPLIT, BISECT contains significantly more high-quality pairs, while containing fewer pairs with significant errors. Pairs containing unsupported and deleted details are comparable across corpora, though WIKISPLIT skews more towards adding unsupported information, which is consistent with previous work (Zhang et al, 2020a). Moreover, we take 100 random samples from the German BISECT corpus and perform manual inspection.…”
Section: Comparison To Existing Corpora (supporting; confidence: 77%)

“…Thus, proposed the Split and Rephrase task, and introduced the WEBSPLIT corpus, created by aligning sentences in WebNLG. WEBSPLIT contains duplicate instances and phrasal repetitions (Aharoni and Goldberg, 2018; Botha et al, 2018), and most splitting operations can be trivially classified (Zhang et al, 2020a), so subsequent Split and Rephrase corpora have been created to improve training (Botha et al, 2018) and evaluation (Sulem et al, 2018; Zhang et al, 2020a). The main work we compare against is WIKISPLIT, a corpus created by extracting split sentences from Wikipedia edit histories (Botha et al, 2018).…”
Section: Related Work (mentioning; confidence: 99%)

“…fluency and adequacy ratings with binary questions described in Zhang et al (2020a) for the second evaluation over another 100 simplifications from the NEWSELA-AUTO split-focused test set. We asked if the output sentence exhibits splitting and if the splitting occurs at the correct place.…”
Section: Human Evaluation (mentioning; confidence: 99%)