Small but Mighty: New Benchmarks for Split and Rephrase

Zhang, Li; Zhu, Huaiyu; Brahma, Siddhartha; Li, Yunyao

doi:10.18653/v1/2020.emnlp-main.91

Cited by 4 publications

(8 citation statements)

References 13 publications

(18 reference statements)


“…When compared with WIKISPLIT, BISECT contains significantly more high-quality pairs, while containing fewer pairs with significant errors. Pairs containing unsupported and deleted details are comparable across corpora, though WIKISPLIT skews more towards adding unsupported information, which is consistent with previous work (Zhang et al, 2020a). Moreover, we take 100 random samples from the German BISECT corpus and perform manual inspection.…”

Section: Comparison To Existing Corpora (supporting)

confidence: 77%

“…Thus, proposed the Split and Rephrase task, and introduced the WEBSPLIT corpus, created by aligning sentences in WebNLG. WEBSPLIT contains duplicate instances and phrasal repetitions (Aharoni and Goldberg, 2018; Botha et al, 2018), and most splitting operations can be trivially classified (Zhang et al, 2020a), so subsequent Split and Rephrase corpora have been created to improve training (Botha et al, 2018) and evaluation (Sulem et al, 2018; Zhang et al, 2020a). The main work we compare against is WIKISPLIT, a corpus created by extracting split sentences from Wikipedia edit histories (Botha et al, 2018).…”

Section: Related Work (mentioning)

confidence: 99%

“…Besides corpus size, we are interested in the amount of rephrasing (indicated by %new) and the syntactic complexity of sentences (approximated by length). In Table 2, we compare BISECT with previous split and rephrase corpora, including WIKISPLIT (Botha et al, 2018), WEBSPLIT (Aharoni and Goldberg, 2018), HSplit-Wiki (Sulem et al, 2018), Contract and Wiki-BM (Zhang et al, 2020a). BISECT is comparable in size with WIKISPLIT, while impor- … We compute the number of aligned pairs (#pairs); number of unique long sentences l (#unique); the percentage of new words added to s compared to l (%new), and the average token Length of l and that of the individual split sentences.…”

Section: Comparison To Existing Corpora (mentioning)

confidence: 99%
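
The corpus statistics named in the statement above (#pairs, #unique, %new, average token length) can be sketched as follows. This is only an illustration under stated assumptions: the `Pair` type, the lowercased whitespace tokenizer, and the reading of %new as the share of split-side tokens that do not occur in the long sentence are hypothetical choices, not the definitions used by the BiSECT authors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pair:
    """One aligned example: a long sentence l and its split sentences s (hypothetical type)."""
    long_sentence: str
    splits: List[str]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def corpus_stats(pairs: List[Pair]) -> Dict[str, float]:
    new_ratios, long_lens, split_lens = [], [], []
    for pair in pairs:
        long_tokens = tokenize(pair.long_sentence)
        long_vocab = set(long_tokens)
        split_tokens = [tokenize(s) for s in pair.splits]
        flat = [tok for toks in split_tokens for tok in toks]
        # Assumption: a token is "new" if it appears in s but nowhere in l.
        new_count = sum(1 for tok in flat if tok not in long_vocab)
        new_ratios.append(new_count / len(flat) if flat else 0.0)
        long_lens.append(len(long_tokens))
        split_lens.extend(len(toks) for toks in split_tokens)
    return {
        "pairs": len(pairs),
        "unique": len({p.long_sentence for p in pairs}),
        "pct_new": 100.0 * mean(new_ratios),
        "avg_long_len": mean(long_lens),
        "avg_split_len": mean(split_lens),
    }


example = [
    Pair("the cat sat and the dog ran", ["the cat sat .", "the dog ran ."]),
    Pair("the cat sat and the dog ran", ["the cat sat .", "the dog ran ."]),
]
stats = corpus_stats(example)
```

Here the two inserted full stops are the only tokens absent from the long sentence, so the sketch counts 2 of 8 split-side tokens as new for each pair.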

“…introduced the WEBSPLIT corpus based on decomposing a long sentence into RDF triples (a form of semantic representation), and generating shorter sentences from subsets of these triples. However, the reliance on RDF triples and a limited vocabulary results in unnatural expressions (Botha et al, 2018) and repeated syntactic patterns (Zhang et al, 2020a).…”

Section: Introduction (mentioning)

confidence: 99%


BiSECT: Learning to Split and Rephrase Sentences with Bitexts

Kim, Maddela, Kriz et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


An important task in NLP applications such as sentence simplification is the ability to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. We introduce a novel dataset and a new model for this 'split and rephrase' task. Our BISECT training data consists of 1 million long English sentences paired with shorter, meaning-equivalent English sentences. We obtain these by extracting 1-2 sentence alignments in bilingual parallel corpora and then using machine translation to convert both sides of the corpus into the same language. BISECT contains higher quality training examples than previous Split and Rephrase corpora, with sentence splits that require more significant modifications. We categorize examples in our corpus, and use these categories in a novel model that allows us to target specific regions of the input sentence to be split and edited. Moreover, we show that models trained on BISECT can perform a wider variety of split operations and improve upon previous state-of-the-art approaches in automatic and human evaluations.


“…fluency and adequacy ratings with binary questions described in Zhang et al (2020a) for the second evaluation over another 100 simplifications from the NEWSELA-AUTO split-focused test set. We asked if the output sentence exhibits splitting and if the splitting occurs at the correct place.…”

Section: Human Evaluation (mentioning)

confidence: 99%

Controllable Text Simplification with Explicit Paraphrasing

Maddela, Alva-Manchego, Xu 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously. However, such systems limit themselves to mostly deleting words and cannot easily adapt to the requirements of different target audiences. In this paper, we propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles. We introduce a new data augmentation method to improve the paraphrasing capability of our model. Through automatic and manual evaluations, we show that our proposed model establishes a new state-of-the-art for the task, paraphrasing more often than the existing systems, and can control the degree of each simplification operation applied to the input texts.


Knowledge Transfer to Solve Split and Rephrase

AlAjlouni 2023

2023 International Conference on Information Technology (ICIT)
