Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.33
ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation

Abstract: We propose ParaSCI, the first large-scale paraphrase dataset in the scientific field, including 33,981 paraphrase pairs from ACL (ParaSCI-ACL) and 316,063 pairs from arXiv (ParaSCI-arXiv). Digging into the characteristics and common patterns of scientific papers, we construct this dataset through intra-paper and inter-paper methods, such as collecting citations to the same paper or aggregating definitions by scientific terms. To take advantage of sentences paraphrased partially, we put up PDBERT as a general paraph…

Cited by 23 publications (9 citation statements) | References 17 publications
“…Given the same context, we then build models for text simplification using ACCESS (Martin et al., 2020) and MUSS (Martin et al., 2021), which are built on top of BERT and BART, respectively; a paraphrasing model that fine-tunes BART using ParaSCI (Dong et al., 2021), which contains paraphrase pairs from scientific papers; and a negation generation model based on CROSSAUG (Lee et al., 2021), which fine-tunes BART to produce text that contradicts the given context.…”
Section: Evaluation of Existing NLP Tools
confidence: 99%
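As a rough illustration of the paraphrasing component this quote describes, the sketch below runs inference with a BART model fine-tuned on ParaSCI. The checkpoint name is a hypothetical placeholder (the quote does not name a released model), and the decoding settings are illustrative, not taken from the cited work.

```python
# Minimal inference sketch for a BART paraphraser fine-tuned on ParaSCI.
# "your-org/bart-parasci" is a hypothetical checkpoint name, not a model
# released by the cited authors; decoding settings are illustrative.
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "your-org/bart-parasci"  # hypothetical fine-tuned BART checkpoint
tokenizer = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

sentence = "We evaluate our model on a large-scale scientific paraphrase dataset."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

# Beam search is a common decoding choice for paraphrase generation.
output_ids = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```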
“…We create a paraphrase generation model BART-PARA-SCI by fine-tuning the bart-paraphrase checkpoint on the ParaSCI-ACL (Dong et al., 2021) dataset, which contains 33,981 paraphrase pairs from articles published in ACL conferences and workshops. The model is trained for 10 epochs, using the Adam optimizer with default parameters (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08.…”
Section: A. Details for NLP Models on Selected Tasks
confidence: 99%
“…We create a paraphrase generation model BART-PARA-SCI by fine-tuning the bart-paraphrase checkpoint on the ParaSCI-ACL (Dong et al., 2021) dataset, which contains 33,981 paraphrase pairs from articles published in ACL conferences and workshops. The model is trained for 10 epochs, using the Adam optimizer with default parameters (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08.…”
Section: A4. Paraphrasing
confidence: 99%
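For concreteness, here is a minimal sketch of the training setup the quote reports: BART fine-tuned for 10 epochs with Adam at its default (β₁, β₂) and ϵ. The starting checkpoint name and the toy data pair are assumptions; the quote footnotes a "bart-paraphrase" checkpoint without identifying it here, and a real run would iterate over the 33,981 ParaSCI-ACL pairs rather than the stand-in below.

```python
# Sketch of the quoted fine-tuning recipe: 10 epochs, Adam with defaults.
# The checkpoint name and data are placeholders, not the authors' exact setup.
import torch
from torch.utils.data import DataLoader
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "eugenesiow/bart-paraphrase"  # one public bart-paraphrase model; the quote's exact checkpoint is unspecified
tokenizer = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

# (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08 are torch.optim.Adam's defaults,
# matching the settings reported in the quote; the learning rate is assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5,
                             betas=(0.9, 0.999), eps=1e-8)

def collate(batch):
    # Each item is a (source, paraphrase) pair, as in ParaSCI-ACL.
    src, tgt = zip(*batch)
    return tokenizer(list(src), text_target=list(tgt),
                     padding=True, truncation=True, return_tensors="pt")

pairs = [("input sentence", "its paraphrase")]  # stand-in for ParaSCI-ACL pairs
loader = DataLoader(pairs, batch_size=8, shuffle=True, collate_fn=collate)

model.train()
for epoch in range(10):  # 10 epochs, as reported
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```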
“…It has been used to study paraphrase generation (Dong et al., 2021) and statement strength (Tan and Lee, 2014). We first download the LaTeX source code for 750 randomly sampled papers and their historical versions, then use the OpenDetex package to extract plain text from them.…”
Section: A Multi-Genre Benchmark for Monolingual Word Alignment
confidence: 99%
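A minimal sketch of that extraction step, assuming OpenDetex's `detex` binary is installed and on PATH; the file path is illustrative.

```python
# Strip LaTeX markup with OpenDetex, as the quoted pipeline describes.
# Assumes the `detex` binary is installed; the path below is illustrative.
import subprocess

def latex_to_text(tex_path: str) -> str:
    # OpenDetex is invoked as `detex <file>` and writes plain text to stdout.
    result = subprocess.run(["detex", tex_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(latex_to_text("paper/main.tex"))
```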