Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.33
ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation

Abstract: We propose ParaSCI, the first large-scale paraphrase dataset in the scientific field, including 33,981 paraphrase pairs from ACL (ParaSCI-ACL) and 316,063 pairs from arXiv (ParaSCI-arXiv). Digging into the characteristics and common patterns of scientific papers, we construct this dataset through intra-paper and inter-paper methods, such as collecting citations to the same paper or aggregating definitions by scientific terms. To take advantage of sentences paraphrased partially, we put up PDBERT as a general paraph…

Cited by 23 publications (9 citation statements) | References 17 publications
“…Given the same context, we then build models for text simplification using ACCESS (Martin et al., 2020) and MUSS (Martin et al., 2021), which are built on top of BERT and BART, respectively; a paraphrasing model that fine-tunes BART using ParaSCI (Dong et al., 2021), which contains paraphrase pairs from scientific papers; and a negation generation model based on CROSSAUG (Lee et al., 2021), which fine-tunes BART to produce text that contradicts the given context.…”
Section: Evaluation of Existing NLP Tools
confidence: 99%
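As a rough illustration of the paraphrasing component this quote describes, the sketch below runs inference with a BART model fine-tuned on ParaSCI. The checkpoint name is a hypothetical placeholder (the quote does not name a released model), and the decoding settings are illustrative, not taken from the cited work.

```python
# Minimal inference sketch for a BART paraphraser fine-tuned on ParaSCI.
# "your-org/bart-parasci" is a hypothetical checkpoint name, not a model
# released by the cited authors; decoding settings are illustrative.
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "your-org/bart-parasci"  # hypothetical fine-tuned BART checkpoint
tokenizer = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

sentence = "We evaluate our model on a large-scale scientific paraphrase dataset."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

# Beam search is a common decoding choice for paraphrase generation.
output_ids = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```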
“…We create a paraphrase generation model BART-PARA-SCI by fine-tuning the bart-paraphrase checkpoint on the ParaSCI-ACL (Dong et al., 2021) dataset, which contains 33,981 paraphrase pairs from articles published in ACL conferences and workshops. The model is trained for 10 epochs, using the Adam optimizer with default parameters (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08.…”
Section: A. Details for NLP Models on Selected Tasks
confidence: 99%
“…We create a paraphrase generation model BART-PARA-SCI by fine-tuning the bart-paraphrase checkpoint on the ParaSCI-ACL (Dong et al., 2021) dataset, which contains 33,981 paraphrase pairs from articles published in ACL conferences and workshops. The model is trained for 10 epochs, using the Adam optimizer with default parameters (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08.…”
Section: A4. Paraphrasing
confidence: 99%
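For concreteness, here is a minimal sketch of the training setup the quote reports: BART fine-tuned for 10 epochs with Adam at its default (β₁, β₂) and ϵ. The starting checkpoint name and the toy data pair are assumptions; the quote footnotes a "bart-paraphrase" checkpoint without identifying it here, and a real run would iterate over the 33,981 ParaSCI-ACL pairs rather than the stand-in below.

```python
# Sketch of the quoted fine-tuning recipe: 10 epochs, Adam with defaults.
# The checkpoint name and data are placeholders, not the authors' exact setup.
import torch
from torch.utils.data import DataLoader
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "eugenesiow/bart-paraphrase"  # one public bart-paraphrase model; the quote's exact checkpoint is unspecified
tokenizer = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

# (β₁, β₂) = (0.9, 0.999) and ϵ = 1e-08 are torch.optim.Adam's defaults,
# matching the settings reported in the quote; the learning rate is assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5,
                             betas=(0.9, 0.999), eps=1e-8)

def collate(batch):
    # Each item is a (source, paraphrase) pair, as in ParaSCI-ACL.
    src, tgt = zip(*batch)
    return tokenizer(list(src), text_target=list(tgt),
                     padding=True, truncation=True, return_tensors="pt")

pairs = [("input sentence", "its paraphrase")]  # stand-in for ParaSCI-ACL pairs
loader = DataLoader(pairs, batch_size=8, shuffle=True, collate_fn=collate)

model.train()
for epoch in range(10):  # 10 epochs, as reported
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```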
“…It has been used to study paraphrase generation (Dong et al., 2021) and statement strength (Tan and Lee, 2014). We first download the LaTeX source code for 750 randomly sampled papers and their historical versions, then use the OpenDetex package to extract plain text from them.…”
Section: A Multi-Genre Benchmark for Monolingual Word Alignment
confidence: 99%
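A minimal sketch of that extraction step, assuming OpenDetex's `detex` binary is installed and on PATH; the file path is illustrative.

```python
# Strip LaTeX markup with OpenDetex, as the quoted pipeline describes.
# Assumes the `detex` binary is installed; the path below is illustrative.
import subprocess

def latex_to_text(tex_path: str) -> str:
    # OpenDetex is invoked as `detex <file>` and writes plain text to stdout.
    result = subprocess.run(["detex", tex_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(latex_to_text("paper/main.tex"))
```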