Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1605

Unsupervised Paraphrasing without Translation

Abstract: Paraphrasing exemplifies the ability to abstract semantic content from surface forms. Recent work on automatic paraphrasing is dominated by methods leveraging Machine Translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without being bilingual. This work proposes to learn paraphrasing models from an unlabeled monolingual corpus only. To that end, we propose a residual variant of vector-quantized variational auto-encoder. We compare with MT-based approaches on paraphrase iden…
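The abstract's central component is a residual variant of the vector-quantized variational auto-encoder (VQ-VAE). As an illustrative sketch only — the codebook count, codebook sizes, and the straight-through gradient trick below are assumptions, not the paper's exact formulation — a residual vector quantizer represents an encoder output as a sum of codewords, each successive codebook quantizing the residual left by the previous stage:

import torch
import torch.nn as nn

class ResidualVectorQuantizer(nn.Module):
    """Sketch of residual vector quantization: each codebook quantizes
    the residual left by the previous one. Hyperparameters are assumed,
    not taken from the paper."""

    def __init__(self, dim=256, num_codebooks=4, codebook_size=512):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_codebooks)
        )

    def forward(self, z):  # z: (batch, dim) encoder output
        residual, quantized = z, torch.zeros_like(z)
        for codebook in self.codebooks:
            # Pick the codeword nearest to the current residual (L2 distance).
            codes = torch.cdist(residual, codebook.weight).argmin(dim=-1)
            chosen = codebook(codes)
            quantized = quantized + chosen
            residual = residual - chosen
        # Straight-through estimator: the forward pass emits the quantized
        # vector, the backward pass treats quantization as identity.
        return z + (quantized - z).detach()

Stacking quantizers this way lets a discrete bottleneck approximate the encoder output more closely than a single codebook of the same total size, which is the usual motivation for the residual variant.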

Cited by 45 publications (54 citation statements) · References 29 publications
Citation classification: 0 supporting · 48 mentioning · 0 contrasting · Years cited: 2019–2023
“…The general approach is to build a paraphrase generation model, usually a neural model (Prakash et al., 2016; Iyyer et al., 2018; Gupta et al., 2017), using general-purpose datasets of paraphrase sentence pairs. Data augmentation through neural paraphrasing models has been applied to various tasks such as sentiment analysis (Iyyer et al., 2018), intent classification (Roy and Grangier, 2019), and span-based question answering (Yu et al., 2018a). Paraphrasing models may generate training examples that do not match the original label.…”
Section: Related Work · Classification: mentioning · Confidence: 99%
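The label-mismatch caveat in this statement is the usual motivation for filtering augmented examples. A minimal sketch of that idea, where paraphrase() and classify() are hypothetical stand-ins for a trained paraphraser and the task classifier (neither is from the paper):

def augment_with_paraphrases(examples, paraphrase, classify, min_conf=0.9):
    """Expand (text, label) pairs with paraphrases, keeping only candidates
    to which the existing classifier still assigns the original label with
    high confidence. `paraphrase` and `classify` are hypothetical callables:
    paraphrase(text) -> list[str], classify(text) -> (label, confidence).
    """
    examples = list(examples)
    augmented = list(examples)
    for text, label in examples:
        for candidate in paraphrase(text):
            pred, conf = classify(candidate)
            # Drop candidates whose predicted label drifted: paraphrasing
            # can alter label-bearing content (e.g. flip sentiment).
            if pred == label and conf >= min_conf:
                augmented.append((candidate, label))
    return augmented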
“…We show the results that can be achieved on large automatically ranked corpora using a Sequence-to-Sequence model based on the Universal Transformer architecture, as it has demonstrated superior performance over the past year on multiple generative tasks such as abstractive summarization, machine translation and, of course, paraphrase generation (Gupta et al., 2018; Mallinson et al., 2017; Fu et al., 2019; Egonmwan and Chali, 2019; Roy and Grangier, 2019).…”
Section: Paraphrase Generation · Classification: mentioning · Confidence: 99%
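For readers unfamiliar with the setup this statement describes, sequence-to-sequence paraphrase generation amounts to conditional text generation with beam search. A rough sketch using the Hugging Face transformers library with a generic pretrained encoder-decoder rather than the Universal Transformer the citing work trained; the checkpoint name and the "paraphrase:" task prefix are placeholders, and an off-the-shelf model would still need fine-tuning on paraphrase pairs:

# Sketch only: "t5-small" and the "paraphrase:" prefix are placeholder
# assumptions; a real system would fine-tune on paraphrase sentence pairs.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("paraphrase: the movie was great", return_tensors="pt")
# Beam search with several returned hypotheses yields candidate rewrites.
outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5,
                         max_new_tokens=32)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))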
“…These have included rule-based approaches (McKeown, 1979; Meteer and Shaked, 1988) and data-driven methods (Madnani and Dorr, 2010), with the most common recent approach treating the task as a translation problem (Bannard and Callison-Burch, 2005; Barzilay and McKeown, 2001; Pang et al., 2003), often performed using a bilingual corpus pivoting back and forth (Madnani and Dorr, 2010; Prakash et al., 2016; Mallinson et al., 2017). More recently proposed methods include Deep Reinforcement Learning (Li et al., 2018), supervised learning using sequence-to-sequence models (Gupta et al., 2018; Prakash et al., 2016), and unsupervised approaches (Bowman et al., 2016; Roy and Grangier, 2019).…”
Section: Related Work · Classification: mentioning · Confidence: 99%
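The bilingual pivoting this statement refers to is, at sentence level, round-trip translation: translate into a pivot language and back, then treat distinct outputs of the round trip as paraphrases. A schematic sketch where translate_fwd and translate_back are hypothetical n-best MT functions (any translation system could back them):

def pivot_paraphrase(sentence, translate_fwd, translate_back, n_best=5):
    """Round-trip (pivot) paraphrasing: source -> pivot language -> source.
    `translate_fwd` / `translate_back` are hypothetical MT callables that
    return n-best lists of translations for a sentence.
    """
    paraphrases = set()
    for pivot in translate_fwd(sentence, n_best=n_best):
        for back in translate_back(pivot, n_best=n_best):
            if back != sentence:  # keep only genuine rewrites
                paraphrases.add(back)
    return sorted(paraphrases)

Combining several pivot languages, as Mallinson et al. (2017) do, widens the candidate set at the cost of additional translation passes.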