Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.199

ConRPG: Paraphrase Generation using Contexts as Regularizer

Abstract: A long-standing issue with paraphrase generation is how to obtain reliable supervision signals. In this paper, we propose an unsupervised paradigm for paraphrase generation based on the assumption that the probabilities of generating two sentences with the same meaning given the same context should be the same. Inspired by this fundamental idea, we propose a pipelined system which consists of paraphrase candidate generation based on contextual language models, candidate filtering using scoring functions, and p…
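
As a rough illustration of the core assumption stated in the abstract, the Python sketch below scores a paraphrase candidate by how close its conditional log-probability under a causal language model is to that of the original sentence, given the same left context. The choice of GPT-2 via Hugging Face transformers, the per-token averaging, and the acceptance threshold are assumptions made for this example only; they are not ConRPG's actual models or filtering criteria.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative model choice; ConRPG's contextual language models are not reproduced here.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def conditional_logprob(context: str, sentence: str) -> float:
    """Average log-probability of `sentence` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    sent_ids = tokenizer(" " + sentence, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, sent_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # The logits at position t-1 predict the token at position t.
    sent_logits = logits[0, ctx_ids.size(1) - 1 : -1]
    log_probs = torch.log_softmax(sent_logits, dim=-1)
    token_lp = log_probs.gather(1, sent_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

context = "The meeting was postponed because of the storm."
original = "We will reschedule it for next week."
candidate = "It will be rescheduled for the following week."

lp_orig = conditional_logprob(context, original)
lp_cand = conditional_logprob(context, candidate)
# Accept the candidate only if its contextual probability is close to the original's
# (threshold of 1.0 nats per token is an arbitrary illustrative value).
print(abs(lp_orig - lp_cand) < 1.0)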

Cited by 13 publications (12 citation statements)
References 38 publications (24 reference statements)

“…Syntax-controlled paraphrase generation has seen significant recent interest, as a means to explicitly generate diverse surface forms with the same meaning. However, most previous work has required knowledge of the correct or valid surface forms to be generated (Iyyer et al., 2018; Chen et al., 2019a; Kumar et al., 2020; Meng et al., 2021). It is generally assumed that the input can be rewritten without addressing the problem of predicting which template should be used, which is necessary if the method is to be useful.…”
Section: Syntax-controlled Paraphrase Generation (mentioning; confidence: 99%)
“…While autoregressive models of language (including paraphrasing systems) predict one token at a time, there is evidence that in humans some degree of planning occurs at a higher level than individual words (Levelt, 1993; Martin et al., 2010). Prior work on paraphrase generation has attempted to include this inductive bias by specifying an alternative surface form as additional model input, either in the form of target parse trees (Iyyer et al., 2018; Chen et al., 2019a; Kumar et al., 2020), exemplars (Meng et al., 2021), or syntactic codes…”
Section: Introduction (mentioning; confidence: 99%)
“…Similarly, CIP emphasizes rephrasing the idioms in input sentences into word segments that are more intuitive and easier to understand. In recent decades, many researchers devoted to paraphrase generation [8], [9] have struggled with the lack of reliable supervised datasets [10]. Inspired by this challenge, we establish a large-scale training dataset for the CIP task in this work.…”
Section: Table (mentioning; confidence: 99%)
“…A long-standing issue in paraphrase generation studies is the lack of reliable supervised datasets. The issue can be addressed by constructing manually annotated paraphrase-pair datasets [6] or by designing unsupervised paraphrase generation methods [10]. Differing from existing paraphrase generation research, we turn our attention to Chinese idiom paraphrasing, which rephrases idiom-containing sentences into non-idiom-containing ones.…”
Section: Related Work (mentioning; confidence: 99%)
“…From another perspective that is not directly related to our work, lexical overlap features are also beneficial to the paraphrase generation task. While the quality of generated paraphrases can be assessed by state-of-the-art models such as Sentence-BERT (Reimers and Gurevych 2019), as shown in recent work on data augmentation (Corbeil and Abdi Ghavidel 2021), some works still use lexical overlap features as criteria: Nighojkar and Licato (2021) use the BLEURT metric (Sellam, Das, and Parikh 2020) to compute a reward for sentence pairs that are mutually implicative but lexically and syntactically disparate; Kadotani et al. (2021) use edit distance to decide whether source and target sentences require drastic transformation, so that the training order for curriculum learning (Bengio et al. 2009) can be determined for better paraphrase generation performance; and Jaccard distance is used in Meng et al.'s (2021) work as one metric for filtering generated paraphrase candidates.
[Figure 3: PAN; Figure 4: PAWS-wiki]
…”
Section: Related Work (mentioning; confidence: 99%)
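
As the last quoted passage notes, Meng et al. (2021) use Jaccard distance as one metric for filtering generated paraphrase candidates. The Python sketch below shows one plausible form of such a lexical-overlap filter; the whitespace tokenization, the threshold value, and the direction of the filter (discarding candidates that copy the source too closely) are assumptions made for illustration, not the paper's exact criterion.

def jaccard_distance(a: str, b: str) -> float:
    """1 minus the Jaccard similarity of the lowercased token sets of a and b."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)

def filter_candidates(source: str, candidates: list[str], min_dist: float = 0.4) -> list[str]:
    """Keep candidates whose surface form differs enough from the source (assumed direction)."""
    return [c for c in candidates if jaccard_distance(source, c) >= min_dist]

source = "The company announced record profits this quarter."
candidates = [
    "The company announced record profits this quarter.",        # trivial copy, filtered out
    "This quarter the firm reported its highest earnings ever.",  # lexically diverse, kept
]
print(filter_candidates(source, candidates))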