Toward Diverse Text Generation with Inverse Reinforcement Learning

Shi, Zhan; Chen, Xinchi; Qiu, Xipeng; Huang, Xuanjing

doi:10.24963/ijcai.2018/606

Cited by 75 publications

(62 citation statements)

References 0 publications

“…This may be the reason that the countermeasures could easily detect fake reviews. To generate more robust reviews, we plan to develop a method that generates reviews with more diversity [35]. We also plan to develop a countermeasure for detecting these generated reviews.…”

Section: Discussion (mentioning)

confidence: 99%

Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-Based Detection

Adelani, Mai, Fang et al. 2020

Advances in Intelligent Systems and Computing

Advanced neural language models (NLMs) are widely used in sequence generation tasks because they are able to produce fluent and meaningful sentences. They can also be used to generate fake reviews, which can then be used to attack online review systems and influence the buying decisions of online shoppers. A problem in fake review generation is how to generate the desired sentiment/topic. Existing solutions first generate an initial review based on some keywords and then modify some of the words in the initial review so that the review has the desired sentiment/topic. We overcome this problem by using the GPT-2 NLM to generate a large number of high-quality reviews based on a review with the desired sentiment and then using a BERT-based text classifier (with an accuracy of 96%) to filter out reviews with undesired sentiments. Because none of the words in the review are modified, fluent samples like the training data can be generated from the learned distribution. A subjective evaluation with 80 participants demonstrated that this simple method can produce reviews that are as fluent as those written by people. It also showed that the participants tended to distinguish fake reviews randomly. Two countermeasures, GROVER and GLTR, were found to be able to accurately detect fake reviews.

“…As for the discriminator, RankGAN (Lin et al, 2017) replaced traditional discriminator with a ranker to learn the relative ranking information between the real texts and generated ones. Inverse reinforcement learning (Shi et al, 2018) used a trainable reward approximator as the discriminator to provide dense reward signals at each generation step. DPGAN introduced a language model based discriminator and regarded cross-entropy as rewards to promote the diversity of generation results.…”

Section: Related Work (mentioning)

confidence: 99%

“…MaliGAN: A variant of SeqGAN that optimizes the generator with a normalized maximum likelihood objective (Che et al, 2017). IRL: This inverse reinforcement learning method replaces the discriminator with a reward approximator to provide dense rewards (Shi et al, 2018). RAML: A RL approach to incorporate MLE objective into RL training framework, which regards BLEU as rewards (Norouzi et al, 2016).…”

Section: Baselines (mentioning)

confidence: 99%

“…Although widely used, MLE suffers from the exposure bias problem (Bengio et al, 2015; Ranzato et al, 2016): during test, the model sequentially predicts the next word conditioned on its previous generated words while during training conditioned on ground-truth words. To tackle this problem, generative adversarial networks (GAN) with reinforcement learning (RL) training approaches have been introduced to text generation tasks (Che et al, 2017; Lin et al, 2017; Shi et al, 2018), where the discriminator is trained to distinguish real and generated text samples to provide reward signals for the generator, and the generator is optimized via policy gradient.…”

Section: Introduction (mentioning)

confidence: 99%

ARAML: A Stable Adversarial Training Framework for Text Generation

Pei, Huang, Huang et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Most of the existing generative adversarial networks (GAN) for text generation suffer from the instability of reinforcement learning training algorithms such as policy gradient, leading to unstable performance. To tackle this problem, we propose a novel framework called Adversarial Reward Augmented Maximum Likelihood (ARAML). During adversarial training, the discriminator assigns rewards to samples which are acquired from a stationary distribution near the data rather than the generator's distribution. The generator is optimized with maximum likelihood estimation augmented by the discriminator's rewards instead of policy gradient. Experiments show that our model can outperform state-of-the-art text GANs with a more stable training process.

“…This method enables us to both make use of an efficient adversarial formulation and recover a more precise reward function for open-domain dialogue training. Unlike Shi et al (2018), we design a specific reward function structure to measure the reward of each word in generated sentences while taking account of the dialogue context. We also consider two human evaluation settings to assess the overall performance of our model.…”

Section: Introduction (mentioning)

confidence: 99%

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Kiseleva, Rijke 2019

AAAI

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.
