“…Although widely used, MLE suffers from the exposure bias problem (Bengio et al, 2015; Ranzato et al, 2016): at test time, the model sequentially predicts the next word conditioned on its previously generated words, whereas during training it is conditioned on ground-truth words. To tackle this problem, generative adversarial networks (GAN) with reinforcement learning (RL) training approaches have been introduced to text generation tasks (Che et al, 2017; Lin et al, 2017; Shi et al, 2018), where the discriminator is trained to distinguish real from generated text samples to provide reward signals for the generator, and the generator is optimized via policy gradient.…”
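
The reward-driven update described in the excerpt can be illustrated with a minimal sketch. It is not the method of any of the cited papers. It assumes a context-free generator (one softmax over a four-token vocabulary) and replaces the trained discriminator with a fixed, hypothetical scoring function `toy_discriminator`. All names and hyperparameters are illustrative. The generator samples from its own distribution, as it would at test time, and is updated with a REINFORCE-style policy gradient using the batch-mean reward as a baseline.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, length, rng):
    """Sample tokens from the generator's own distribution (free-running)."""
    probs = softmax(logits)
    return [rng.choices(range(len(probs)), weights=probs)[0] for _ in range(length)]


def toy_discriminator(seq, real_token=2):
    """Hypothetical stand-in for a trained discriminator.

    Returns the fraction of tokens equal to `real_token`, read as a
    'probability that the sample is real'. A real discriminator would be
    a learned classifier trained against the generator.
    """
    return sum(1 for t in seq if t == real_token) / len(seq)


def reinforce_step(logits, rng, length=5, batch=16, lr=0.5):
    """One policy-gradient (REINFORCE) update of the generator logits."""
    probs = softmax(logits)
    samples = [sample_sequence(logits, length, rng) for _ in range(batch)]
    rewards = [toy_discriminator(s) for s in samples]
    baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
    grad = [0.0] * len(logits)
    for seq, reward in zip(samples, rewards):
        advantage = reward - baseline
        for token in seq:
            for k in range(len(logits)):
                # d log p(token) / d logit_k = 1[k == token] - p_k
                indicator = 1.0 if k == token else 0.0
                grad[k] += advantage * (indicator - probs[k])
    # Gradient ascent on the expected reward.
    return [l + lr * g / batch for l, g in zip(logits, grad)]


def train(steps=200, vocab=4, seed=0):
    """Run the sketch from uniform logits and return the final token distribution."""
    rng = random.Random(seed)
    logits = [0.0] * vocab
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

Because the reward is computed on sequences the generator itself produced, training and test conditions match, which is the motivation the excerpt gives for this family of approaches. In an actual GAN setup the discriminator would be updated alternately with the generator rather than held fixed as it is here.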