“…Reinforcement learning has been applied to various natural language generation tasks, including image captioning (Rennie et al., 2017), automatic summarization (Paulus et al., 2018), machine translation (Kang et al., 2020), and poem generation (Yang et al., 2019). Specifically, when reinforcement learning is applied to dialogue generation (Li et al., 2016; Zhao et al., 2019; Shi et al., 2019; Yamazaki and Aizawa, 2021), self-play is often used so that multi-turn dialogues can be scored.…”