“…Most previous work develops personalized (Zhang et al, 2018; Zheng et al, 2020; Song et al, 2021; Chen et al, 2023a), emotional (Ghosal et al, 2020; Zheng et al, 2023a; Deng et al, 2023c; Zheng et al, 2023b), or empathetic (Rashkin et al, 2019; Sabour et al, 2022) dialogue systems in isolation, rather than seamlessly blending them all into one cohesive conversational flow (Smith et al, 2020). A common approach is to predict the emotion or persona from a pre-defined set and generate the response in a multi-task manner (Ma et al, 2021; Sabour et al, 2022). In addition, much work captures the linguistic cues underlying the text by predicting each of them independently as a classification task (Barriere et al, 2022; Ghosh et al, 2022).…”