“…By reconstructing the original dialogues with discrete latent variable models, we can extract a structure representing the transitions among the variables. In this direction, prior work has explored Hidden Markov Models (Chotimongkol, 2008; Ritter et al., 2010; Zhai and Williams, 2014), Variational Auto-Encoders (VAEs) (Kingma and Welling, 2013), and their recurrent extension, Variational Recurrent Neural Networks (VRNNs) (Chung et al., 2015; Shi et al., 2019; Qiu et al., 2020). Building on the success of pre-trained language representations (McCann et al., 2017; Howard and Ruder, 2018; Peters et al., 2018; Devlin et al., 2019), the Transformer architecture (Vaswani et al., 2017) can be trained on generic corpora and adapted to specific downstream tasks. In dialogue systems, Wu et al. (2020) pre-trained the BERT model (Devlin et al., 2019) on task-oriented dialogues for intent recognition, dialogue state tracking, dialogue act prediction, and response selection.…”
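To make the first idea concrete, below is a minimal sketch of extracting a transition structure over discrete dialogue states. It substitutes k-means discretization of turn embeddings for the learned latent variables of the cited HMM/VRNN models, so it illustrates the general recipe rather than reproducing any of those methods; the function name, the toy data, and the use of scikit-learn are assumptions made for this example.

```python
# Sketch: assign each dialogue turn a discrete state, then read off a
# transition structure by counting state-to-state transitions. K-means over
# turn embeddings stands in for the learned discrete latent variables of an
# HMM or VRNN (an assumption for illustration, not the cited methods).
import numpy as np
from sklearn.cluster import KMeans

def extract_transition_structure(dialogues, n_states=5, seed=0):
    """dialogues: list of dialogues, each an array of turn embeddings with
    shape (n_turns, embed_dim). Returns (state labels, transition matrix)."""
    all_turns = np.vstack(dialogues)
    # Discretize every turn into one of n_states latent states.
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed)
    labels = km.fit_predict(all_turns)

    # Count transitions between consecutive turns within each dialogue.
    counts = np.zeros((n_states, n_states))
    offset = 0
    for d in dialogues:
        seq = labels[offset:offset + len(d)]
        for s, t in zip(seq[:-1], seq[1:]):
            counts[s, t] += 1
        offset += len(d)

    # Row-normalize counts into transition probabilities.
    transitions = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
    return labels, transitions

# Toy usage with random vectors in place of real turn encodings.
rng = np.random.default_rng(0)
toy_dialogues = [rng.normal(size=(8, 16)) for _ in range(20)]
_, T = extract_transition_structure(toy_dialogues, n_states=4)
print(np.round(T, 2))  # each row sums to ~1: a dialogue-state transition graph
```

In the cited work the discrete states come from a generative model trained to reconstruct the dialogues; only the final counting step, which turns state sequences into a transition graph, is shared with this simplified sketch.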