“…Task-Oriented Dialogue Models: Most task-oriented dialogue systems break down the task into three components: belief tracking (Henderson et al., 2013; Mrkšić et al., 2016; Rastogi et al., 2017; Nouri and Hosseini-Asl, 2018; Wu et al., 2019a; Zhou and Small, 2019; Heck et al., 2020), dialogue act prediction (Wen et al., 2017a; Tanaka et al., 2019), and response generation (Budzianowski et al., 2018; Lippe et al., 2020). Traditionally, a modular approach is adopted, where these components are either optimized independently (i.e., a pipeline design) or learned via multi-task learning (i.e., some parameters are shared among the components) (Wen et al., 2017b; Zhao et al., 2019; Mehri et al., 2019; Tseng et al., 2020). However, it is known that improvements in one component do not necessarily lead to overall performance improvements (Ham et al., 2020), and the modular approach suffers from error propagation in practice (Liu and Lane, 2018).…”