“…These models depend on delexicalization, using generic tags to replace specific slot types and values, and handcrafted semantic dictionaries. In practice, it is difficult to scale these models for every slot type and recent state-of-the-art models for DST use deep learning based methods to learn general representations for user and system utterances and previous system actions, and predict the turn state (Henderson et al, 2013(Henderson et al, , 2014bMrkšić et al, 2015Hori et al, 2016;Liu and Lane, 2017;Dernoncourt et al, 2017;Chen et al, 2016). However, these systems are found to perform poorly on rare and unknown slot-value pairs which was recently addressed through local slot-specific encoders (Zhong et al, 2018) and pointer network (Xu and Hu, 2018).…”