The noetic end-to-end response selection challenge, one track of the 7th Dialog System Technology Challenges (DSTC7), aims to push the state of the art of utterance classification for real-world goal-oriented dialog systems: participants must select the correct next utterances from a set of candidates, given a multi-turn context. This paper presents our systems, which ranked first on both datasets in this challenge, one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among the utterances of different turns for context modeling. In this paper, we investigate a sequential matching model based only on chain sequence for multi-turn response selection. Our results demonstrate that the potential of sequential matching approaches has not yet been fully exploited for multi-turn response selection. In addition to ranking first in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, on two large-scale public multi-turn response selection benchmark datasets.
Keywords: DSTC7, response selection, ESIM, BERT, end-to-end, sequential matching approaches

1. We develop an Enhanced Sequential Inference Model (ESIM) based system for the DSTC7 noetic end-to-end response selection track. On top of the ESIM model, we explore methods for exploiting multiple word embeddings, heuristic data augmentation, tuning the ratio between positive and negative samples, and emphasizing the importance of the most recent context utterances.
2. We propose a two-step approach for selecting the next utterance from a large number of candidates (for example, subtask 2 on the Ubuntu dataset requires selecting the next utterance from a candidate pool of 120,000 sentences). We first use a sentence-encoding based method to select the top N candidates from the large candidate set and then rerank them using ESIM, achieving high performance at an acceptable overall computational cost.
3. We conduct a systematic ablation analysis of the above-mentioned methods for enhancing ESIM model performance. In particular, we develop effective and efficient model ensemble by averaging the output from models
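The two-step selection described in item 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words encoder `encode`, the cosine shortlist, and the `jaccard_score` function standing in for ESIM are all hypothetical placeholders chosen so that the example runs with the Python standard library alone. Only the overall shape (a cheap sentence-encoding pass to keep the top N candidates, followed by a more expensive reranking pass over that shortlist) is taken from the text.

```python
import math
from collections import Counter
from typing import Callable, List


def encode(text: str) -> Counter:
    """Toy sentence encoder (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a)  # Counter yields 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_n(context_vec: Counter, candidate_vecs: List[Counter], n: int) -> List[int]:
    """Step 1: cheap shortlist of the n candidates closest to the context."""
    order = sorted(
        range(len(candidate_vecs)),
        key=lambda i: cosine(context_vec, candidate_vecs[i]),
        reverse=True,
    )
    return order[:n]


def rerank(
    context: str,
    candidates: List[str],
    shortlist: List[int],
    score_fn: Callable[[str, str], float],
) -> List[int]:
    """Step 2: reorder the shortlist with a more expensive matching model."""
    return sorted(shortlist, key=lambda i: score_fn(context, candidates[i]), reverse=True)


def jaccard_score(context: str, candidate: str) -> float:
    """Placeholder for the ESIM matching score (assumption): token-set overlap."""
    a, b = set(context.lower().split()), set(candidate.lower().split())
    return len(a & b) / len(a | b) if (a or b) else 0.0


def two_stage_select(
    context: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    n: int,
) -> List[int]:
    """Run both steps and return candidate indices, best first."""
    candidate_vecs = [encode(c) for c in candidates]  # could be precomputed once
    shortlist = select_top_n(encode(context), candidate_vecs, n)
    return rerank(context, candidates, shortlist, score_fn)


if __name__ == "__main__":
    pool = [
        "try sudo apt update",
        "the weather is nice",
        "run apt install",
        "sudo apt update then apt install",
    ]
    ranked = two_stage_select("apt install fails with sudo", pool, jaccard_score, n=3)
    print([pool[i] for i in ranked])
```

The point of the split is cost: the expensive scorer is called only N times per context rather than once per candidate in the full pool, and the candidate encodings in step 1 do not depend on the context, so they can be computed once and reused.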
MLP-based front-ends have shown significant complementary properties to conventional spectral features. As part of the DARPA GALE program, different MLP features were developed for Mandarin ASR. In this paper, all the proposed front-ends are compared in a systematic manner, and we extensively investigate the scalability of these features in terms of the amount of training data (from 100 hours to 1600 hours) and system complexity (maximum likelihood training, SAT, lattice-level combination, and discriminative training). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce relative improvements in the range of 15% to 23% at the different steps of a multipass system when compared to conventional short-term spectral features such as MFCC and PLP. The largest improvement is obtained using a hierarchical MLP approach.