This paper presents work on using Bayesian networks for the dialogue act recognition module of a dialogue system for Dutch dialogues. The Bayesian networks can be constructed from the data in an annotated dialogue corpus. For two series of experiments, using different corpora but the same annotation scheme, recognition results are presented and evaluated.
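As a rough illustration of constructing a Bayesian network from an annotated corpus, the sketch below uses a naive Bayes classifier, which is the simplest Bayesian network (one dialogue-act node with independent feature children). The network structure, the feature names (`first_word`, `ends_with_q`) and the toy English corpus are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """Estimate counts from an annotated corpus of (features, act) pairs."""
    act_counts = Counter(act for _, act in corpus)
    feat_counts = defaultdict(Counter)  # (act, feature name) -> value counts
    values = defaultdict(set)           # feature name -> observed values
    for feats, act in corpus:
        for name, value in feats.items():
            feat_counts[(act, name)][value] += 1
            values[name].add(value)
    return act_counts, feat_counts, values


def classify(model, feats):
    """Return the most probable dialogue act under naive Bayes."""
    act_counts, feat_counts, values = model
    total = sum(act_counts.values())
    scores = {}
    for act, n in act_counts.items():
        logp = math.log(n / total)  # prior P(act)
        for name, value in feats.items():
            count = feat_counts[(act, name)][value]
            # Add-one smoothing so unseen values keep non-zero probability.
            logp += math.log((count + 1) / (n + len(values[name])))
        scores[act] = logp
    return max(scores, key=scores.get)


# Invented toy corpus, for illustration only.
TOY_CORPUS = [
    ({"first_word": "what", "ends_with_q": True}, "question"),
    ({"first_word": "where", "ends_with_q": True}, "question"),
    ({"first_word": "i", "ends_with_q": False}, "statement"),
    ({"first_word": "the", "ends_with_q": False}, "statement"),
]

model = train(TOY_CORPUS)
predicted = classify(model, {"first_word": "what", "ends_with_q": True})
```

A richer network would add dependencies between features or on the previous dialogue act; the counting and smoothing steps would stay the same in spirit.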
While data-driven methods for spoken language understanding reduce maintenance and portability costs compared with handcrafted parsers, the collection of word-level semantic annotations for training remains a time-consuming task. A recent line of research has focused on building generative models from unaligned semantic representations, using expectation-maximisation techniques to align semantic concepts. This paper presents an efficient, simple technique that parses a semantic tree by recursively calling discriminative semantic classification models. Results show that it outperforms existing generative models, while performance is close to more complex grammar induction techniques. We also show that our method is robust to speech recognition errors, by improving over a handcrafted parser previously used for dialogue data collection.
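The idea of parsing a semantic tree by recursively calling classifiers can be sketched as follows. This is a minimal sketch under stated assumptions: the trained discriminative models are replaced by a hypothetical keyword-rule stand-in (`keyword_classifier`), and the concept names and rule table are invented for illustration.

```python
def keyword_classifier(rules):
    """Build a stand-in classifier from keyword rules.

    rules maps a path (tuple of concepts from the root) to a list of
    (child_concept, trigger_keyword) pairs. A real system would use a
    trained discriminative model here instead.
    """
    def classify(utterance, path):
        words = set(utterance.lower().split())
        return [child for child, keyword in rules.get(path, []) if keyword in words]
    return classify


def parse_tree(utterance, classify, path=(), max_depth=5):
    """Recursively expand each predicted concept into a nested dict."""
    if len(path) >= max_depth:
        return {}
    return {
        child: parse_tree(utterance, classify, path + (child,), max_depth)
        for child in classify(utterance, path)
    }


# Invented rule table, for illustration only.
RULES = {
    (): [("inform", "want"), ("request", "what")],
    ("inform",): [("food", "chinese"), ("area", "centre")],
    ("inform", "food"): [("chinese", "chinese")],
    ("inform", "area"): [("centre", "centre")],
}

tree = parse_tree("I want chinese food in the centre", keyword_classifier(RULES))
```

Because each call only decides the children of one node, no word-level alignment between the utterance and the tree is needed.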
We describe a variety of machine learning techniques that are being applied to social multi-user human-robot interaction, using a robot bartender in our scenario. We first present a data-driven approach to social state recognition based on supervised learning. We then describe an approach to social skills execution (i.e., action selection for generating socially appropriate robot behaviour), which is based on reinforcement learning, using a data-driven simulation of multiple users to train execution policies for social skills. Next, we describe how these components for social state recognition and skills execution have been integrated into an end-to-end robot bartender system, and we discuss the results of a user evaluation. Finally, we present an alternative unsupervised learning framework that combines social state recognition and social skills execution, based on hierarchical Dirichlet processes and an infinite POMDP interaction manager. The models make use of data from both human-human interactions collected in a number of German bars and human-robot interactions recorded in the evaluation of an initial version of the system.
This paper investigates the claim that a dialogue manager modelled as a Partially Observable Markov Decision Process (POMDP) can achieve improved robustness to noise compared to conventional state-based dialogue managers. Using the Hidden Information State (HIS) POMDP dialogue manager as an exemplar, and an MDP-based dialogue manager as a baseline, evaluation results are presented for both simulated and real dialogues in a Tourist Information Domain. The results on the simulated data show that the inherent ability to model uncertainty allows the POMDP model to exploit alternative hypotheses from the speech understanding system. The results obtained from a user trial show that the HIS system with a trained policy performed significantly better than the MDP baseline.
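To illustrate how maintaining a belief over user goals lets a dialogue manager exploit alternative hypotheses, here is a minimal sketch of a Bayesian belief update over an N-best list. It is not the HIS system: the goal set, the confidence scores and the uniform spreading of leftover confidence mass are all assumptions chosen to keep the example small.

```python
def update_belief(belief, nbest):
    """One Bayesian belief update over user goals.

    belief: dict mapping goal -> prior probability.
    nbest:  dict mapping goal -> confidence score from the understanding
            component (scores are assumed to sum to at most 1).
    """
    goals = list(belief)
    # Confidence mass not assigned to any hypothesis is spread uniformly,
    # so goals missing from the N-best list are never ruled out entirely.
    leftover = max(0.0, 1.0 - sum(nbest.values()))
    floor = leftover / len(goals)
    unnorm = {g: belief[g] * (nbest.get(g, 0.0) + floor) for g in goals}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# Invented example: the user wants "hotel", but noise puts a different
# wrong hypothesis on top in each turn.
belief = {"hotel": 1 / 3, "restaurant": 1 / 3, "bar": 1 / 3}
belief = update_belief(belief, {"restaurant": 0.5, "hotel": 0.4})
belief = update_belief(belief, {"bar": 0.5, "hotel": 0.4})
best_goal = max(belief, key=belief.get)
```

A manager that tracked only the top hypothesis would follow "restaurant" and then "bar"; accumulating evidence across turns instead favours "hotel", which was second-best both times.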
HMM-based synthesis has attracted great interest due to its compact and flexible modelling of spectral and prosodic parameters. In this approach, short-term spectra, fundamental frequency (F0) and duration are simultaneously modelled by multi-stream HMMs. However, since F0 values in unvoiced regions are normally considered as undefined, it is difficult to use standard HMMs for F0 modelling. The currently preferred solution to this is to use a multi-space distribution HMM (MSDHMM) in which discrete distributions are used for modelling the voiced/unvoiced decision and continuous Gaussian distributions are used for modelling the F0 values within the voiced regions. However, the assumption of undefined unvoiced F0 regions and the special structure of the MSDHMM lead to limitations in the accurate modelling of F0 patterns. In this paper an alternative is explored whereby unvoiced F0 values are assumed to exist and are modelled within the standard HMM framework using a globally tied distribution (GTD). Subjective evaluations show that these regular HMMs with GTD can produce significant improvements in the naturalness of the synthesised speech compared to the MSDHMM, and furthermore, the method is insensitive to the exact method used for unvoiced F0 generation.
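The contrast between the two F0 observation models can be sketched for a single HMM state. This is one plausible reading of the abstract, not the paper's exact formulation: the GTD state output is assumed to be a two-component mixture in which the second Gaussian is shared by all states, and every parameter name below is hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def msd_likelihood(f0, p_voiced, mean, var):
    """MSD-style output: a discrete voicing decision plus a voiced Gaussian.

    f0 is None for an unvoiced frame, where F0 is treated as undefined.
    """
    if f0 is None:
        return 1.0 - p_voiced
    return p_voiced * gaussian_pdf(f0, mean, var)


def gtd_likelihood(f0, w_voiced, mean, var, gtd_mean, gtd_var):
    """GTD-style output (assumed form): every frame has an F0 value.

    The state-specific Gaussian covers voiced F0; the second Gaussian is
    the globally tied one, with the same parameters in every state.
    """
    voiced = w_voiced * gaussian_pdf(f0, mean, var)
    unvoiced = (1.0 - w_voiced) * gaussian_pdf(f0, gtd_mean, gtd_var)
    return voiced + unvoiced
```

Under this reading the GTD model is an ordinary continuous-density HMM, so standard training and generation procedures apply without a special case for unvoiced frames.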
In this paper we present a comparative evaluation of various negotiation strategies within an online version of the game "Settlers of Catan". The comparison is based on human subjects playing games against artificial game-playing agents ('bots') which implement different negotiation dialogue strategies, using a chat dialogue interface to negotiate trades. Our results suggest that a negotiation strategy that uses persuasion, as well as a strategy that is trained from data using Deep Reinforcement Learning, both lead to an improved win rate against humans, compared to previous rule-based and supervised learning baseline dialogue negotiators.
We present and evaluate a novel approach to natural language generation (NLG) in statistical spoken dialogue systems (SDS) using a data-driven statistical optimization framework for incremental information presentation (IP), where there is a trade-off to be solved between presenting "enough" information to the user and keeping the utterances short and understandable. The trained IP model is adaptive to variation from the current generation context (e.g. a user and a non-deterministic sentence planner), and it incrementally adapts the IP policy at the turn level. Reinforcement learning is used to automatically optimize the IP policy with respect to a data-driven objective function. In a case study on presenting restaurant information, we show that an optimized IP strategy trained on Wizard-of-Oz data outperforms a baseline mimicking the wizard behavior in terms of total reward gained. The policy is then also tested with real users, and improves on a conventional hand-coded IP strategy used in a deployed SDS in terms of overall task success. The evaluation found that the trained IP strategy significantly improves dialogue task completion for real users, with up to an 8.2% increase in task success. This methodology also provides new insights into the nature of the IP problem, which has previously been treated as a module following dialogue management with no access to lower-level context features (e.g. from a surface realizer and/or speech synthesizer).