This paper presents a dialogue act taxonomy designed for the development of a conversational agent for the elderly. The main goal of this conversational agent is to improve the user's quality of life by means of coaching sessions on different topics. In contrast to other approaches such as task-oriented dialogue systems and chit-chat implementations, the agent should display a proactive attitude, driving the conversation to reach a number of diverse coaching goals. Therefore, the main characteristic of the introduced dialogue act taxonomy is its capacity for supporting communication based on the GROW model for coaching. In addition, the taxonomy has a hierarchical structure between the tags and is multimodal. We use the taxonomy to annotate a Spanish dialogue corpus collected from a group of elderly people. We also present a preliminary examination of the annotated corpus and discuss the multiple possibilities it presents for further research.
Neural transformer architectures have gained a lot of interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open-domain dialogue with huge amounts of data and also for domain adaptation in task-oriented setups. But the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that using audio embeddings leads to overall higher task success than not using them, with statistically significant results across evaluation metrics and training algorithms.
In this work we develop a fully data-driven conversational agent capable of carrying out motivational coaching sessions in Spanish, French, Norwegian and English. Unlike the majority of coaching and, more generally, well-being related conversational agents found in the literature, ours is not designed with hand-crafted rules. Instead, we directly model the coaching strategy that professionals follow with end users. To this end, we gather a set of virtual coaching sessions through a Wizard of Oz platform and apply state-of-the-art Natural Language Processing techniques. We employ a transfer learning approach, pretraining GPT2 neural language models and fine-tuning them on our corpus. However, since these models only take a local dialogue history as input, a simple fine-tuning procedure is not capable of modeling the long-term dialogue strategies that appear in coaching sessions. To alleviate this issue, we first propose to learn dialogue phase and scenario embeddings in the fine-tuning stage. These indicate to the model which part of the dialogue it is in and which kind of coaching session it is carrying out. Second, we develop a global deep learning system which controls the long-term structure of the dialogue. We also show that this global module can be used to visualize and interpret the decisions taken by the conversational agent, and that the learnt representations are comparable to dialogue acts. Automatic and human evaluations show that our proposals improve on the baseline models. Finally, interaction experiments with coaching experts indicate that the system is usable and gives rise to positive emotions in Spanish, French and English, while the results in Norwegian indicate that there is still work to be done on fully data-driven approaches for very low-resource languages.
This work presents a novel methodology to train open-domain neural dialogue systems within the framework of Generative Adversarial Networks with gradient-based optimization methods. We avoid the non-differentiability related to text-generating networks by approximating the word vector corresponding to each generated token via a top-k softmax. We show that a weighted average of the word vectors of the most probable tokens, computed from the probabilities resulting from the top-k softmax, leads to a good approximation of the word vector of the generated token. Finally, we demonstrate through a human evaluation process that training a neural dialogue system via adversarial learning with this method successfully discourages it from producing generic responses. Instead, it tends to produce more informative and varied ones.
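The top-k softmax approximation described in this abstract can be illustrated with a minimal sketch. It assumes that "top-k softmax" means a softmax restricted to the k highest logits; the function name `top_k_soft_embedding` and its arguments are hypothetical and not taken from the paper. Pure Python is used only for readability. In an actual adversarial training setup the same computation would be written with tensor operations in an automatic differentiation framework so that gradients can flow through the weights.

```python
import math


def top_k_soft_embedding(logits, embeddings, k):
    """Approximate the word vector of the generated token as a weighted
    average of the word vectors of the k most probable tokens.

    Assumption: the weights come from a softmax over the top-k logits only.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Softmax restricted to the selected logits (shifted for numerical stability).
    shift = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - shift) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the selected word vectors, one dimension at a time.
    dim = len(embeddings[0])
    return [
        sum(w * embeddings[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]


# Toy vocabulary of three tokens with two-dimensional word vectors.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
toy_logits = [0.0, 0.0, -100.0]
approx = top_k_soft_embedding(toy_logits, toy_embeddings, k=2)
```

With k = 1 the result reduces to the word vector of the single most probable token, while larger values of k blend in the vectors of the runner-up tokens in proportion to their probabilities.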
Decoding emotional states from multimodal signals is an increasingly active domain within the framework of affective computing, which aims at a better understanding of Human-Human Communication as well as at improving Human-Computer Interaction. But the automatic recognition of spontaneous emotions from speech is a very complex task, due to the lack of certainty about the speaker's state as well as the difficulty of identifying a variety of emotions in real scenarios. In this work we explore the extent to which emotional states can be decoded from speech signals extracted from TV political debates. The labelling procedure was supported by perception experiments in which only a small set of emotions was identified. In addition, scaled judgements of valence, arousal and dominance were also provided. In this framework the paper shows meaningful comparisons between the dimensional and the categorical models of emotions, which is a new contribution when dealing with spontaneous emotions. To this end, Support Vector Machines (SVM) as well as Feedforward Neural Networks (FNN) have been proposed to develop classifiers and predictors. The experimental evaluation over a Spanish corpus shows that the proposed systems are able to identify both models of emotions in speech segments.