The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of personalized virtual coaches that assist the elderly and their carers in reaching the active aging goal in the vicinity of their home. The project focuses on evidence-based, user-validated research and on the integration of intelligent technology and context-sensing methods through automatic voice, eye and facial analysis, integrated with visual and spoken dialogue system capabilities. In this paper, we describe the current status of the system, with special emphasis on its components and their integration, the creation of a Wizard of Oz platform, and findings from user interaction studies conducted throughout the first 18 months of the project.
CCS Concepts: • Human-centered computing → Interactive systems and tools; Empirical studies in HCI; Interaction techniques; • Applied computing → Health informatics.
The class of K-Testable Languages in the Strict Sense (K-TLSS) is a subclass of regular languages. Previous work has shown that stochastic K-TLSS language models describe the same probability distribution as N-gram models, and that smoothing techniques (back-off-like methods) can be applied to them efficiently. Given a set of k-TLSS models (k = 1, ..., K) and a smoothing technique specifically suited to them, we propose their integration into a single self-contained model, the K-TLSS(S), which embeds the smoothing within the topology and thus allows extremely simple parsing procedures. To build this model we designed a more general syntactic mechanism that we call Stochastic Deterministic Finite State Automaton with Recursive Transitions. The topology of the new K-TLSS(S) models allows an easy pruning procedure. Pruned K-TLSS(S) models give probability distributions that are equivalent to those of variable-length N-gram models. Experimental results indicate that the effect of light pruning is always positive.
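To make the idea of back-off smoothing concrete, the following is a minimal sketch of a bigram model that backs off to a unigram model. It is not the K-TLSS(S) construction described in the abstract: the function names `train` and `prob`, the absolute-discounting scheme and the discount value are all assumptions chosen for illustration. Each history plays the role of a state, and an unseen word is handled by redistributing the discounted mass through the lower-order (unigram) distribution, which loosely mirrors a back-off transition to a lower-order state.

```python
from collections import Counter, defaultdict


def train(sentences, discount=0.5):
    """Count unigrams and bigrams (hypothetical helper, not from the paper).

    `discount` is an assumed absolute discount; it must lie strictly
    between 0 and 1 so that discounted counts stay positive.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(tokens[1:])  # "<s>" is never a predicted symbol
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams, discount


def prob(model, prev, cur):
    """P(cur | prev) with absolute discounting and back-off to unigrams."""
    unigrams, bigrams, discount = model
    total_unigrams = sum(unigrams.values())

    def p_uni(word):
        return unigrams[word] / total_unigrams

    seen = bigrams.get(prev)
    if not seen:
        # Unknown history: use the lower-order model directly.
        return p_uni(cur)

    total = sum(seen.values())
    # Unigram mass of the words never observed after `prev`.
    unseen_mass = 1.0 - sum(p_uni(word) for word in seen)
    if unseen_mass <= 1e-12:
        # Every known word was observed after `prev`: nothing to back off to,
        # so the undiscounted relative frequency is used.
        return seen[cur] / total
    if cur in seen:
        return (seen[cur] - discount) / total
    # Mass freed by discounting, shared among unseen words in proportion
    # to their unigram probability.
    freed = discount * len(seen) / total
    return freed * p_uni(cur) / unseen_mass
```

Under these assumptions, the probabilities of all vocabulary words given any history sum to one, so the smoothed model remains a proper distribution.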
Neural transformer architectures have attracted considerable interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open-domain dialogue with huge amounts of data, and also for domain adaptation in task-oriented setups. However, the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that using audio embeddings leads to higher overall task success than omitting them, with statistically significant results across evaluation metrics and training algorithms.
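As a rough illustration of how text and audio representations might be combined in a policy, the sketch below concatenates two embedding vectors and feeds them to a linear softmax policy, then applies one REINFORCE-style update. Everything here is an assumption made for illustration: the abstract does not say how the embeddings are fused or which policy-gradient algorithms are used, the short lists stand in for real GPT-2 and Wav2Vec2 outputs, and the names `action_probs` and `reinforce_step` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(weights, bias, text_emb, audio_emb):
    """Fuse the two embeddings by concatenation (an assumed fusion scheme)
    and return the fused vector with the policy's action distribution."""
    fused = list(text_emb) + list(audio_emb)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return fused, softmax(logits)


def reinforce_step(weights, bias, text_emb, audio_emb, action, reward, lr=0.1):
    """One in-place REINFORCE update: ascend reward * grad log pi(action)."""
    fused, probs = action_probs(weights, bias, text_emb, audio_emb)
    for k in range(len(weights)):
        # Gradient of log pi(action) with respect to logit k.
        grad_logit = (1.0 if k == action else 0.0) - probs[k]
        bias[k] += lr * reward * grad_logit
        for j, x in enumerate(fused):
            weights[k][j] += lr * reward * grad_logit * x


# Toy usage with made-up 2-dimensional embeddings and 3 dialogue actions.
text_emb = [0.2, -0.1]   # stand-in for a text dialogue history embedding
audio_emb = [0.5, 0.3]   # stand-in for an audio turn embedding
weights = [[0.0] * 4 for _ in range(3)]
bias = [0.0] * 3
reinforce_step(weights, bias, text_emb, audio_emb, action=1, reward=1.0)
```

With a positive reward, the update raises the logit of the chosen action and lowers the others, so that action becomes more likely for the same fused input. In a real system the linear layer would be replaced by the jointly fine-tuned transformer models.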