The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of personalized virtual coaches that assist the elderly and their carers in reaching the active-aging goal in the vicinity of their homes. The project focuses on evidence-based, user-validated research and on the integration of intelligent technology and context-sensing methods through automatic voice, eye and facial analysis, combined with visual and spoken dialogue system capabilities. In this paper, we describe the current status of the system, with special emphasis on its components and their integration, the creation of a Wizard of Oz platform, and findings gained from user interaction studies conducted throughout the first 18 months of the project. CCS Concepts: • Human-centered computing → Interactive systems and tools; Empirical studies in HCI; Interaction techniques; • Applied computing → Health informatics.
The class of K-Testable Languages in the Strict Sense (K-TLSS) is a subclass of the regular languages. Previous work has shown that stochastic K-TLSS language models describe the same probability distribution as N-gram models, and that smoothing techniques, such as back-off methods, can be applied to them efficiently. Given a set of k-TLSS models (k = 1, …, K) and a smoothing technique tailored to them, we propose integrating them into a single self-contained model, the K-TLSS(S), which embeds the smoothing within the topology and thereby allows extremely simple parsing procedures. To build this model we designed a more general syntactic mechanism that we call a Stochastic Deterministic Finite-State Automaton with Recursive Transitions. The topology of the new K-TLSS(S) models also allows an easy pruning procedure, and pruned K-TLSS(S) models yield probability distributions equivalent to variable-length N-gram models. Experimental results indicate that the effect of light pruning is consistently positive.
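The core idea of embedding back-off smoothing in the automaton topology can be illustrated with a toy model: when a k-gram state has no outgoing arc for a symbol, a "recursive transition" falls back to the corresponding (k-1)-gram state. This is a minimal sketch under stated assumptions — the class name `BackoffNGram` and the fixed back-off weight `alpha` are illustrative choices, not the paper's estimator.

```python
from collections import defaultdict

class BackoffNGram:
    """Toy sketch of back-off smoothing expressed as recursive
    transitions between context states of decreasing length."""

    def __init__(self, k=2, alpha=0.4):
        self.k, self.alpha = k, alpha  # alpha: assumed fixed back-off weight
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sentences):
        for s in sentences:
            toks = ["<s>"] * (self.k - 1) + list(s) + ["</s>"]
            for i in range(self.k - 1, len(toks)):
                # record the symbol under every context of length 0 .. k-1
                for n in range(self.k):
                    self.counts[tuple(toks[i - n:i])][toks[i]] += 1

    def prob(self, ctx, sym):
        ctx, w = tuple(ctx)[len(ctx) - self.k + 1:], 1.0
        while True:
            bucket = self.counts.get(ctx, {})
            if sum(bucket.values()) and sym in bucket:
                return w * (1 - self.alpha) * bucket[sym] / sum(bucket.values())
            if not ctx:                 # reached the unigram state: uniform floor
                return w / (len(bucket) + 1)
            w *= self.alpha             # take the recursive (back-off) transition
            ctx = ctx[1:]
```

A seen bigram is scored at its own state, while an unseen one cascades down to shorter contexts, accumulating the back-off weight along the way.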
Neural transformer architectures have attracted considerable interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open-domain dialogue with huge amounts of data, as well as for domain adaptation in task-oriented setups. However, the potential benefits of exploiting the user's audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained from the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy-gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that audio embeddings lead to higher overall task success than text-only policies, with statistically significant results across evaluation metrics and training algorithms.
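The fusion step described above can be sketched as concatenating the two modality vectors and mapping them through a policy head to a distribution over dialogue actions. This is a minimal stand-in, not the paper's architecture: the dimensions, the random vectors in place of real GPT-2/Wav2Vec2 embeddings, and the single linear layer are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; real GPT-2 and Wav2Vec2 embeddings have
# hundreds of dimensions.
TEXT_DIM, AUDIO_DIM, N_ACTIONS = 8, 4, 3

def fuse_and_score(text_emb, audio_emb, W, b):
    """Concatenate modality embeddings, apply a linear policy head,
    and return a softmax distribution over dialogue actions."""
    x = text_emb + audio_emb                        # list concatenation = fusion
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
              for row, b_j in zip(W, b)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]        # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]

text_emb = [random.gauss(0, 1) for _ in range(TEXT_DIM)]    # stand-in for GPT-2
audio_emb = [random.gauss(0, 1) for _ in range(AUDIO_DIM)]  # stand-in for Wav2Vec2
W = [[random.gauss(0, 0.1) for _ in range(TEXT_DIM + AUDIO_DIM)]
     for _ in range(N_ACTIONS)]
b = [0.0] * N_ACTIONS

probs = fuse_and_score(text_emb, audio_emb, W, b)
```

In the supervised setting this head would be trained on annotated system actions; in the reinforcement setting the same output distribution would be sampled and updated with policy-gradient estimates.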
<p align="justify">La identidad que la cultura atribuye a las personas con discapacidad está sustentada en narrativas de exclusión que se traducen en modelos discursivos de discriminación y abuso de poder, representados en normas y estereotipos excluyentes que niegan sus capacidades y atentan contra su reconocimiento para participar en condiciones de igualdad en la vida social y política. En este sentido, el análisis de las narrativas excluyentes permite develar dichas prácticas discriminatorias. El documento se encuentra estructurado en tres partes: 1) "La discapacidad, una construcción social traducida en normas y estereotipos excluyentes"; 2) modelos discursivos de la discapacidad, que parte de la pregunta ¿cuáles son los discursos que han intentado definir a la discapacidad?, y 3) la escuela como escenario para la transformación de narrativas excluyentes en narrativas de la diversidad y el respeto por los derechos humanos. El artículo se propone analizar la categoría <em>discapacidad</em> desde un marco discursivo de las narrativas de la exclusión, en la construcción de las identidades de las personas con discapacidad.</p>
The main goal of this work is to carry out automatic emotion detection from speech using both acoustic and textual information. To this end, a set of audio clips was extracted from a TV show in which different guests discuss topics of current interest. The selected audios were transcribed and annotated with emotional status using a crowdsourcing platform. A three-dimensional model was used to define each specific emotional status, in order to capture the nuances of what the speaker expresses instead of being restricted to a predefined set of discrete categories. Different sets of acoustic parameters were considered to obtain the input vectors for a neural network, and a model based on word embeddings was used to represent each sequence of words. Different deep learning architectures were tested, providing promising results despite the limited size of the corpus.
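The contrast between a dimensional and a categorical annotation can be made concrete: a continuous 3-D estimate carries nuance that is lost when it is collapsed onto the nearest discrete label. The anchor coordinates below are hypothetical illustrations in a (valence, arousal, dominance) space with axes in [-1, 1], not the annotation scheme used in the work.

```python
import math

# Hypothetical anchor points for a few discrete emotions in a 3-D
# (valence, arousal, dominance) space; coordinates are illustrative.
ANCHORS = {
    "joy":     ( 0.8,  0.5,  0.4),
    "anger":   (-0.6,  0.7,  0.3),
    "sadness": (-0.7, -0.4, -0.4),
    "calm":    ( 0.4, -0.6,  0.2),
}

def nearest_category(point):
    """Collapse a continuous 3-D emotion estimate onto the closest
    discrete label — the step a dimensional model avoids, keeping
    the fine-grained position instead."""
    return min(ANCHORS, key=lambda name: math.dist(ANCHORS[name], point))
```

Two mildly different continuous estimates can map to the same discrete label, which is precisely the nuance the dimensional annotation preserves.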