2023
DOI: 10.1109/taslp.2022.3225658
Audio Embedding-Aware Dialogue Policy Learning

Abstract: Following the success of Natural Language Processing (NLP) transformers pretrained via self-supervised learning, similar models have recently been proposed for speech processing, such as Wav2Vec2, HuBERT and UniSpeech-SAT. An interesting yet unexplored area of application of these models is Spoken Dialogue Systems, where the users' audio signals are typically just mapped to word-level features derived from an Automatic Speech Recogniser (ASR) and then processed using NLP techniques to generate system responses…
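The abstract contrasts the usual ASR-derived word-level features with embeddings taken directly from pretrained speech encoders. As a minimal sketch of the latter idea (not the paper's actual implementation), the snippet below mean-pools Wav2Vec2 hidden states into an utterance-level embedding and feeds it to a small dialogue-policy network; the checkpoint name, the pooling strategy, and the ten-action policy head are all illustrative assumptions.

```python
# Sketch: utterance-level audio embeddings from a pretrained Wav2Vec2 encoder
# driving a toy dialogue-policy network. All design choices here (checkpoint,
# mean pooling, action count) are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
encoder.eval()

class AudioPolicy(nn.Module):
    """Maps a pooled audio embedding to logits over dialogue actions."""
    def __init__(self, embed_dim: int = 768, num_actions: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(embedding)

def utterance_embedding(waveform, sample_rate: int = 16000) -> torch.Tensor:
    """Encode raw audio and mean-pool over time to one vector per utterance."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_values).last_hidden_state  # (1, T, 768)
    return hidden.mean(dim=1)  # (1, 768)

policy = AudioPolicy(num_actions=10)
audio = torch.randn(16000).numpy()  # placeholder: 1 s of 16 kHz audio
logits = policy(utterance_embedding(audio))
action = logits.argmax(dim=-1)  # greedy dialogue action selection
```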

Cited by 3 publications (2 citation statements)
References 62 publications (67 reference statements)
“…However, the amount of data needed to build such models and the little or no control over their behaviour once they have been built make them infeasible to implement for many practical applications. Even though deep-learning-based chatbots can be built with a rather small amount of domain-specific data for goal-oriented tasks [ 17 , 18 , 19 ], the problem of the lack of control over the model still remains. This often causes a lack of robustness that can hardly be avoided, which is why health-related data-driven DMs are extremely scarce in the literature.…”
Section: Related Work
confidence: 99%
“…These models bring a new dimension of understanding and contextuality to conversations, not only in text but also in audio interactions, opening doors to even more sophisticated and dynamic interactions between humans and machines. For example, several recent studies have explored the utilisation of Large Audio Models in SDSs for task-oriented dialogue policy learning [212] and also for non-task-oriented dialogue, including SpeechGPT [91], SoundStorm [114], AudioGPT [143], and dGSLM [213]. Other recent studies such as ANGIE [214], Multimodal-GPT [215], and Large Multimodal Models [216] have integrated vision and LLMs for training multimodal dialogue systems; these efforts will potentially be transferable to LLM-based robot dialogue systems.…”
Section: Spoken Dialogue Systems
confidence: 99%