2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru51503.2021.9688296

Audio Embeddings Help to Learn Better Dialogue Policies

Abstract: Neural transformer architectures have gained a lot of interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open domain dialogue with huge amounts of data and also for domain adaptation in task-oriented setups. But the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recen…
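The abstract describes fusing GPT-2 dialogue-history representations with audio embeddings of the user turn before deciding the next system action. A minimal sketch of such a late-fusion policy head is given below; the class name, dimensions, and concatenation-based fusion are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class FusionDialoguePolicy(nn.Module):
    # Hypothetical late-fusion policy head: concatenates a pooled text
    # dialogue-history embedding (e.g. from GPT-2) with a pooled audio
    # embedding of the user turn, then predicts the next system dialogue act.
    def __init__(self, text_dim=768, audio_dim=512, hidden_dim=256, n_actions=20):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, text_emb, audio_emb):
        fused = torch.cat([text_emb, audio_emb], dim=-1)  # simple concatenation fusion
        return self.head(fused)  # logits over system dialogue acts

# Dummy tensors stand in for real encoder outputs
policy = FusionDialoguePolicy()
logits = policy(torch.randn(4, 768), torch.randn(4, 512))
next_act = logits.argmax(dim=-1)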

Cited by 2 publications (5 citation statements). References 18 publications.
“…Last, this study presents several novelties compared to our preliminary work [5]. We provide a larger experimentation and a much deeper analysis of how, when and why speech representations help to learn better dialogue policies.…”
Section: Related Work
confidence: 99%
“…User Audio Sampler. Since audio signals need to be fed to the proposed dialogue policies, we employ the User Audio Sampler proposed in [5] to sample an audio turn from the corpus. First, it selects output candidates filtering the audios of dialogue turns labelled with the same dialogue acts and slots generated by the UM.…”
Section: B. Dialogue Pipeline for Simulations
confidence: 99%
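The statement above outlines the User Audio Sampler from [5]: candidate audio turns are filtered so that their annotated dialogue acts and slots match those generated by the user model (UM), and one candidate is then sampled. A minimal sketch of that filtering step, with hypothetical corpus and annotation fields, could look like this:

import random

def sample_user_audio(corpus, target_acts, target_slots):
    # Hypothetical sketch of the User Audio Sampler: keep only audio turns
    # whose annotated dialogue acts and slots match those generated by the
    # user model (UM), then sample one candidate at random.
    candidates = [
        turn for turn in corpus
        if set(turn["acts"]) == set(target_acts) and set(turn["slots"]) == set(target_slots)
    ]
    if not candidates:
        return None  # no turn in the corpus carries the requested annotation
    return random.choice(candidates)["audio"]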
“…DRL agents are often trained from scratch instead of inheriting useful behaviours from other agents. Some agents from Table 4 [such as (Williams and Zweig 2016; Liu et al 2017; Zorrilla et al 2021)] have avoided learning from scratch by showing that applying DRL on top of non-DRL or supervised methods yields improved performance due to the optimisation element that DRL brings instead of only mimicking demonstration data. But those systems typically focus on a single dataset, and the idea of transferring useful and effective knowledge from other/many tasks to a new or targeted task remains to be demonstrated.…”
Section: Knowledge Transfer and Generalisation
confidence: 99%
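The point made in the statement above, that DRL applied on top of supervised methods outperforms pure imitation of demonstration data, is commonly realised as a two-stage recipe. A hedged sketch under assumed state/action sizes, with a hypothetical policy network and dialogue-level reward, is shown below:

import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 20))  # hypothetical sizes
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def pretrain_step(states, gold_acts):
    # Stage 1: supervised pretraining on (state, act) pairs from demonstration data
    loss = F.cross_entropy(policy(states), gold_acts)
    opt.zero_grad()
    loss.backward()
    opt.step()

def reinforce_step(states, sampled_acts, returns):
    # Stage 2: DRL fine-tuning (REINFORCE) driven by a dialogue-level reward signal
    log_probs = F.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, sampled_acts.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()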