2023
DOI: 10.1109/taslp.2022.3225658
Audio Embedding-Aware Dialogue Policy Learning

Abstract: Following the success of Natural Language Processing (NLP) transformers pretrained via self-supervised learning, similar models have recently been proposed for speech processing, such as Wav2Vec2, HuBERT and UniSpeech-SAT. An interesting yet unexplored area of application of these models is Spoken Dialogue Systems, where the users' audio signals are typically just mapped to word-level features derived from an Automatic Speech Recogniser (ASR) and then processed using NLP techniques to generate system responses…
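The abstract contrasts the usual ASR-derived word-level features with embeddings taken directly from pretrained speech encoders. As a minimal sketch of the latter idea (not the paper's actual implementation), the snippet below mean-pools Wav2Vec2 hidden states into an utterance-level embedding and feeds it to a small dialogue-policy network; the checkpoint name, the pooling strategy, and the ten-action policy head are all illustrative assumptions.

```python
# Sketch: utterance-level audio embeddings from a pretrained Wav2Vec2 encoder
# driving a toy dialogue-policy network. All design choices here (checkpoint,
# mean pooling, action count) are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
encoder.eval()

class AudioPolicy(nn.Module):
    """Maps a pooled audio embedding to logits over dialogue actions."""
    def __init__(self, embed_dim: int = 768, num_actions: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(embedding)

def utterance_embedding(waveform, sample_rate: int = 16000) -> torch.Tensor:
    """Encode raw audio and mean-pool over time to one vector per utterance."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_values).last_hidden_state  # (1, T, 768)
    return hidden.mean(dim=1)  # (1, 768)

policy = AudioPolicy(num_actions=10)
audio = torch.randn(16000).numpy()  # placeholder: 1 s of 16 kHz audio
logits = policy(utterance_embedding(audio))
action = logits.argmax(dim=-1)  # greedy dialogue action selection
```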

Cited by 3 publications (2 citation statements)
References 62 publications (67 reference statements)
“…However, the amount of data needed to build such models and the little or no control over their behaviour once they have been built make them infeasible to implement for many practical applications. Even though deep-learning-based chatbots can be built with a rather small amount of domain-specific data for goal-oriented tasks [ 17 , 18 , 19 ], the problem of the lack of control over the model still remains. This often causes a lack of robustness that can hardly be avoided, which is why health-related data-driven DMs are extremely scarce in the literature.…”
Section: Related Work
confidence: 99%
“…These models bring a new dimension of understanding and contextuality to conversations, not only in text but also in audio interactions, opening doors to even more sophisticated and dynamic interactions between humans and machines. For example, several recent studies have explored the utilisation of Large Audio Models in SDSs for task-oriented dialogue policy learning [212] and also for non-task-oriented dialogue, including SpeechGPT [91], SoundStorm [114], AudioGPT [143], and dGSLM [213]. Other recent studies such as ANGIE [214], Multimodal-GPT [215], and Large Multimodal Models [216] have integrated vision and LLMs for training multimodal dialogue systems; these efforts will potentially be transferable to LLM-based robot dialogue systems.…”
Section: Spoken Dialogue Systems
confidence: 99%