ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414566
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding

Abstract: Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems. SLU models, which either directly extract semantics from audio or are composed of pipelined automatic speech recognition (ASR) and natural language understanding (NLU) models, are typically trained via differentiable cross-entropy losses, even when the relevant performance metrics of interest are word or semantic error rates. …
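The mismatch the abstract describes (training with cross-entropy while evaluating on word or semantic error rate) is commonly bridged with a REINFORCE-style surrogate loss. The sketch below is a minimal, hypothetical illustration of that general technique, not the authors' exact formulation: sample hypotheses from the model, score each by its word error rate against the reference, and weight its log-probability by the baseline-subtracted error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (word level)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all i tokens of the reference prefix
    for j in range(n + 1):
        d[0][j] = j  # inserting all j tokens of the hypothesis prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def sequence_loss(ref, sampled_hyps, log_probs):
    """REINFORCE-style surrogate: weight each sampled hypothesis's
    log-probability by its WER minus the batch-mean WER (a simple
    variance-reducing baseline). Minimizing this moves probability
    mass away from high-WER hypotheses."""
    wers = [edit_distance(ref, h) / max(len(ref), 1) for h in sampled_hyps]
    baseline = sum(wers) / len(wers)
    return sum((w - baseline) * lp for w, lp in zip(wers, log_probs))
```

In a real system the log-probabilities come from the ASR/SLU model and gradients flow through them; for a semantic error rate, the token sequences would be replaced by slot/intent tuples while the loss structure stays the same.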

Cited by 8 publications (2 citation statements) · References 24 publications
“…RL is generally used for auxiliary tasks, such as ensuring the output is properly formatted (Zhong, Xiong, and Socher 2017). In SLU, Wang et al (2018a) applied RL to learn the wrong labeled slots with or without user's feedback; Rao et al (2021) proposed a reinforcement framework to improve automatic speech recognition robustness. In this work, we apply reinforcement learning to alleviate the misalignment between the correct predictions of the two subtasks in multi-intent SLU.…”
Section: Related Work
confidence: 99%
“…There are a handful of attempts in literature for applying FL in speech-related tasks. Some of these applications are: ASR [10,11,12,13,14], Keyword Spotting [15,16], Emotion Recognition [17,18,16], and Speaker Verification [19]. Notably, for combining FL with SSL, the only available works include Federated self-supervised learning (FSSL) [20] for acoustic event detection and [21], where the challenges involved in combining FL & SSL due to hardware limitations on the client are highlighted and a wav2vec 2.0 [4] model is trained with FL on Common-Voice Italian data [22] and fine-tuned for ASR.…”
Section: Related Work
confidence: 99%