ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414566
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding

Abstract: Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems. SLU models, which either directly extract semantics from audio or are composed of pipelined automatic speech recognition (ASR) and natural language understanding (NLU) models, are typically trained via differentiable cross-entropy losses, even when the relevant performance metrics of interest are word or semantic error rates. …
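The mismatch the abstract describes (training with cross-entropy while evaluating on word or semantic error rate) is commonly bridged with a REINFORCE-style surrogate loss. The sketch below is a minimal, hypothetical illustration of that general technique, not the authors' exact formulation: sample hypotheses from the model, score each by its word error rate against the reference, and weight its log-probability by the baseline-subtracted error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (word level)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all i tokens of the reference prefix
    for j in range(n + 1):
        d[0][j] = j  # inserting all j tokens of the hypothesis prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def sequence_loss(ref, sampled_hyps, log_probs):
    """REINFORCE-style surrogate: weight each sampled hypothesis's
    log-probability by its WER minus the batch-mean WER (a simple
    variance-reducing baseline). Minimizing this moves probability
    mass away from high-WER hypotheses."""
    wers = [edit_distance(ref, h) / max(len(ref), 1) for h in sampled_hyps]
    baseline = sum(wers) / len(wers)
    return sum((w - baseline) * lp for w, lp in zip(wers, log_probs))
```

In a real system the log-probabilities come from the ASR/SLU model and gradients flow through them; for a semantic error rate, the token sequences would be replaced by slot/intent tuples while the loss structure stays the same.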

Cited by 8 publications (2 citation statements) · References 24 publications
“…RL is generally used for auxiliary tasks, such as ensuring the output is properly formatted (Zhong, Xiong, and Socher 2017). In SLU, Wang et al (2018a) applied RL to learn the wrong labeled slots with or without user's feedback; Rao et al (2021) proposed a reinforcement framework to improve automatic speech recognition robustness. In this work, we apply reinforcement learning to alleviate the misalignment between the correct predictions of the two subtasks in multi-intent SLU.…”
Section: Related Work
confidence: 99%
“…There are a handful of attempts in literature for applying FL in speech-related tasks. Some of these applications are: ASR [10,11,12,13,14], Keyword Spotting [15,16], Emotion Recognition [17,18,16], and Speaker Verification [19]. Notably, for combining FL with SSL, the only available works include Federated self-supervised learning (FSSL) [20] for acoustic event detection and [21], where the challenges involved in combining FL & SSL due to hardware limitations on the client are highlighted and a wav2vec 2.0 [4] model is trained with FL on Common-Voice Italian data [22] and fine-tuned for ASR.…”
Section: Related Work
confidence: 99%