Interspeech 2021
DOI: 10.21437/interspeech.2021-1826
End-to-End Spoken Language Understanding for Generalized Voice Assistants

Abstract: End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial voice assistants (VAs). We propose a fully differentiable, transformer-based, hierarchical system that can …
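For orientation, the following is a minimal, assumed sketch of what a transformer-based E2E SLU model with hierarchical outputs (utterance-level intent plus frame-level slots) might look like in PyTorch. The module names, dimensions, and head structure are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed architecture, not the paper's code): a transformer
# encoder over acoustic features feeding two heads, one for utterance-level
# intent and one for frame-level slot tags.
import torch
import torch.nn as nn

class E2ESLUSketch(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_heads=4, n_layers=6,
                 n_intents=64, n_slots=128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)           # acoustic features -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.intent_head = nn.Linear(d_model, n_intents)   # utterance-level intent logits
        self.slot_head = nn.Linear(d_model, n_slots)       # frame-level slot logits

    def forward(self, feats):                    # feats: (batch, time, feat_dim)
        h = self.encoder(self.proj(feats))       # contextual frame representations
        intent_logits = self.intent_head(h.mean(dim=1))    # mean-pool over time
        slot_logits = self.slot_head(h)
        return intent_logits, slot_logits

model = E2ESLUSketch()
feats = torch.randn(2, 200, 80)                  # two utterances of log-mel features
intents, slots = model(feats)                    # shapes: (2, 64) and (2, 200, 128)
```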

Cited by 7 publications (3 citation statements). References 34 publications (67 reference statements).

Citation statements (ordered by relevance):
“…Simple word- and n-gram-level approaches have proven surprisingly capable of characterizing dataset difficulty a priori (McKenna et al., 2020) and producing difficult test sets (Saxon et al., 2021) in diverse language domains such as SLU. Gardner et al. (2021) show how such purely frequentist approaches can identify word-level spurious correlations with respect to label class, which drive in part the shortcut features for classes of "competency problems" such as NLI.…”
Section: Related Work
confidence: 99%
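As a concrete illustration of the kind of purely frequentist check described above (a generic sketch, not the exact procedure of McKenna et al. or Gardner et al.), one can flag words whose conditional label distribution deviates strongly from the overall label prior:

```python
# Generic sketch: score each word by how far p(label | word) deviates from the
# overall label prior p(label); frequent, high-scoring words are candidate
# spurious correlations / shortcut features.
from collections import Counter, defaultdict

def word_label_skew(examples, min_count=20):
    """examples: list of (tokens, label) pairs, e.g. (["play", "jazz"], "PlayMusic")."""
    examples = list(examples)
    label_counts = Counter(label for _, label in examples)
    total = sum(label_counts.values())
    word_label = defaultdict(Counter)
    for tokens, label in examples:
        for tok in set(tokens):                  # count each word once per example
            word_label[tok][label] += 1
    skew = {}
    for tok, counts in word_label.items():
        n = sum(counts.values())
        if n < min_count:                        # ignore rare words
            continue
        # largest absolute gap between p(label | word) and p(label)
        skew[tok] = max(abs(counts[lab] / n - label_counts[lab] / total)
                        for lab in label_counts)
    return sorted(skew.items(), key=lambda kv: -kv[1])
```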
“…Table 4: Accuracy on FSC dataset.

System                            Accuracy
Alexa                             0.987
FSC-baseline [29]                 0.988
Cao et al. [33]                   0.990
FANS [34]                         0.990
Reptile [35]                      0.992
Finstreder (Quartznet)            0.992
Saxon et al. [36]                 0.994
AT-AT [26]                        0.995
Finstreder (Conformer)            0.995
Borgholt et al. [37]              0.996
Seo et al. [38]                   0.997
Qian et al. [39]                  0.997
Kim et al. [32]                   0.997
Finstreder (Quartznet) + AMT      0.997…”
Section: English / French
confidence: 99%
“…With the recent advances of neural networks, there is growing popularity in designing SLU systems in the end-to-end (E2E) fashion [4,5,6], where the ASR and NLU components are integrated into a single network and optimised with a joint loss function. The E2E SLU systems generally adopt the encoder-decoder-based sequence-to-sequence (Seq2Seq) framework, which has been widely employed in several areas including neural machine translation [7,8] and ASR [9,10].…”
Section: Introduction
confidence: 99%
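To make that Seq2Seq formulation concrete, here is a minimal, assumed sketch of an encoder-decoder E2E SLU model that maps acoustic features directly to a sequence of semantic tokens and is optimised with a single joint loss. The architecture, dimensions, and vocabulary size are illustrative assumptions, not taken from any of the cited systems.

```python
# Assumed sketch: encoder-decoder (Seq2Seq) E2E SLU in which the decoder emits
# semantic tokens (intent and slot symbols) directly from speech features,
# trained end-to-end with one cross-entropy loss.
import torch
import torch.nn as nn

class Seq2SeqSLU(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, sem_vocab=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)     # acoustic features -> model dim
        self.embed = nn.Embedding(sem_vocab, d_model)
        self.transformer = nn.Transformer(d_model, nhead=4,
                                          num_encoder_layers=4,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, sem_vocab)

    def forward(self, feats, sem_tokens):
        # feats: (B, T, feat_dim); sem_tokens: (B, L) teacher-forced decoder input
        src = self.proj(feats)
        tgt = self.embed(sem_tokens)
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(h)                           # (B, L, sem_vocab) logits

model = Seq2SeqSLU()
feats = torch.randn(2, 200, 80)
targets = torch.randint(0, 1000, (2, 12))
logits = model(feats, targets)
loss = nn.CrossEntropyLoss()(logits.transpose(1, 2), targets)   # single joint loss
```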