Interspeech 2020
DOI: 10.21437/interspeech.2020-2929

Semantic Complexity in End-to-End Spoken Language Understanding

Abstract: End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI mo…

Cited by 9 publications (12 citation statements)
References 15 publications
“…We use two E2E SLU datasets for our experiments - (1) the publicly available Fluent Speech Commands (FSC) and (2) an internal SLU dataset. Additionally, we create a "hard test set" to assess model performance in the most demanding scenarios in generalized VA. We use the average n-gram entropy and Minimum Spanning Tree (MST) complexity score as described in [27] to quantify their levels of semantic complexity. Fluent Speech Commands - FSC [21] is an SLU dataset containing 30,043 utterances with a vocabulary of 124 words and 248 unique utterances over 31 intents in home appliance and smart speaker control.…”
Section: Data (mentioning)
confidence: 99%
“…The SLU task on this dataset is just the intent classification task. It has an average n-gram entropy of 6.9 bits and an average MST complexity score of 0.2 [27].…”
Section: Data (mentioning)
confidence: 99%
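
The citation statements above quote an average n-gram entropy in bits but do not reproduce the definition from [27]. As a rough illustration only, the sketch below assumes the measure is the Shannon entropy of the corpus n-gram distribution, averaged over n-gram orders 1 to `max_n`. The function names, the whitespace tokenization, and the choice of `max_n` are all hypothetical, and the definition in [27] may differ.

```python
import math
from collections import Counter


def ngram_entropy(utterances, n):
    """Shannon entropy (bits) of the n-gram distribution over a corpus.

    Assumption: lowercase whitespace tokenization and counts pooled
    over all utterances; [27] may define the measure differently.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_ngram_entropy(utterances, max_n=3):
    """Mean of the n-gram entropies for n = 1..max_n (hypothetical averaging)."""
    return sum(ngram_entropy(utterances, n) for n in range(1, max_n + 1)) / max_n


if __name__ == "__main__":
    corpus = ["turn on the lights", "turn off the lights"]
    print(ngram_entropy(corpus, 1))
    print(average_ngram_entropy(corpus, max_n=3))
```

Under this reading, a dataset with few distinct phrasings yields low entropy, while a broader assistant-style dataset with more varied wording yields higher values.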
“…A conversational AI system is composed of end-to-end spoken language understanding (SLU) models that predict semantics directly from speech [3]. The conventional approach to SLU uses two distinct components to sequentially process a spoken utterance: an automatic speech recognition (ASR) model that transcribes the speech to a text transcript, followed by a natural language understanding (NLU) model that predicts the domain, intent, and entities given the transcript.…”
Section: Introduction (mentioning)
confidence: 99%