2022 IEEE Spoken Language Technology Workshop (SLT), 2023
DOI: 10.1109/slt54892.2023.10022703

STOP: A Dataset for Spoken Task Oriented Semantic Parsing

Abstract: End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders th…
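The cascade the abstract alludes to runs ASR first and feeds its transcript to a text-based semantic parser, so any recognition error propagates; an end-to-end model maps audio to the parse in a single network. The sketch below (PyTorch) illustrates that single-model idea only; the module choices, feature dimensions, and vocabulary size are illustrative assumptions, not the architecture from the STOP paper.

# Minimal sketch of an end-to-end SLU model: audio features in, semantic-parse
# tokens out, with no intermediate ASR transcript whose errors could cascade.
# All sizes and module choices are assumptions for illustration.
import torch
import torch.nn as nn

class End2EndSLU(nn.Module):
    def __init__(self, n_mels=80, d_model=256, parse_vocab=1000):
        super().__init__()
        self.subsample = nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=4)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.embed = nn.Embedding(parse_vocab, d_model)
        self.out = nn.Linear(d_model, parse_vocab)

    def forward(self, mels, parse_tokens):
        # mels: (batch, n_mels, time); parse_tokens: (batch, target_len)
        x = self.subsample(mels).transpose(1, 2)   # (batch, time', d_model)
        memory = self.encoder(x)                   # acoustic representation
        tgt = self.embed(parse_tokens)             # parse-token embeddings
        hidden = self.decoder(tgt, memory)         # (training would add a causal mask)
        return self.out(hidden)                    # logits over the parse vocabulary

model = End2EndSLU()
logits = model(torch.randn(2, 80, 200), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])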

Cited by 6 publications (3 citation statements)
References 15 publications
“…We present two novel techniques to improve E2E SLU models: 1) a method to encode ASR hypothesis quality and 2) an effective method to integrate these quality information into E2E SLU models. We show accuracy improvements on STOP dataset [16] in the on-device streaming scenario and share the analysis to demonstrate the effectiveness of our approach.…”
Section: Introduction (mentioning; confidence: 73%)
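The statement above describes the cited approach only at a high level. Purely as a generic illustration of how an utterance-level ASR quality signal could be fused with hypothesis token embeddings (this is not the cited paper's actual method; every name and dimension below is an assumption), one minimal PyTorch pattern is:

# Generic quality-aware fusion sketch: embed a scalar ASR confidence score and
# concatenate it with hypothesis token embeddings before downstream parsing.
import torch
import torch.nn as nn

class QualityAwareFusion(nn.Module):
    def __init__(self, d_model=256, vocab=8000):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab, d_model)
        self.quality_proj = nn.Linear(1, d_model)   # embed a scalar confidence
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, hyp_tokens, confidence):
        # hyp_tokens: (batch, len); confidence: (batch, 1) utterance-level score
        tok = self.tok_embed(hyp_tokens)                   # (batch, len, d_model)
        qual = self.quality_proj(confidence).unsqueeze(1)  # (batch, 1, d_model)
        qual = qual.expand(-1, tok.size(1), -1)            # broadcast over tokens
        return self.fuse(torch.cat([tok, qual], dim=-1))   # fused features

fusion = QualityAwareFusion()
feats = fusion(torch.randint(0, 8000, (2, 10)), torch.rand(2, 1))
print(feats.shape)  # torch.Size([2, 10, 256])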
“…We used the largest public SLU dataset, STOP (Spoken Task Oriented Semantic Parsing) [16] to evaluate our proposed approach. The STOP dataset is based on Task-Oriented Semantic Parsing (TOPv2) [25], a well-known NLU benchmark, that covers 8 different domains including alarm, messaging, music, navigation, timer, weather, reminder, and event.…”
Section: STOP Dataset (mentioning; confidence: 99%)
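For readers unfamiliar with TOPv2, its parses are nested bracketed trees of intents (IN:) and slots (SL:), which STOP pairs with audio across the eight domains listed above. The record below is illustrative only; the intent and slot names are plausible TOPv2-style labels, not verified entries from the dataset.

# Hypothetical STOP/TOPv2-style record (illustrative values, not from the dataset)
example = {
    "domain": "weather",  # one of the 8 domains listed in the citation above
    "utterance": "what is the weather in Boston tomorrow",
    "semantic_parse": "[IN:GET_WEATHER [SL:LOCATION Boston ] [SL:DATE_TIME tomorrow ] ]",
}
print(example["semantic_parse"])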
“…Additionally, audio datasets collected by the authors themselves in real environments or situations were observed. Examples include the Chime-Home [7], a dataset of gunshot audio [8], one focused on motor sounds [9], and some specific resources for spoken tasks, such as AudioMNIST [10] and STOP [11]. Also, there are audio datasets created through cutting, modifications, and transformations applied to existing datasets, such as SARdB [12] for audio scenes and Shrutilipi [13] for automatic speech recognition.…”
Section: Introduction (mentioning; confidence: 99%)