2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2017
DOI: 10.1109/asru.2017.8268987

Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system

citations
Cited by 65 publications
(56 citation statements)
references
References 28 publications
“…Work in progress. end-to-end architectures capable of learning how to map sequences of acoustic features directly to SLU recognition units [5,6,7,8]. SLU units that are typically used are combinations of ASR-level units (e.g.…”
Section: Introduction
mentioning
confidence: 99%
“…End-to-end SLU architecture
Train: (Utterances, Speakers) (115660, 77)
Validation: (Utterances, Speakers) (3118, 10)
Test: (Utterances, Speakers) (3793, 10)
Unique Intents: 31
Unique: (Actions, Objects, Locations) (6, 14, 4) …”
mentioning
confidence: 99%
“…Nowadays there is a growing research interest in end-to-end systems for various SLU tasks [23][24][25][26][27][28][29][30][31]. In this work, similarly to [26,29], end-to-end training of signal-to-concept models is performed through the recurrent neural network (RNN) architecture and the connectionist temporal classification (CTC) loss function [32] as shown in Figure 1.…”
Section: End-to-end Signal-to-concept Neural Architecture
mentioning
confidence: 99%
“…The use of end-to-end models for spoken language understanding (SLU) is beginning to be given more serious consideration [1][2][3][4]. Whereas conventional SLU uses an automatic speech recognition (ASR) component to transcribe the audio into text and a natural language understanding (NLU) component to map the text to semantics, an end-to-end model maps the audio directly to the semantics [5][6][7]. End-to-end models have several advantages over the conventional SLU setup: they have reduced computational requirements and software implementation complexity, avoid downstream errors due to incorrect transcripts, can have the entire set of model parameters optimized for the ultimate performance criterion (semantic accuracy) as opposed to a surrogate criterion (word error rate), and can take advantage of information present in the speech signal but not in the transcript, such as prosody.…”
Section: Introduction
mentioning
confidence: 99%