ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp40776.2020.9054098
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

Cited by 42 publications (70 citation statements)
References 24 publications
“…In this section, we present latency metrics for streaming ASR and techniques to reduce them. Different from the average token latency [21], we adopt metrics that are directly related to streaming speech applications such as Voice Search and Assistant.…”
Section: Latency Improvements (mentioning)
confidence: 99%
“…It has been shown to give significantly lower latency while retaining recognition accuracy on different RNN-T models. More importantly, FastEmit does not require any prior alignment information [20,21] and has no additional training or serving cost.…”
Section: FastEmit (mentioning)
confidence: 99%
“…Using wordpieces as labels requires input embeddings, and no pointers are provided for leveraging the already existing vast linguistic resources in non-wordpiece form. In [20], an attention-based sequence-to-sequence model with pre-training on frame-wise classification tasks is presented for achieving streaming capability. In [24], a latency-controlled bidirectional LSTM with 1.2-second chunks is used.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…In order to achieve simultaneous speech-to-speech translation (SSST), to the best of our knowledge, most recent approaches (Oda et al., 2014) dismantle the entire system into a three-step pipeline: streaming Automatic Speech Recognition (ASR) (Sainath et al., 2020; Inaguma et al., 2020; Li et al., 2020), simultaneous Text-to-Text translation (sT2T) (Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019), and Text-to-Speech (TTS) synthesis (Wang et al., 2017; Ping et al., 2017; Oord et al., 2017). Most recent efforts mainly focus on sT2T, which is considered the key component to further reduce the translation latency and improve the translation quality for the entire pipeline.…”
Section: Introduction (mentioning)
confidence: 99%