Interspeech 2019
DOI: 10.21437/interspeech.2019-2860
Vectorized Beam Search for CTC-Attention-Based Speech Recognition

Abstract: Attention-based encoder-decoder networks use a left-to-right beam search algorithm in the inference step. The current beam search expands hypotheses and traverses the expanded hypotheses at the next time step. This traversal is generally implemented with a for-loop program, which slows down the recognition process. In this paper, we propose a parallelism technique for beam search, which accelerates the search process by vectorizing multiple hypotheses to eliminate the for-loop program. We also p…
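The core idea in the abstract — scoring all beam hypotheses in one batched operation instead of looping over them — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, shapes, and use of NumPy are assumptions for the sake of the example.

```python
import numpy as np

def vectorized_beam_step(scores, log_probs, beam_size):
    """One decoding step over all hypotheses at once.

    scores:    (beam,) cumulative log-probability of each hypothesis
    log_probs: (beam, vocab) next-token log-probabilities per hypothesis
    Returns (new_scores, beam_ids, token_ids) for the top `beam_size`
    expanded hypotheses, selected with a single flattened top-k
    instead of a per-hypothesis for-loop.
    """
    total = scores[:, None] + log_probs   # (beam, vocab) via broadcasting
    flat = total.ravel()                  # (beam * vocab,)
    vocab = log_probs.shape[1]
    # Unordered top-k, then sort only those k candidates.
    topk = np.argpartition(-flat, beam_size)[:beam_size]
    topk = topk[np.argsort(-flat[topk])]
    return flat[topk], topk // vocab, topk % vocab

# Example: two hypotheses, vocabulary of three tokens.
scores = np.log(np.array([0.6, 0.4]))
log_probs = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.1, 0.8]]))
new_scores, beam_ids, token_ids = vectorized_beam_step(scores, log_probs, 2)
# Best expansions: hypothesis 0 + token 0 (0.42), hypothesis 1 + token 2 (0.32)
```

The single `argpartition` over the flattened `(beam * vocab)` score matrix is what replaces the per-hypothesis traversal loop; on a GPU the analogous batched top-k gives the speed-up the paper targets.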

Cited by 33 publications (46 citation statements)
References 24 publications (22 reference statements)
“…Large-scale training/decoding We support job schedulers (e.g., SLURM, Grid Engine), multiple GPUs and half/mixed-precision training/decoding with apex (Micikevicius et al, 2018). 5 Our beam search implementation vectorizes hypotheses for faster decoding (Seki et al, 2019).…”
Section: Additional Features (mentioning)
confidence: 99%
“…We use joint training with hybrid CTC/attention ASR by setting mtl-alpha to 0.3 and asr-weight to 0.5 as defined by Watanabe et al (2018). During inference, we perform beam search (Seki et al, 2019) on the ST sequences, using a beam size of 10, length penalty of 0.2, max length ratio of 0.3 (Watanabe et al, 2018).…”
Section: Multi-decoder Model (mentioning)
confidence: 99%
“…8 best checkpoints are averaged and the averaged weights are used for decoding the hypothesis. Vectorized beam search (Seki et al, 2019) was used for decoding the ASR hypotheses with a beam size of 10. Further in this paper, ASR models described in this section are referred to as Ext.ASR models (Externally trained ASR models).…”
Section: Automatic Speech Recognition (ASR) (mentioning)
confidence: 99%
“…The noisy EOS tokens are pruned out using (Kahn et al, 2019). Vectorized beam search (Seki et al, 2019) has been used for decoding the hypotheses with a beam size of 8. A large variance in the performance is observed w.r.t. the decoding hyper-parameters such as maximum target sequence length and length-bonus.…”
Section: Machine Translation Systems (MT) (mentioning)
confidence: 99%