Interspeech 2017
DOI: 10.21437/interspeech.2017-1683

Comparison of Decoding Strategies for CTC Acoustic Models

Abstract: Connectionist Temporal Classification (CTC) has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols. Output symbols are conditionally independent of each other under CTC loss, so a language model (LM) can be incorporated conveniently during decoding, retaining the traditional separation of acoustic and linguistic components in ASR. For …
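
The conditional-independence property described in the abstract is what makes the simplest decoding strategy, frame-wise best-path ("greedy") decoding, so cheap: only the collapse rule of the CTC output alphabet has to be applied. As a minimal sketch (ours, not code from the paper; the function name and arguments are illustrative):

```python
def ctc_greedy_decode(log_probs, blank_id=0):
    """Best-path (greedy) CTC decoding: pick the most likely symbol at every
    frame, then collapse consecutive repeats and drop blanks.

    log_probs: T x V matrix (list of lists) of per-frame log posteriors,
               where index `blank_id` is the CTC blank symbol.
    Returns the decoded label sequence as a list of symbol indices.
    """
    decoded, prev = [], blank_id
    for frame in log_probs:
        symbol = max(range(len(frame)), key=frame.__getitem__)  # frame-wise argmax
        if symbol != prev and symbol != blank_id:
            decoded.append(symbol)
        prev = symbol
    return decoded
```

For example, a frame-wise argmax path h, h, -, e, l, l, -, l, o (with - the blank) collapses to "hello": repeats are merged unless separated by a blank, and blanks are removed.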

Cited by 36 publications (34 citation statements)
References 28 publications (50 reference statements)
“…Except for the modeling unit, these models are very similar to conventional acoustic models and perform well when combined with an external LM during decoding (beam search) [23, 24].…”
Section: Introduction (mentioning)
confidence: 99%
“…Some works use this decoding method to build the CTC-layers in their hardware architectures of RNNs [17]. Although this way can already provide useful transcriptions, its limited accuracy is not sufficient to meet the demands of many sequence tasks [26].…”
Section: B. CTC Beam Search Decoding (mentioning)
confidence: 99%
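
The excerpt above contrasts plain greedy decoding with beam search over CTC outputs. For orientation only, a compact version of the standard CTC prefix beam search could look as follows; this is our sketch, not an implementation from the cited works, and all names are illustrative:

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_prefix_beam_search(log_probs, beam_size=8, blank_id=0):
    """CTC prefix beam search over a T x V matrix of per-frame log posteriors.
    Each prefix keeps two scores: the probability of ending in blank (p_b)
    and of ending in its last non-blank symbol (p_nb)."""
    beams = {(): (0.0, NEG_INF)}  # empty prefix: p_b = 1, p_nb = 0
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, p in enumerate(frame):
                if s == blank_id:
                    # blank keeps the prefix unchanged
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb_b, p_b + p, p_nb + p), nb_nb)
                elif prefix and s == prefix[-1]:
                    # repeated symbol: extending requires an intervening blank
                    ext = prefix + (s,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, p_b + p))
                    # otherwise the repeat is merged into the same prefix
                    k_b, k_nb = next_beams[prefix]
                    next_beams[prefix] = (k_b, logsumexp(k_nb, p_nb + p))
                else:
                    ext = prefix + (s,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, p_b + p, p_nb + p))
        # keep only the `beam_size` most probable prefixes
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: logsumexp(*kv[1]),
                            reverse=True)[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*scores)
```

A language-model term is typically folded into the prefix score at the point where a prefix is extended by a non-blank symbol.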
“…In ASR tasks, the traditional approach is based on HMMs [16], while recent works have shown great interest in building end-to-end models, using CTC-based deep RNNs. By training networks with large amounts of data, CTC-based models achieved great success [7], [11], [4], [12], [26], [18]. CTC is also widely used in other learning tasks such as handwriting recognition and scene text recognition, offering superior performance [8], [2], [19].…”
Section: Introduction (mentioning)
confidence: 99%
“…The first lexicon-free beam-search decoder aiming at dealing with OOV was benchmarked on Switchboard [21], although with a significantly worse word error rate (WER) than lexicon-based systems. Other recent works in this direction include [22, 23] on the English and [24, 25] on the Arabic and Finnish languages. Here, we study a simple end-to-end ASR system combining a character-level acoustic model with a character-level language model through beam search. We show that it can yield competitive word error rates on the WSJ and Librispeech corpora, even without a lexicon.…”
(mentioning)
confidence: 99%
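
The last excerpt describes combining a character-level acoustic model with a character-level LM during beam search. The combined hypothesis score is usually written as an acoustic term plus a weighted LM term and a length bonus; the sketch below is illustrative, and the weights are placeholders rather than values reported in the cited work:

```python
def fused_score(log_p_ctc, log_p_lm, num_chars, alpha=0.8, beta=1.0):
    """Beam-search score for one hypothesis: CTC (acoustic) log probability,
    a weighted character-LM log probability, and a length bonus that
    counteracts the LM's preference for short strings.
    alpha and beta are placeholders, tuned on held-out data in practice."""
    return log_p_ctc + alpha * log_p_lm + beta * num_chars
```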