ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746821
Investigating Sequence-Level Normalisation For CTC-Like End-to-End ASR

Abstract: End-to-end Automatic Speech Recognition (E2E ASR) significantly simplifies the training process of an ASR model. Connectionist Temporal Classification (CTC) is one of the most popular methods for E2E ASR training. Implicitly, CTC has a unique topology which is very useful for sequence modelling. However, we find that by changing to another topology, we can make it even more effective. In this paper, we propose a new CTC-like method, for E2E ASR training, by modifying the topology of original CTC, so that the w…

Cited by 4 publications (4 citation statements). References 19 publications.
“…(3) Combined with the CTC decoding algorithm, the end-to-end prediction of sequence data is effectively realized [10].…”
Section: Core Idea of CTC
confidence: 99%
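The "CTC decoding algorithm" this quote refers to is, in its simplest form, best-path (greedy) decoding: take the most likely label per frame, collapse consecutive repeats, then remove blanks. A minimal sketch, assuming per-frame log-probabilities with the blank at index 0 (the function name and example values are illustrative, not taken from the cited works):

import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    # Best-path CTC decoding: argmax label per frame, collapse
    # consecutive repeats, then drop blank symbols.
    best_path = np.argmax(log_probs, axis=-1)
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded

# Example: 5 frames, 3 symbols (0 = blank); the frame-wise best path
# [1, 1, 0, 2, 2] collapses to [1, 2].
probs = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.8, 0.1, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.1, 0.1, 0.8]])
print(ctc_greedy_decode(np.log(probs)))  # -> [1, 2]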
“…The recursive formula for the backward probability is shown in Eq. (10).…”
Section: Core Idea of CTC
confidence: 99%
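For context, the backward probability mentioned here follows, in the standard CTC formulation of Graves et al. (2006), the recursion below; the citing paper's Eq. (10) is not reproduced on this page, so this is the textbook form rather than a quotation. Here y^t_k is the network output for label k at time t, l' is the label sequence with blanks inserted, and b is the blank:

\[
\beta_t(s) =
\begin{cases}
\bigl(\beta_{t+1}(s) + \beta_{t+1}(s+1)\bigr)\, y^{t}_{l'_s}, & \text{if } l'_s = b \ \text{or}\ l'_{s+2} = l'_s,\\[4pt]
\bigl(\beta_{t+1}(s) + \beta_{t+1}(s+1) + \beta_{t+1}(s+2)\bigr)\, y^{t}_{l'_s}, & \text{otherwise.}
\end{cases}
\]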
“…Note that with most topologies (except CTC topology), not all the paths in E are valid, which makes the summation of the probabilities of all possible word sequences not equal to one. A previous work [22] has shown that the normalisation (the denominator term in eq. (2)) is crucial for the sequence-level loss function.…”
Section: Training
confidence: 99%
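The normalisation at issue is the denominator of a sequence-level, MMI-style objective. Schematically, in our notation (eq. (2) of the paper is not reproduced on this page), for acoustic features X and word sequence w with prior P(w):

\[
\mathcal{L}_{\text{seq}} = -\log \frac{p(X \mid w)\, P(w)}{\sum_{w'} p(X \mid w')\, P(w')}.
\]

With the CTC topology every path is valid and the denominator probabilities sum to one, so the term can be dropped; with the other topologies it must be computed explicitly, which is why the quote calls it crucial.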
“…Note that in [26], the authors compared different CTC topologies, while in this work, we compare different topologies, most of which are not equivalent to CTC. Inspired by previous work [22], we introduce an extra state for each phone to increase the modelling power, resulting in the S2-T1 topology in table 1, where there is no self-loop for the first state in S2-T1, and the second state is optional and skippable. S2-T1⋆ is similar to S2-T1, except for the self-loop on the first state.…”
Section: Topologies
confidence: 99%
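From the prose alone (table 1 itself is not shown on this page), the S2-T1 topology can be sketched as a per-phone transition list; the state names, the exit convention, and the self-loop on the second state are our reading of the description, not a reproduction of the paper's table:

# Per-phone arcs as (source_state, destination_state); state 2 stands for
# the exit into the next phone. Derived from the prose only; table 1 in
# the paper is authoritative.
S2_T1 = [
    (0, 1),  # first state -> second state (no self-loop on state 0)
    (0, 2),  # skip the optional second state entirely
    (1, 1),  # assumed self-loop on the second state
    (1, 2),  # second state -> exit
]

# S2-T1* differs only in allowing a self-loop on the first state.
S2_T1_STAR = S2_T1 + [(0, 0)]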