ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683607

When CTC Training Meets Acoustic Landmarks

Abstract: Connectionist temporal classification (CTC) provides an end-to-end acoustic model (AM) training strategy. CTC learns accurate AMs without time-aligned phonetic transcription, but sometimes fails to converge, especially in resource-constrained scenarios. In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks. We tailored a new set of acoustic landmarks to help CTC training converge more rapidly and smoothly while also reducing recognition error rates. We leveraged new ta…

Citations: cited by 5 publications (2 citation statements)
References: 22 publications
“…In a CTC system with character outputs, however, it is difficult to share data for multilingual [16] or cross-lingual [17] ASR. Proposed solutions have included separate softmax tiers for the character set of each language [18,19,20], or the generation of phone strings instead of characters as the output of the CTC [21,22,23], or the use of both methods, in a multi-task learning framework, with one output tier generating phones, while another generates characters [24].…”
Section: Introduction
mentioning
confidence: 99%
“…The MTL approach is applied to neural networks by sharing some of the hidden layers between different tasks. Some research could improve the accuracy of CTC-based ASR by incorporating acoustic landmarks, which could help CTC training converge more rapidly and smoothly [66,67]. Moreover, the information of acoustic landmarks could be obtained, which could be used as an additional information source, to further improve the performance of the APED system [68].…”
mentioning
confidence: 99%