2020
DOI: 10.1007/978-3-030-60276-5_27
|View full text |Cite
|
Sign up to set email alerts
|

CTC-Segmentation of Large Corpora for German End-to-End Speech Recognition

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
32
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
5
2
1

Relationship

0
8

Authors

Journals

citations
Cited by 50 publications
(32 citation statements)
references
References 6 publications
0
32
0
Order By: Relevance
“…In addition, the CTC-based algorithm has an alignment function and can output alignment information. Recently, Ludwig Kürzinge et al proposed to use a CTC-based network for the segmentation task and it outperforms the other existing segmentation tools [41]. It should be noted that here "segmentation task" in [41] is similar to the SAD task.…”
Section: Selection Of the Force Alignment Module For Asr And Kwsmentioning
confidence: 99%
See 1 more Smart Citation
“…In addition, the CTC-based algorithm has an alignment function and can output alignment information. Recently, Ludwig Kürzinge et al proposed to use a CTC-based network for the segmentation task and it outperforms the other existing segmentation tools [41]. It should be noted that here "segmentation task" in [41] is similar to the SAD task.…”
Section: Selection Of the Force Alignment Module For Asr And Kwsmentioning
confidence: 99%
“…Therefore, it is possible to compute all possible maximum joint probabilities for aligning the text via dynamic programming. In [41], the CTC network is used for utterance-level segments and outperforms other existing segmentation tools. However, because CTC is also a kind of sequence modeling, the CTC loss is actually the sum of the probabilities of multiple alignment paths.…”
Section: Selection Of the Force Alignment Module For Asr And Kwsmentioning
confidence: 99%
“…Typical framesynchronous alignment methods require frame-wise prediction by pre-trained ASR models. Recently, a DNN-based method referred to as CTC-Segmentation [21] has been proposed. CTC-Segmentation generates frame-wise token posteriors using CTC, one of the end-to-end neural network models, and then the alignment is estimated by finding an optimal path from the CTC trellis based on the generated posteriors.…”
Section: Alignment Approaches 21 Frame-synchronous Alignmentmentioning
confidence: 99%
“…A traditional approach aligns the text by finding an optimal path from the HMM trellis using Viterbi algorithm [19,20]. A similar work based on connectionist temporal classification (CTC) model has also been proposed recently [21]. In, [22,23], the long audio recordings are firstly recognized by a pre-trained ASR model, then the alignment is performed based on text matching between the recognized text and manual transcripts.…”
Section: Introductionmentioning
confidence: 99%
“…One reason is that model ASR has increasingly shifted towards end-to-end training using loss functions like CTC [9] that disregards precise frame alignment. Only a few works explored using neural networks to perform segmentation of sentences [10] and phones [11,12,13]. These works demonstrate great potentials for neural forced alignment, but they still required text transcriptions.…”
Section: Introductionmentioning
confidence: 99%