2021 24th Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databa 2021
DOI: 10.1109/o-cocosda202152914.2021.9660563

Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions

Citations: cited by 2 publications (4 citation statements)
References: 25 publications
“…This section evaluated the performance of the conformer-transducer ASR model applied using the proposed two-step joint optimization approach and compared it with the performance using multi-condition training [27] and the conventional joint optimization approaches [37, 38]. In addition, an ablation study was performed to examine the effectiveness of the proposed joint optimization approach according to each processing block of the conformer-transducer ASR model.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…The performance of each ASR model obtained by various optimization approaches was evaluated by measuring the character error rate (CER) and word error rate (WER). The ASR models compared here were (1) ASR-only trained using the clean training dataset; (2) ASR-only trained using the noisy training dataset; (3) a combination of the speech enhancement (SE) and ASR models (denoted as SE-ASR) after each of the two models was separately trained using the noisy training dataset; (4) a combined model of the SE and ASR models (denoted as SE+ASR) trained by a conventional joint optimization as in [37]; (5) SE+ASR trained using a conventional two-step joint optimization as in [38]; and (6) SE+ASR trained using the proposed two-step joint optimization. Note that all the combined models from (3) to (6) were trained using the noisy training dataset.…”
Section: Methods (citation type: mentioning)
confidence: 99%
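
The citation statement above names character error rate (CER) and word error rate (WER) as the evaluation metrics. As a rough illustration only, and not the citing paper's evaluation code, both can be sketched as a Levenshtein edit distance divided by the reference length. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the tokenization choices (whitespace splitting for words, spaces dropped and Unicode code points counted for characters) are assumptions that real toolkits may handle differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / number of reference chars."""
    # Assumption: spaces are ignored and each Unicode code point is one character.
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `wer("the cat sat", "the cat sit")` counts one substituted word out of three reference words. Both rates can exceed 1.0 when the hypothesis contains many insertions.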