2021
DOI: 10.1250/ast.42.252

Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla

Abstract: Research in corpus-driven Automatic Speech Recognition (ASR) is advancing rapidly towards building a robust Large Vocabulary Continuous Speech Recognition (LVCSR) system. Under-resourced languages like Bangla require benchmarking on large corpora so that further LVCSR research can tackle their limitations and avoid biased results. In this paper, a publicly published large-scale Bangladeshi Bangla speech corpus is used to implement a deep Convolutional Neural Network (CNN) based model and Recurrent Neural Network (RN…

Citations: cited by 3 publications (4 citation statements)
References: 15 publications
“…Samin et al. evaluated the quality of a large-scale publicly available LB-ASRTD corpus (229 hours) using deep learning-based approaches by conducting character-wise error analysis [20]. They also found a deep CNN-based acoustic model and a 5-gram Markov Language Model (LM) to be capable of achieving a lower word error rate (WER) on LB-ASRTD.…”
Section: Related Work In Bangla
mentioning
confidence: 99%
“…In this study, we also use a deep CNN-based model while utilizing a higher number of MFCCs during the input feature extraction and introducing layer normalization in each convolution layer. Based on an acoustic study on a regional accented speech and the character-wise error analysis on LB-ASRTD, the requirement of a new corpus with more speaker variability and character-wise well-balancedness was recommended [20], [21]. Therefore, Kibria et al. developed the 241-hour-long publicly available Bangladeshi Bangla SUBAK.KO corpus with the aim of addressing the abovementioned issues of LB-ASRTD [7].…”
Section: Related Work In Bangla
mentioning
confidence: 99%