2019
DOI: 10.5120/ijca2019918401

Nepali Speech Recognition using RNN-CTC Model

Abstract: This paper presents a neural-network-based Nepali speech recognition model. A recurrent neural network (RNN) is used to process the sequential audio data, and the Connectionist Temporal Classification (CTC) [1] technique is applied so that the RNN can be trained on audio data. CTC is a probabilistic approach that maximizes the probability of the desired label sequence given the RNN output. After processing through the RNN and CTC layers, Nepali text is obtained as output. This paper also defines a character set of 67 Nepali characte…
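
As a minimal sketch of the CTC objective described in the abstract (not the paper's own code), the function below runs the standard CTC forward recursion. It sums the probability of every frame-level alignment that collapses to the target label sequence, and the training loss is the negative log of that sum. The sketch assumes the blank symbol has index 0 and uses plain probabilities rather than log-probabilities, so it is only suitable for short toy inputs. The names `ctc_forward_prob`, `ctc_loss`, and `brute_force_prob`, and the character IDs, are hypothetical.

```python
import math
from itertools import product

BLANK = 0  # assumption: index 0 in every frame distribution is the CTC blank


def ctc_forward_prob(probs, labels):
    """p(labels | x): sum over all CTC alignments via the forward (alpha) recursion.

    probs  -- T x K per-frame probabilities (e.g. softmax outputs of the RNN)
    labels -- target label IDs without blanks (hypothetical character IDs)
    """
    # Extended sequence with blanks interleaved: blank, l1, blank, l2, ..., blank
    ext = [BLANK]
    for c in labels:
        ext += [c, BLANK]
    S = len(ext)

    # alpha[s]: probability of all partial alignments ending at ext[s] at the current frame
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # move forward one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]         # skip a blank between two distinct labels
            new[s] = total * frame[ext[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, labels):
    """CTC training loss: negative log-likelihood of the target sequence."""
    return -math.log(ctc_forward_prob(probs, labels))


def brute_force_prob(probs, labels):
    """Reference check: enumerate every path and keep those that collapse to labels."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        collapsed, prev = [], None
        for k in path:
            if k != prev and k != BLANK:      # merge repeats, then drop blanks
                collapsed.append(k)
            prev = k
        if collapsed == list(labels):
            total += math.prod(frame[k] for frame, k in zip(probs, path))
    return total


# Toy example: 3 frames, 3 symbols (0 = blank, 1 and 2 = hypothetical characters)
probs = [[0.5, 0.4, 0.1],
         [0.3, 0.3, 0.4],
         [0.6, 0.2, 0.2]]
print(ctc_forward_prob(probs, [1, 2]))   # ≈ 0.206
print(brute_force_prob(probs, [1, 2]))   # same value, by enumeration
print(ctc_loss(probs, [1, 2]))
```

The brute-force enumeration visits all K^T paths and is only there to confirm the recursion on a toy input; the forward recursion itself costs O(T·S) for an extended label sequence of length S. Practical systems run it in log space, or use a framework's built-in CTC loss, to avoid underflow on long utterances.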

Cited by 7 publications (6 citation statements). References: 4 publications.

“…Over time, independent research publications [10][3] [11] have claimed that they prepared the data source and conducted experiments based on those datasets. However, it should be noted that none of these sources have been published or made available for open access in the public domain.…”
Section: Nepali Speech Corpus (mentioning)

“…The Hidden Markov Model (HMM) was employed to train the model, resulting in an accuracy of 74.99% for single-word inputs and 55.55% for three-word phrase inputs. Regmi et al [10] examined RNN-CTC based model on self-prepared Nepali speech dataset. The CTC loss function and beam search were used for the purpose of training and decoding technique respectively.…”
Section: A. Acoustic Models (mentioning)

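In the setup this statement describes, the CTC loss trains the model and beam search decodes its output. As a minimal sketch (assuming blank index 0, no language model, and plain probabilities), a CTC prefix beam search keeps two scores for each candidate prefix: the probability of alignments ending in a blank and of those ending in a non-blank. This split is what lets repeated characters merge correctly. `prefix_beam_search` and its `beam_width` default are hypothetical; the cited work's actual decoder settings are not given here.

```python
from collections import defaultdict

BLANK = 0  # assumption: index 0 is the CTC blank


def prefix_beam_search(probs, beam_width=8):
    """CTC prefix beam search without a language model.

    probs -- T x K per-frame probabilities; returns (best label prefix, its probability).
    """
    # prefix (tuple of label IDs) -> (p ending in blank, p ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for k, p in enumerate(frame):
                if k == BLANK:
                    # Emitting blank leaves the prefix unchanged
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and k == prefix[-1]:
                    # A repeated label with no blank in between collapses into the same prefix
                    nxt[prefix][1] += p_nb * p
                    # After a blank, the same label starts a new occurrence
                    nxt[prefix + (k,)][1] += p_b * p
                else:
                    nxt[prefix + (k,)][1] += (p_b + p_nb) * p
        # Prune to the beam_width most probable prefixes
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {pre: tuple(ps) for pre, ps in ranked[:beam_width]}
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best), p_b + p_nb


# Toy frames: 0 = blank, 1 and 2 = hypothetical character IDs
probs = [[0.5, 0.4, 0.1],
         [0.3, 0.3, 0.4],
         [0.6, 0.2, 0.2]]
print(prefix_beam_search(probs))  # best prefix [1], probability ≈ 0.318
```

On these toy frames, the per-frame argmax path (blank, 2, blank) would decode to `[2]`. Beam search instead returns `[1]` because that transcription's probability is spread over several alignments and totals about 0.318, versus 0.24 for `[2]`. This is the usual argument for beam search over greedy decoding with CTC.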
“…This project designs to use a speech recognition system to replace the traditional mouse and keyboard operations to improve the intelligence level of the main control room. To make it applicable in nuclear power plant control, it is necessary to ensure that the recognition is highly reliable [1].…”
Section: Introduction (mentioning)

“…End-to-end ASR systems have already overcome the traditional HMM and DNN systems due to its simplicity and convenience, where there is no need to have the usage of language model, pronunciation model etc. This [8] model have been built with the help of the technique, which is called CTC (Connectionist Temporal Classifier). CTC makes the automatic segmentation of audio signal and maps the audio wave directly to transcriptions.…”
Section: Introduction (mentioning)

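The "automatic segmentation" mentioned in this statement comes from CTC's many-to-one mapping from frame-level outputs to a transcription: merge consecutive repeated symbols, then remove blanks. Because of this mapping, the training data needs no frame-level alignment. The following is a minimal sketch of that mapping plus greedy (best-path) decoding. It assumes blank index 0; `collapse`, `greedy_decode`, and the two-character alphabet are hypothetical.

```python
BLANK = 0  # assumption: index 0 is the CTC blank


def collapse(path):
    """CTC mapping: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def greedy_decode(probs):
    """Best-path decoding: take the argmax symbol per frame, then collapse."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return collapse(best_path)


# Hypothetical alphabet: 0 = blank, 1 = 'न', 2 = 'म'
ID_TO_CHAR = {1: "न", 2: "म"}
alignment = [1, 1, 0, 2, 2, 0, 0, 2]  # one symbol per audio frame
print("".join(ID_TO_CHAR[k] for k in collapse(alignment)))  # नमम
```

The blanks between the last two `म` frames are what let CTC output a doubled character; without them, the repeats would merge into one. Greedy decoding is cheaper than beam search, but it commits to a single alignment and is not guaranteed to find the most probable transcription.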
“…The presented method improves the performance of this LSTM over the traditional LSTM. The main disadvantage of these models [8][9], they require a large amount of data, which is a big problem for Kazakh language, because it hasn't been investigated and researched well. The datasets that exist today in Kazakh language are mostly private and not available for free.…”
Section: Introduction (mentioning)