Nepali Speech Recognition using RNN-CTC Model

Regmi, Paribesh; Dahal, Arjun; Joshi, Basanta

doi:10.5120/ijca2019918401

Cited by 7 publications (6 citation statements). References 4 publications.


“…Over time, independent research publications [10][3] [11] have claimed that they prepared the data source and conducted experiments based on those datasets. However, it should be noted that none of these sources have been published or made available for open access in the public domain.…”

Section: Nepali Speech Corpus (mentioning, confidence: 99%)

“…The Hidden Markov Model (HMM) was employed to train the model, resulting in an accuracy of 74.99% for single-word inputs and 55.55% for three-word phrase inputs. Regmi et al [10] examined RNN-CTC based model on self-prepared Nepali speech dataset. The CTC loss function and beam search were used for the purpose of training and decoding technique respectively.…”

Section: A. Acoustic Models (mentioning, confidence: 99%)
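The statement above names the CTC loss as the training objective. As a hedged illustration (the standard CTC forward algorithm, not the authors' code, which the statement does not show), the sketch below computes P(labels | input) by summing over every frame-level alignment that collapses to the target; the loss is the negative log of that sum. It assumes per-frame softmax outputs given as nested lists and a blank at index 0, and it works in plain probability space, so it would underflow on long utterances where real implementations use log-space arithmetic. A brute-force enumeration is included only as a cross-check on tiny inputs.

```python
import itertools
import math

BLANK = 0  # assumption: index 0 of each frame's distribution is the CTC blank


def ctc_forward(probs, labels, blank=BLANK):
    """Total probability of `labels` summed over all CTC alignments.

    probs:  T x K nested list of per-frame softmax outputs.
    labels: target symbol indices, without blanks.
    """
    # Extended target: blank, l1, blank, l2, ..., lN, blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # t = 0: a path may start with the leading blank or the first label
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # skip the blank between two *different* labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new

    # valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, labels, blank=BLANK):
    """CTC loss: negative log-likelihood of the target labels."""
    return -math.log(ctc_forward(probs, labels, blank))


def collapse(path, blank=BLANK):
    """Merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force(probs, labels, blank=BLANK):
    """Cross-check for tiny inputs: enumerate every frame-level path."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            p = 1.0
            for frame, k in zip(probs, path):
                p *= frame[k]
            total += p
    return total
```

For two frames with uniform probabilities over {blank, 1}, `ctc_forward([[0.5, 0.5], [0.5, 0.5]], [1])` returns 0.75: the three alignments `1 1`, `_ 1`, and `1 _` each contribute 0.25. The dynamic program visits T × (2N + 1) states instead of the K^T paths the brute-force check enumerates.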


Enhancing the Quality of Nepali Text-to-Speech Systems

Ghimire; Bal (2017). Communications in Computer and Information Science


In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to comprehensively review the work on Nepali ASR systems completed to date, explore the datasets used, examine the technologies employed, and account for the obstacles encountered in implementing Nepali ASR systems. In line with the global growth of speech recognition research, the number of Nepali ASR projects is also increasing. Nevertheless, language and acoustic modeling for Nepali has received far less attention than it has for high-resource languages. In this context, we provide a framework as well as directions for future investigations.


“…This project designs to use a speech recognition system to replace the traditional mouse and keyboard operations to improve the intelligence level of the main control room. To make it applicable in nuclear power plant control, it is necessary to ensure that the recognition is highly reliable [1].…”

Section: Introduction (mentioning, confidence: 99%)

Towards Speech Recognition and Training Utilization in the Nuclear Power Main Control Room

Wang; Deng; Duan et al. (2022). J. Phys.: Conf. Ser.


Speech, as the material shell of language, is the external form of language and the symbol system that most directly records human thought. As a means of interaction, it is simple and fast. For a long time, however, insufficient recognition accuracy prevented its use in industry. With the rapid development of big data and machine learning, speech models can now be trained far more effectively, and the reliability of speech recognition has improved greatly. This makes speech interaction feasible even in nuclear power plants, which have extremely high safety requirements. This work builds an intelligent speech control system together with an associated speech model training system, using Kaldi and Python scripts to train models on the corpus. Experiments were designed to verify the approach: with speech recognition, operation time was reduced by about 45% and overall task execution efficiency improved significantly. Speech recognition can therefore substantially reduce operators' task load and the probability of human error.


“…End-to-end ASR systems have already overcome the traditional HMM and DNN systems due to its simplicity and convenience, where there is no need to have the usage of language model, pronunciation model etc. This [8] model have been built with the help of the technique, which is called CTC (Connectionist Temporal Classifier). CTC makes the automatic segmentation of audio signal and maps the audio wave directly to transcriptions.…”

Section: Introduction (mentioning, confidence: 99%)
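The statement describes CTC (Connectionist Temporal Classification) as mapping audio directly to transcriptions without explicit segmentation. The mechanism behind this is a many-to-one collapse rule: merge consecutive repeated symbols, then drop blanks. The sketch below applies that rule in greedy best-path decoding, which simply takes the most probable symbol at each frame; this is simpler than the beam search decoding mentioned in the Regmi et al. statement and is shown only to illustrate the mapping. The three-symbol Nepali alphabet, the blank at index 0, and the frame probabilities are hypothetical toy values.

```python
BLANK = 0  # assumption: index 0 of the output layer is the CTC blank


def collapse(path, blank=BLANK):
    """CTC many-to-one map: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def best_path_decode(probs, alphabet, blank=BLANK):
    """Greedy CTC decoding: arg-max symbol per frame, then collapse."""
    path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return "".join(alphabet[k] for k in collapse(path, blank))


# Hypothetical toy output: 3 symbols (blank, "न", "म") over 5 frames.
ALPHABET = ["<blank>", "न", "म"]
FRAMES = [
    [0.1, 0.8, 0.1],    # न
    [0.2, 0.7, 0.1],    # न again: merged with the previous frame
    [0.1, 0.1, 0.8],    # म
    [0.9, 0.05, 0.05],  # blank: emits nothing
    [0.2, 0.6, 0.2],    # न
]

print(best_path_decode(FRAMES, ALPHABET))  # नमन
```

The blank is what lets CTC emit a genuinely doubled symbol: `collapse([1, 0, 1])` gives `[1, 1]`, whereas `collapse([1, 1])` gives `[1]`. Because the network outputs one distribution per frame and this rule removes the frame-level timing, no hand-made segmentation of the audio is needed.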

“…The presented method improves the performance of this LSTM over the traditional LSTM. The main disadvantage of these models [8][9], they require a large amount of data, which is a big problem for Kazakh language, because it hasn't been investigated and researched well. The datasets that exist today in Kazakh language are mostly private and not available for free.…”

Section: Introduction (mentioning, confidence: 99%)

Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning

Amirgaliyev (2020). IJATCSE


Developing an automatic speech recognition system for the Kazakh language is challenging because of the lack of audio data and the specificity and complexity of the language itself. In this paper, we propose a new method that takes a model pre-trained on Russian and uses its weight values in the proposed neural network. The Russian model was chosen because Kazakh and Russian pronunciation is very similar in many respects, the letters shared by the two languages account for 78% of the Kazakh alphabet, and a rather large Russian speech corpus is available. The Kazakh speech dataset with transcriptions was compiled by the university's faculty: in total, 50 native speakers were involved, producing about 400 sentences. A special technology was also created to expand the database automatically. The data was extracted from well-known Kazakh books such as "Abai zholy", "Kara sozder", etc.
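The abstract does not say which layers are reused or how. As a rough sketch, assuming the common practice of copying every pretrained parameter whose name and shape both match and keeping fresh initial values for the rest (typically the output layer, assuming Kazakh needs more output symbols than Russian), the idea looks like this in plain Python. The layer names `encoder.w` and `output.w`, their shapes, and all values are hypothetical toy data; a real system would rely on its deep-learning framework's own checkpoint-loading utilities.

```python
import copy


def shape(x):
    """Shape of a nested-list tensor, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def transfer_weights(pretrained, target):
    """Copy each pretrained parameter into target when name and shape match.

    Parameters that are missing or mismatched (for example an output layer
    sized to a different alphabet) keep the target's own initial values.
    Returns the merged parameters and the names that were transferred.
    """
    merged = copy.deepcopy(target)
    transferred = []
    for name, value in pretrained.items():
        if name in merged and shape(merged[name]) == shape(value):
            merged[name] = copy.deepcopy(value)
            transferred.append(name)
    return merged, transferred


# Hypothetical toy parameters: a shared 2x2 encoder layer and an output layer
# with one row per output symbol (3 for the "Russian" model, 4 for "Kazakh").
russian = {
    "encoder.w": [[0.1, 0.2], [0.3, 0.4]],
    "output.w": [[0.5, 0.5], [0.6, 0.6], [0.7, 0.7]],
}
kazakh_init = {
    "encoder.w": [[0.0, 0.0], [0.0, 0.0]],
    "output.w": [[0.0, 0.0] for _ in range(4)],
}

kazakh, copied = transfer_weights(russian, kazakh_init)
print(copied)  # ['encoder.w']
```

Matching on both name and shape is a conservative choice: it reuses the acoustic layers that the two languages can plausibly share while refusing to load an output layer whose size no longer fits the target alphabet, which is then trained from scratch on the Kazakh data.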

