A Comparative Survey of ANN and Hybrid HMM/ANN Architectures for Robust Speech Recognition

Frikha, Mounir; Hamida, Ahmed Ben

doi:10.5923/j.ajis.20120201.01

Cited by 12 publications (4 citation statements)

References 22 publications


“…A new approach towards high-performance speech/music discrimination on realistic tasks related to the automatic transcription of broadcast news is described in (Frikha and Hamida, 2012), in which an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM) are used. In Subashini et al. (2012), a generic audio classification and segmentation approach for multimedia indexing and retrieval is described.…”

Section: Related Work; citation type: mentioning (confidence: 99%)

Speech/Music Classification Using Wavelet Based Feature Extraction Techniques

Ramalingam; Dhanalakshmi

2014

Journal of Computer Science


Audio classification is a fundamental step in coping with the rapid growth in audio data volume. Because of the increasing size of multimedia sources, speech/music classification is one of the most important issues in multimedia information retrieval. In this work, a speech/music discrimination system is developed that uses the Discrete Wavelet Transform (DWT) as the acoustic feature. Multiresolution analysis is used as the statistical means of extracting features from the input signal, and in this study a method is deployed to model the extracted wavelet features. Support Vector Machines (SVMs), which are based on the principle of structural risk minimization, are applied to classify audio into two classes, speech and music, by learning from training data. The proposed method also applies Gaussian Mixture Models (GMMs) to estimate the probability density function, using a maximum likelihood decision rule. The system achieves an accuracy of 94.5%.
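The multiresolution feature idea in this abstract can be illustrated with a small sketch. It is a minimal example assuming a Haar wavelet and the log energy of each subband as the per-frame feature vector; the abstract does not state which wavelet family, decomposition depth, or feature summary the authors used, and the function names `haar_step` and `dwt_subband_energies` are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        signal = signal + [signal[-1]]
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_subband_energies(frame: List[float], levels: int = 3) -> List[float]:
    """Hypothetical feature vector: log energy of each detail band,
    finest first, followed by the log energy of the final approximation."""
    features = []
    approx = list(frame)
    for _ in range(levels):
        # Split the current approximation into a coarser approximation
        # and a detail band at this resolution.
        approx, detail = haar_step(approx)
        features.append(math.log(sum(d * d for d in detail) + 1e-12))
    features.append(math.log(sum(a * a for a in approx) + 1e-12))
    return features


feats = dwt_subband_energies([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
print(len(feats))
```

Per-frame vectors like these could then be passed to a classifier such as the SVM or GMM mentioned above. Because this Haar transform is orthonormal, the subband energies of an unpadded frame sum to the frame's total energy, which is a convenient sanity check.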



“…Researchers around the world are still trying to build methods and algorithms for speech recognition that are robust and highly accurate. Previous research on speech recognition includes speech recognition with artificial neural networks using Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) [1], voice recognition using Hidden Markov Models [2], which reached an accuracy of up to 86.67%, speech recognition combining Artificial Neural Networks with Hidden Markov Models [3], Hindi voice recognition with Hidden Markov Models [4], and voice recognition for biometrics using Vector Quantization [5]. Research on voice recognition in the groundwater was also carried out using Mel-Frequency Cepstral Coefficients (MFCC) and an Adaptive Neuro-Fuzzy Inference System (ANFIS), resulting in an accuracy of 95.90% [6].…”

Section: Introduction; citation type: mentioning (confidence: 99%)

Javanese Gender Speech Recognition Based on Machine Learning Using Random Forest and Neural Network

Nugroho

2020

Journal of Information Systems


Speech is a means of communication between people throughout the world. Research in speech recognition continues to develop robust methods across many research variants. However, decreasing the word error rate and reducing noise remain open problems that are still being investigated. The purpose of this study is to find an accurate method for classifying the gender of Javanese speakers from their voices. This research used a dataset of male and female voices from the Javanese tribe, which was recorded and then processed with a noise-reduction preprocessing step and MFCC feature extraction, and then classified using two machine learning methods, namely Random Forest and Neural Network. Evaluation results show that the classification of Javanese-accented speech reaches an accuracy of 91.3% using Random Forest and 92.2% using Neural Network.



“…HMMs were considered for this work because they are the most frequently used techniques for the recognition of normal and disordered speech. This is due to the efficiency of HMMs in modelling the variation in the statistical properties of speech in both the time and frequency domains [40].…”

Section: Introduction; citation type: mentioning (confidence: 99%)

Estimation of Phoneme-Specific HMM Topologies for the Automatic Recognition of Dysarthric Speech

Caballero-Morales

2013

Computational and Mathematical Methods in Medicine


Dysarthria is a frequently occurring motor speech disorder that can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, the spoken communication of dysarthric speakers is seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications that improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected to different degrees. The approach therefore consists of finding the most suitable type of HMM topology (Bakis or Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This differs from studies in which a single topology is assumed for all phonemes. The suitable parameters (topology and mixture components) are found with a Genetic Algorithm (GA). Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach over the single-topology approach, even for speakers with severe dysarthria.
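The two topologies named in this abstract differ in which transitions the HMM's state-transition matrix allows. The sketch below is a minimal illustration, assuming uniform initial probabilities over the allowed transitions (a common starting point before re-estimation) and a hypothetical `max_skip` parameter controlling how many states a Bakis model may jump over. It is not the author's implementation, and the GA search over topologies, state counts, and mixture counts is not shown.

```python
from typing import List

Matrix = List[List[float]]


def bakis_transitions(n_states: int, max_skip: int = 1) -> Matrix:
    """Left-to-right (Bakis) topology: state i may stay in place or move
    forward by up to max_skip + 1 states; backward transitions are zero."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        reachable = list(range(i, min(n_states, i + max_skip + 2)))
        p = 1.0 / len(reachable)  # assumption: uniform initial probabilities
        for j in reachable:
            a[i][j] = p
    return a


def ergodic_transitions(n_states: int) -> Matrix:
    """Ergodic (fully connected) topology: every state can reach every state."""
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]


for row in bakis_transitions(3):
    print(row)
```

Transitions initialised to zero stay at zero under Baum-Welch re-estimation, so the left-to-right constraint of the Bakis model survives training, whereas the ergodic matrix leaves every state free to reach every other.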

