Voice Recognition using Dynamic Time Warping and Mel-Frequency Cepstral Coefficients Algorithms

Mansour, Abdelmajid Hassan; Salh, Gafar Zen Alabdeen; Mohammed, Khalid A.

doi:10.5120/20312-2362

Cited by 26 publications

(15 citation statements)

References 2 publications

“…The Mel-Frequency Cepstral (MFC) is a well-known method for extracting the speech signal features. A smart combination of the MFC and the Dynamic Time Warping (DTW) techniques can provide effective solutions especially in the case of isolated speech words recognition [23,26,32]. The devised system deals with the isolated speech word.…”

Section: Methods (mentioning)

confidence: 99%

Speech Recognition and a Cymatics Based Configurable Speech Perception

Qaisar

2018

Preprint

This paper proposes an original approach to achieving a Cymatics-based visual perception of isolated speech commands. The idea is to combine effective speech processing and analysis methods with the phenomenon of Cymatics. In this context, an effective approach for automatic recognition of isolated speech-based messages is proposed. The incoming speech segment is enhanced by applying appropriate pre-emphasis filtering, noise thresholding and zero alignment operations. The Mel-Frequency Cepstral Coefficients (MFCCs), Delta coefficients and Delta-Delta coefficients are extracted from the enhanced speech segment. The Dynamic Time Warping (DTW) technique is then employed to compare these extracted features with the reference templates. The comparison outcomes are used to make the classification decision. The classification decision is transformed into a methodical excitation. Finally, this excitation is converted into systematic visual perceptions via the phenomenon of Cymatics. The system functionality is tested with an experimental setup and results are presented. The approach is novel and can be employed in various applications such as visual art, encryption, education, archeology, architecture, integration of impaired people, etc.
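The template-matching step described in this abstract can be illustrated with a minimal sketch of textbook DTW. It is not the cited authors' implementation: the function names `dtw_distance` and `classify`, the Euclidean frame distance, and the toy one-dimensional "features" are all assumptions made for illustration. In practice each frame would be an MFCC (plus delta) vector.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumes Euclidean distance between frames and the basic step pattern
    (match, insertion, deletion) with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify(features, templates):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical toy templates; real ones would hold MFCC frames per word.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,)],
}
label = classify([(0.0,), (1.1,), (2.0,)], templates)
```

Because DTW lets either sequence stretch in time, an utterance spoken more slowly than its template can still align with low cost, which is why it suits isolated-word recognition.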

“…Another method is Perceptual Linear prediction (PLP), which is an analytical model perceptually motivated auditory spectrum by a low order pole function using the autocorrelation LP technique [8,11,12]. The PLP analysis provides similar results as with the LPC analysis, but the order of PLP model is half of the LP model therefore less computational storage [13,14]. PLP sometimes has been slightly better than LPCC, when it comes to noisy environment.…”

Section: Feature Extraction (mentioning)

confidence: 99%

“…PLP sometimes has been slightly better than LPCC, when it comes to noisy environment. Among those techniques, the most widely used feature extraction methods is Mel frequency Cepstral Coefficient (MFCC) in the field of ASR [8,14]. MFCC provides good discrimination [5] and low correlation between coefficients, but MFCC performance might be affected by the number of filters [10] and does not give accurate results if there are background noise [8].…”

Section: Feature Extraction (mentioning)

confidence: 99%

“…Artificial Neural Networks (ANN) are another classifier of speech recognition with acceptable accuracy. ANN is a nonlinear model which is easier to use and understand than statistical methods, but ANN may give unpredictable output quality [14,15]. From the above comparative studies of different methods for speech recognition, there are still issues that can be further improved, especially on the accuracy.…”

Section: Pattern Matching (mentioning)

confidence: 99%

A Normalized Least Mean Square and Dynamic Time Warping (DTW) Algorithm for an Intelligent Quran Tutoring System

Mazumder, Salam

2018

IJET

Al-Quran is the most recited holy book in the Arabic language. Over 1.3 billion Muslims all over the world have an obligation to recite and learn Al-Quran. Learners from non-Arabic as well as from Arabic-speaking communities face difficulties with Al-Quran recitation when no teacher (ustad) is present. Advances in speech recognition technology make it possible to develop a system capable of listening to and validating the recitation. This paper investigates the speech recognition accuracy of template-based acoustic models and proposes enhancement methods to improve that accuracy. A new scheme consisting of enhanced Normalized Least Mean Square (NLMS) and Dynamic Time Warping (DTW) algorithms is proposed. Recognition accuracy was further improved by incorporating adaptive optimal filtering with a modified Hamming window for MFCC (Mel-frequency cepstral coefficients) extraction, using dynamic programming (DP) matching with DTW (Dynamic Time Warping). The proposed scheme achieves a 5.5% relative improvement in recognition accuracy over the conventional speech recognition process.
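The abstract above refers to a modified window applied before MFCC extraction, but the modification itself is not described here. For orientation only, the following sketch computes the standard (unmodified) Hamming window, w[n] = 0.54 - 0.46 cos(2πn / (N - 1)), and applies it to one frame. The function names `hamming` and `apply_window` are hypothetical and this is not the cited authors' scheme.

```python
import math

def hamming(length):
    """Standard symmetric Hamming window of the given length."""
    if length <= 0:
        return []
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def apply_window(frame):
    """Taper one frame of samples with the Hamming window."""
    window = hamming(len(frame))
    return [s * w for s, w in zip(frame, window)]

# Toy frame of constant samples, tapered toward both ends.
tapered = apply_window([1.0, 1.0, 1.0, 1.0, 1.0])
```

Tapering each frame toward its edges reduces spectral leakage in the Fourier transform that precedes the mel filterbank.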

“…Meanwhile another research used Linear Predictive Coding (LPC) [4], [14], and wavelet [15]. Another research compared MFCC more accurate than LPC with accuracy up to 100% [16], [17] and also more accurate than Dynamic Time Warping (DTW) with average 96% [18].…”

Section: Introduction (mentioning)

confidence: 99%

Spoken Word Recognition Using MFCC and Learning Vector Quantization

Djamal, Nurhamidah, Ilyas

2017

EECSI

Identification of spoken words can be used to control an external device. This research identifies words in speech using Mel-Frequency Cepstrum Coefficients (MFCC) and Learning Vector Quantization (LVQ). The system output makes the computer play songs of the genre that matches the identified word. Identification is divided into three classes, the words "Klasik", "Dangdut" and "Pop", which are used to play the three corresponding types of songs. Features of the voice signal are extracted using MFCC and then identified using LVQ. The training and test sets were obtained from six subjects, each repeating the words "Klasik", "Dangdut" and "Pop" separately 10 times. The recorded sound signal is pre-processed using Histogram Equalization, DC Removal and Pre-emphasis to reduce noise, and features are then extracted using MFCC. The frequency spectrum generated by MFCC is identified using LVQ after a training process. Accuracy is 92% on the training sets, while testing on new data recorded at a different SNR gives an accuracy of 46%. However, new data recorded at the same SNR as the training data gives an accuracy of 75.5%.
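Two of the pre-processing steps named in this abstract, DC removal and pre-emphasis, can be sketched as follows. This is an illustration under stated assumptions rather than the cited authors' code: the function names are hypothetical, the pre-emphasis filter is the common first-order form y[n] = x[n] - αx[n-1], and the default α = 0.97 is a conventional choice that does not come from the source. Histogram equalization is omitted.

```python
def remove_dc(signal):
    """Subtract the mean so the signal is centred on zero."""
    if not signal:
        return []
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value; the first sample
    is passed through unchanged.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

# Toy samples standing in for a recorded utterance.
samples = [1.0, 2.0, 3.0]
centred = remove_dc(samples)
emphasized = pre_emphasize(centred)
```

Pre-emphasis boosts high frequencies relative to low ones, which compensates for the spectral tilt of voiced speech before MFCC extraction.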
