Speech Recognition Implementation Using MFCC and DTW Algorithm for Home Automation

Haq, Abdulloh Salahul; Nasrun, Muhammad; Setianingsih, Casi; Murti, Muhammad Ary

doi:10.11591/eecsi.v7.2041

Cited by 15 publications

(7 citation statements)

References 15 publications


“…Feature extraction in SER is very important as it helps to improve recognition accuracy and performance of speech signals [9]. The features we use in this research are MFCC, chromagram, Mel-spectrogram, spectral contrast, and tonnetz because they are the best features from previous research [4] and MFCC and Mel-spectrogram are widely used in SER [10], [11]. Spectral contrast can be defined as the decibel difference between peaks and valleys in a spectrum [12].…”

Section: Methods (mentioning)

confidence: 99%


Enhancing speech emotion recognition with deep learning using multi-feature stacking and data augmentation

Al Mukarram,

Mukhlas,

Zahra

2024

Bulletin EEI


This study evaluates the effectiveness of data augmentation on 1D convolutional neural network (CNN) and transformer models for speech emotion recognition (SER) on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset. The results show that data augmentation has a positive impact on improving emotion classification accuracy. Techniques such as noising, pitching, stretching, shifting, and speeding are applied to increase data variation and overcome class imbalance. The 1D CNN model with data augmentation achieved 94.5% accuracy, while the transformer model with data augmentation performed even better at 97.5%. This research is expected to contribute better insights for the development of accurate emotion recognition methods by using data augmentation with these models to improve classification accuracy on the RAVDESS dataset. Further research can explore larger and more diverse datasets and alternative model approaches.
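The waveform-level augmentations named in this abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the default parameters, and the plain-list representation of audio samples are all assumptions made for illustration. It covers only noising, shifting, and speeding. Pitch shifting and time stretching that preserve the other property need a phase vocoder or similar and are not shown.

```python
import math
import random


def add_noise(samples, noise_level=0.005, rng=None):
    """Add Gaussian noise whose std is a fraction of the peak amplitude."""
    rng = random.Random() if rng is None else rng
    peak = max(abs(s) for s in samples)
    return [s + rng.gauss(0.0, noise_level * peak) for s in samples]


def time_shift(samples, max_fraction=0.1, rng=None):
    """Circularly shift the waveform by a random offset (samples wrap around)."""
    rng = random.Random() if rng is None else rng
    limit = int(len(samples) * max_fraction)
    k = rng.randint(-limit, limit) % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    This changes duration and pitch together, like playing a tape faster.
    """
    n = len(samples)
    new_len = max(1, round(n / rate))
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / max(1, new_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 16 kHz.
rng = random.Random(0)
clip = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
augmented = [
    add_noise(clip, rng=rng),
    time_shift(clip, rng=rng),
    change_speed(clip, 1.25),
]
print([len(a) for a in augmented])
```

Each augmented copy keeps the original emotion label, which is how such transforms can increase data variation and help rebalance under-represented classes.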



“…simultaneously, such as speech characteristics, language content, facial expressions, and body movements, SER is essentially a complicated multimodal task [4].…”

Section: Introduction (mentioning)

confidence: 99%


“…The MFCC has become one of the effective features in gear fault detection, for instance, Benkedjouh et al extracted the MFCC feature and fed it to the SVM and claimed that the first three MFCC components contain the most defect information of gears [163]. However, based on the research of Abdul et al, 1-13 MFCC are more effective to be taken to train LSTM [164] and Jin et al evaluated some sets of MFCCs (16,21,26,31,36…”

Section: Gear Health Monitoring (mentioning)

confidence: 99%

Mel Frequency Cepstral Coefficient and its Applications: A Review

Abdul

Al-Talabani

2022

IEEE Access


Feature extraction and representation have a significant impact on the performance of any machine learning method. The Mel Frequency Cepstrum Coefficient (MFCC) is designed to model features of audio signals and is widely used in various fields. This paper reviews the applications of the MFCC, along with some issues facing MFCC computation and their impact on model performance. These issues include the use of MFCC for non-acoustic signals, adopting the MFCC alone or combining it with other features, the use of time series versus global representation of the MFCC, following the standard form of the MFCC computation versus modifying its parameters, and supplying the traditional machine learning methods versus the deep learning methods.


“…The DTW is able to calculate the distance between two-time series and is thus a common method to measure similarity [42,43,47]. This method intends to find the optimal alignment of two temporal sequences with different lengths and speeds [48], which results in better performance and more meaningful discrepancy distances than other approaches [42,49]. The DTW result represents the distance value in the scalar quantity [50], which is employed to measure how similar two diffusion trends are in time sequences.…”

Section: Comparing the Similarity Of Trend Comparison (mentioning)

confidence: 99%

An Operator-Based Approach for Modeling Influence Diffusion in Complex Social Networks

Jiang

D'Arienzo

et al. 2021

J. Soc. Comput.


Social media have dramatically changed the mode of information dissemination. Various models and algorithms have been developed to model information diffusion and address the influence maximization problem in complex social networks. However, it appears difficult for state-of-the-art models to interpret complex and reversible real interactive networks. In this paper, we propose a novel influence diffusion model, i.e., the Operator-Based Model (OBM), by leveraging the advantages offered from the heat diffusion based model and the agent-based model. The OBM improves the performance of simulated dissemination by considering the complex user context in the operator of the heat diffusion based model. The experiment obtains a high similarity of the OBM simulated trend to the real-world diffusion process by use of the dynamic time warping method. Furthermore, a novel influence maximization algorithm, i.e., the Global Topical Support Greedy algorithm (GTS-Greedy algorithm), is proposed corresponding to the OBM. The experimental results demonstrate its promising performance by comparing it against other classic algorithms.
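The dynamic time warping (DTW) distance referred to in this record can be sketched as a short dynamic program. This is the textbook formulation, not code from any of the cited papers; the function name `dtw_distance` and the absolute-difference local cost on scalar sequences are assumptions chosen for brevity. For sequences of feature vectors such as MFCC frames, the local cost would typically be a vector distance such as the Euclidean distance.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences.

    Classic O(len(a) * len(b)) dynamic program with absolute difference
    as the local cost and steps (i-1, j), (i, j-1), (i-1, j-1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its element
                cost[i][j - 1],      # b advances, a repeats its element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same shape traced at two speeds: every sample of `fast` appears twice in `slow`.
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
print(dtw_distance(slow, fast))  # 0.0: warping absorbs the speed difference
print(dtw_distance([0, 0, 0], [1, 1]))  # 3.0: a constant offset cannot be warped away
```

The result is a single scalar, and smaller values indicate more similar sequences. The two inputs do not need to have the same length.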
