2021
DOI: 10.1049/tje2.12082

How many Mel‐frequency cepstral coefficients to be utilized in speech recognition? A study with the Bengali language

Abstract: Speech‐related research has a wide range of applications, and most speech‐related studies employ Mel‐frequency cepstral coefficients (MFCCs) as acoustic features. However, the optimum number of MFCCs remains an open research question. In this study, MFCC‐based speech classification was performed for both vowels and words in the Bengali language. A deep neural network (DNN) trained with the Adam optimizer was used as the classification model. Performance was measured with five different metrics, namely confusion …
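The question in the title comes down to one parameter of the standard MFCC pipeline: how many leading DCT-II coefficients of the log mel filterbank energies are kept. The following is a minimal pure-Python sketch of that last stage, not the authors' implementation. It assumes the HTK-style mel formula, triangular filters spaced evenly on the mel scale up to the Nyquist frequency, and an orthonormal DCT-II; the function names, the 26-filter / 512-point / 16 kHz setting and the flat toy spectrum are hypothetical choices for illustration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale (assumed; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> list[list[float]]:
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each mel edge down to an FFT bin index.
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising edge of triangle j
            bank[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of triangle j
            bank[j][k] = (right - k) / (right - center)
    return bank


def dct_ii(x: list[float]) -> list[float]:
    # Orthonormal DCT-II of a real vector.
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def mfcc_from_power_spectrum(power: list[float], bank: list[list[float]], n_mfcc: int) -> list[float]:
    # n_mfcc is the parameter the title asks about.
    if not 1 <= n_mfcc <= len(bank):
        raise ValueError("n_mfcc must be between 1 and the number of mel filters")
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies)[:n_mfcc]  # keep only the first n_mfcc coefficients


# Hypothetical setting: 26 filters, 512-point FFT, 16 kHz audio, flat toy spectrum.
bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000)
power = [1.0] * 257
c13 = mfcc_from_power_spectrum(power, bank, 13)
c20 = mfcc_from_power_spectrum(power, bank, 20)
print(c20[:13] == c13)  # → True: a larger n_mfcc only appends coefficients
```

Because each coefficient is computed independently of how many are kept, changing `n_mfcc` only truncates the same cepstral vector; the higher-order coefficients that a larger `n_mfcc` retains describe finer detail of the log mel spectral envelope.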

Cited by 16 publications (9 citation statements); references 36 publications.

“…Inspired by biological neurons, spiking neuron networks (SNNs) are very popular in deep learning (DL). As a widely used neuronal model in SNNs, the Hodgkin-Huxley (HH) model describes the electrical behavior of giant squid axon membranes, and some biological spiking neuron models are based on it [8]. To solve the problem of computationally overloaded HH neuron model, leaky integrate-firing (LIF), regular spikes (RS, also called Izhikevich model), and other neuron models have been proposed.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
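The computational argument in this quotation can be made concrete. The Hodgkin-Huxley model integrates four coupled nonlinear differential equations (membrane voltage plus three gating variables), whereas the leaky integrate-and-fire (LIF) model named here keeps a single linear equation and replaces spike generation with a threshold-and-reset rule. Below is a minimal forward-Euler LIF sketch; the function name and all parameter values are illustrative assumptions, and refractory periods are omitted.

```python
def simulate_lif(current, dt=1e-4, tau_m=0.02, r_m=1e7,
                 v_rest=-0.065, v_reset=-0.065, v_thresh=-0.050):
    """Forward-Euler LIF neuron: tau_m * dV/dt = -(V - v_rest) + r_m * I(t).

    Units are SI (s, ohm, V, A); defaults are assumed, not from any cited paper.
    When V reaches v_thresh, the spike time is recorded and V is reset.
    """
    v = v_rest
    trace, spike_times = [], []
    for step, i_in in enumerate(current):
        v += (dt / tau_m) * (-(v - v_rest) + r_m * i_in)
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
        trace.append(v)  # records the post-update (post-reset) potential
    return trace, spike_times


# 1 s of constant input current at dt = 0.1 ms.
weak = [1e-9] * 10_000    # steady state v_rest + r_m * I = -55 mV, below threshold
strong = [2e-9] * 10_000  # steady state -45 mV, above threshold
_, weak_spikes = simulate_lif(weak)
_, strong_spikes = simulate_lif(strong)
print(len(weak_spikes), len(strong_spikes))
```

A constant input current I drives the membrane toward `v_rest + r_m * I`; the neuron fires repeatedly only if that level exceeds `v_thresh`, which is why the 1 nA input stays silent while the 2 nA input spikes. Each time step costs a few arithmetic operations, compared with updating three gating variables and their voltage-dependent rate functions in the Hodgkin-Huxley model.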
“…This was then processed using a Hamming Window (20) of length 882, followed by Matlab's AudioFeatureExtractor function to determine the first 18 Mel-Frequency Cepstral Coefficients (MFCCs), which are representations of the power spectrum of the sound (21), for each group. Standard numbers of MFCCs used in similar studies vary between 13 and 25 (22). The number 18 was chosen here in order to match the number of features contributed from each IMU sensor, to ensure the system does not initially weight any one sensor more heavily than the others (weights will be determined and refined during training).…”
Section: Processing (citation type: mentioning)
confidence: 99%
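The front end described in this quotation (framing, a Hamming window of length 882, then a power spectrum from which MFCCs are derived) can be sketched in plain Python as follows. This is not MATLAB's implementation; it assumes a symmetric Hamming window, a hop of 441 samples (50% overlap) and a naive DFT chosen for readability rather than speed, and the function names are hypothetical. The quotation does not state the sample rate; at an assumed 44.1 kHz, 882 samples span 20 ms and the DFT bins are 50 Hz apart.

```python
import math


def hamming(n: int) -> list[float]:
    # Symmetric Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)).
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(x: list[float], frame_len: int, hop: int) -> list[list[float]]:
    # Split x into overlapping frames; an incomplete final frame is dropped.
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame: list[float]) -> list[float]:
    # One-sided |DFT|^2 / N via a naive O(N^2) DFT (clear, not fast).
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * i / n) for i, v in enumerate(frame))
        im = sum(v * math.sin(2.0 * math.pi * k * i / n) for i, v in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def windowed_power_spectra(x: list[float], frame_len: int = 882, hop: int = 441) -> list[list[float]]:
    # Hamming-window each frame, then take its power spectrum.
    window = hamming(frame_len)
    return [power_spectrum([s * w for s, w in zip(frame, window)])
            for frame in frame_signal(x, frame_len, hop)]


# Usage: a 1 kHz tone at an assumed 44.1 kHz sample rate, long enough for two frames.
sr = 44100
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(1323)]
spectra = windowed_power_spectra(tone)
peak = max(range(len(spectra[0])), key=spectra[0].__getitem__)
print(peak)  # → 20, since 1000 Hz / 50 Hz per bin = bin 20
```

Each power spectrum would then pass through a mel filterbank, a logarithm and a DCT, keeping the first 18 coefficients in the quoted study.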
“…Studies on the identification of emotions from Bangla speech data are scarce [4], [24]- [27]. 25 MFCCs were suggested by researchers who investigated the optimum number of MFCCs for emotion recognition in speech data in [4].…”
Section: Motivation (citation type: mentioning)
confidence: 99%