Some Commonly Used Speech Feature Extraction Algorithms

Alim, Sabur Ajibola; Rashid, Nahrul Khair Alang Md

doi:10.5772/intechopen.80419

Cited by 54 publications

(8 citation statements)

References 30 publications


“…Mel-frequency cepstral coefficients (MFCCs) are widely used to extract features for voice-based authentication [ 13 , 14 , 15 , 16 , 17 , 18 ]. MFCCs are obtained by extracting features from the audio signal, and when used as input to the base model, they produce much better performance than when directly considering raw audio signals as input.…”

Section: Literature Reviews (mentioning)

confidence: 99%

ArtiLock: Smartphone User Identification Based on Physiological and Behavioral Features of Monosyllable Articulation

Wong

Huang

Chen

et al. 2023

Sensors


Although voice authentication is generally secure, voiceprint-based authentication methods have the drawback of being affected by environmental noise, long passphrases, and large registered samples. Therefore, we present a breakthrough idea for smartphone user authentication by analyzing articulation and integrating the physiology and behavior of the vocal tract, tongue position, and lip movement to expose the uniqueness of individuals while making utterances. The key idea is to leverage the smartphone speaker and microphone to simultaneously transmit and receive speech and ultrasonic signals, construct identity-related features, and determine whether a single utterance is a legitimate user or an attacker. Physiological authentication methods prevent other users from copying or reproducing passwords. Compared to other types of behavioral authentication, the system is more accurately able to recognize the user’s identity and adapt accordingly to environmental variations. The proposed system requires a smaller number of samples because single utterances are utilized, resulting in a user-friendly system that resists mimicry attacks with an average accuracy of 99% and an equal error rate of 0.5% under the three different surroundings.


“…Autocorrelation coefficients are aliased in conventional linear prediction. The susceptibility of LPC estimates to quantization noise is high, so they are not well suited for generalization [19].…”

Section: Introduction (mentioning)

confidence: 99%

“…It can derive information from latent signals in both the time and frequency domains at the same time. Many wavelets are orthogonal, which is an outstanding feature for compact signal representation [19] [20]. The wavelet transform breaks down a signal into a set of simple functions known as wavelets.…”

Section: Introduction (mentioning)

confidence: 99%

“…LPC are a type of speech features that imitates the human vocal tract. It estimates the concentration and frequency of the left-over residue by approximating the formants, removing their effects from the speech signal, and evaluating the speech signal [18] [19]. Each sample of the signal is stated to be a direct incorporation of previous samples in the result.…”

Section: Introduction (mentioning)

confidence: 99%


G-Cocktail: An Algorithm to Address Cocktail Party Problem of Gujarati Language using CatBoost

Gupta

Singh

Singh

2021

Preprint


The pandemic caused by COVID-19 has seen things going online. People tired of typing prefer to give voice commands. Most voice-based applications and devices are not prepared to handle native languages. Moreover, in a party environment it is difficult to identify a voice command because there are many speakers. The proposed work addresses the cocktail party problem for an Indian language, Gujarati. Voice response systems such as Siri, Alexa, and Google Assistant currently work on a single voice command. The proposed algorithm, G-Cocktail, would help these applications identify a command given in Gujarati even from a mixed voice signal. The benchmark dataset is taken from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL) and comprises single words and phrases. G-Cocktail uses the CatBoost algorithm to classify and identify the voice. A voice print of each sound file is created using pitch and Mel Frequency Cepstral Coefficients (MFCC). Seventy percent of the voice prints are used to train the network and thirty percent for testing. The proposed work is tested and compared with K-means, Naïve Bayes, and LightGBM.
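
The LPC statement quoted above says that each sample is modelled as a linear combination of previous samples. The following is a minimal sketch of that idea, assuming the textbook autocorrelation method with the Levinson-Durbin recursion; it is not taken from the cited chapter or from G-Cocktail, and the function names `autocorr` and `lpc` are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Estimate predictor coefficients a[1..order] so that
    x[n] is approximated by sum_k a[k] * x[n - k].

    Returns (coefficients, residual_energy). Assumes x is not all zeros.
    """
    r = autocorr(x, order)
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]                   # prediction error energy at order 0
    for i in range(1, order + 1):
        # Reflection coefficient for order i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update coefficients from order i-1 to order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Illustrative input (an assumption, not real speech): the impulse
# response of the all-pole filter x[n] = 1.3*x[n-1] - 0.6*x[n-2].
signal = [1.0, 1.3]
for _ in range(198):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2])

coeffs, residual = lpc(signal, 2)
print(coeffs, residual)
```

On this synthetic input the estimated coefficients should come out close to the filter's own (1.3 and -0.6), which is the sense in which LPC "approximates the formants" of an all-pole vocal tract model. Real speech would first be split into short windowed frames before this step.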


“…To extract the vocal features like human ear the algorithm should replicate the human acoustics. MFCC, LPC, Linear Prediction Cepstral Coefficients (LPCC), Linear Spectral Frequencies (LSF), Perceptual Linear prediction (PLP) imitate the human hearing and speaking tract and give relevant features [10]. MFCC filters frequencies linearly at low frequencies and logarithmically at high frequencies to preserve the phonetically vital properties of the speech signal.…”

Section: Introduction (mentioning)

confidence: 99%

ERIL: An Algorithm for Emotion Recognition From Indian Languages Using Machine Learning

Mehra

Jain

2021

Preprint


For human interaction with a machine, it is important that the machine understand the mood of the speaker. Until now, machines have been trained on neutral speech or utterances. The mood of a person affects their performance. Deciphering human mood is challenging for machines, as a human can create fourteen distinct sounds in a second. For a machine to understand human behaviour, it should understand the acoustic abilities of the human ear. Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC) can replicate the human auditory system. The proposed model, Emotion Recognition from Indian Languages (ERIL), extracts emotions such as fear, anger, surprise, sadness, happiness, and neutral. ERIL first pre-processes the voice signal, extracts selected MFCC, LPC, pitch, and voice quality features, then classifies the speech using CatBoost. ERIL is a multilingual emotion classifier; it is independent of any language. We checked it on Hindi, Gujarati, Marathi, Punjabi, Bangla, Tamil, Oriya, and Telugu. We recorded a speech dataset of various emotions in these languages. ERIL is compared to other benchmark classifiers.
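
The MFCC statement quoted above describes a frequency spacing that is roughly linear at low frequencies and logarithmic at high frequencies. The sketch below illustrates that warping, assuming the commonly used mel formula m = 2595 * log10(1 + f / 700); the cited works may use a different variant, and the helper names `hz_to_mel`, `mel_to_hz`, and `mel_band_edges` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """Edge frequencies in Hz for n_filters triangular filters.

    The edges are equally spaced on the mel scale, so their spacing
    in Hz grows with frequency.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


# Illustrative parameters (assumptions): 10 filters between 0 and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print([round(e) for e in edges])
print([round(w) for w in widths])
```

The printed band widths increase steadily with frequency: narrow bands resolve detail where speech carries most phonetic information, and wider bands summarize the higher frequencies.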
