2020
DOI: 10.1109/access.2020.2979799
MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Abstract: Feature extraction is an essential part of automatic speech recognition (ASR) to compress raw speech data and enhance features, where conventional implementation methods based on the digital domain have encountered energy consumption and processing speed bottlenecks. Thus, we propose a Mixed-Signal Processing (MSP) architecture to efficiently extract Mel-Frequency Cepstrum Coefficients (MFCC) features. We design MSP-MFCC to pre-process speech signals in the analog domain, which significantly reduces the cost o…

Cited by 45 publications (18 citation statements)
References 28 publications (51 reference statements)
“…Therefore, adjustments are required to translate the FFT frequencies to this non-linear function [26]. This is done by passing the signal through the Mel filter banks in order to transform it into the Mel spectrum [27]. The filter is realized by overlapping band-pass filters to create the required warped axis.…”
Section: Pre-emphasis
Confidence: 99%
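The overlapping band-pass filters described above are conventionally triangular filters whose edges are equally spaced on the Mel axis. As a rough digital-domain illustration (not the paper's mixed-signal implementation, which pre-processes in the analog domain), a Mel filter bank using the common HTK-style mapping 2595·log10(1 + f/700) might be constructed as follows; the filter count, FFT size, and sample rate here are illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel mapping; a common convention, other toolkits differ slightly
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular overlapping band-pass filters on the Mel-warped axis."""
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # Filter edges equally spaced in Mel, mapped back to Hz, then to FFT bins
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising edge of triangle i
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of triangle i
            fbank[i, k] = (right - k) / max(right - center, 1)
    return fbank
```

Multiplying a power spectrum frame by this matrix yields the Mel-spectrum energies referred to in the excerpt.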
See 1 more Smart Citation
“…Therefore, adjustments are required to translate the FFT frequencies to this non-linear function [26]. This is done through passing signal through the Mel Filter Banks in order to transform it to the Mel Spectrum [27]. The filter is realized by overlapping band-pass filters to create the required warped axis.…”
Section: Pre-emphasismentioning
confidence: 99%
“…The filter is realized by overlapping band-pass filters to create the required warped axis. Next, the logarithm of the signal is taken, which compresses the data values and makes them less sensitive to slight variations in the input signal [27]. Finally, we perform a Discrete Cosine Transform (DCT) to take the resultant signal to the cepstrum domain.…”
Section: Pre-emphasis
Confidence: 99%
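The final two steps the excerpt describes, log compression followed by a DCT, can be sketched compactly. This is a generic MFCC tail stage, not the paper's circuit; the coefficient count and the DCT-II scaling convention below are assumptions (toolkits normalize the basis differently):

```python
import numpy as np

def mfcc_from_mel_energies(mel_energies, n_ceps=13):
    """Log compresses the dynamic range of the Mel energies; a DCT-II then
    decorrelates the log energies into cepstral coefficients."""
    log_e = np.log(mel_energies + 1e-10)  # epsilon guards against log(0)
    n = len(log_e)
    k = np.arange(n)
    # Unnormalized DCT-II basis, rows = cepstral index, columns = filter index
    basis = np.cos(np.pi / n * (k + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return basis @ log_e
```

Applied per frame to the filter-bank output, this yields the MFCC vector used as the ASR feature.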
“…Our study concludes that the techniques used for sentiment analysis from speech thus far work better on larger datasets and on a single language. There is no historical evidence of emotion extraction from multilingual speech data of Indian languages [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57]. Our research is based on 8 Indian languages: Hindi, Gujarati, Marathi, Punjabi, Bangla, Tamil, Oriya, and Telugu.…”
Section: Introduction
Confidence: 99%
“…The ASR accuracy obtained in laboratory environments is quite high, but once the recognition system is placed in a real-world setting, the recognition rate drops considerably. Several embedded voice recognition systems have been reported, some implemented in Field Programmable Gate Arrays (FPGAs) [8][9][10] and others in Digital Signal Processors (DSPs) [11,12], all of them with a modest accuracy rate. ASR state-of-the-art systems tie their performance to reasonable and controlled training conditions.…”
Section: Introduction
Confidence: 99%