2020
DOI: 10.1109/access.2020.2979799
MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Abstract: Feature extraction is an essential part of automatic speech recognition (ASR) to compress raw speech data and enhance features, where conventional implementation methods based on the digital domain have encountered energy consumption and processing speed bottlenecks. Thus, we propose a Mixed-Signal Processing (MSP) architecture to efficiently extract Mel-Frequency Cepstrum Coefficients (MFCC) features. We design MSP-MFCC to pre-process speech signals in the analog domain, which significantly reduces the cost o…

Cited by 45 publications (18 citation statements)
References 28 publications (51 reference statements)
“…Therefore, adjustments are required to translate the FFT frequencies to this non-linear function [26]. This is done by passing the signal through the Mel filter banks in order to transform it into the Mel spectrum [27]. The filter is realized by overlapping band-pass filters to create the required warped axis.…”
Section: Pre-emphasis
Confidence: 99%
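The overlapping band-pass filters described above are conventionally triangular filters whose edges are equally spaced on the Mel axis. As a rough digital-domain illustration (not the paper's mixed-signal implementation, which pre-processes in the analog domain), a Mel filter bank using the common HTK-style mapping 2595·log10(1 + f/700) might be constructed as follows; the filter count, FFT size, and sample rate here are illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel mapping; a common convention, other toolkits differ slightly
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular overlapping band-pass filters on the Mel-warped axis."""
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # Filter edges equally spaced in Mel, mapped back to Hz, then to FFT bins
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising edge of triangle i
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of triangle i
            fbank[i, k] = (right - k) / max(right - center, 1)
    return fbank
```

Multiplying a power spectrum frame by this matrix yields the Mel-spectrum energies referred to in the excerpt.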
See 1 more Smart Citation
“…Therefore, adjustments are required to translate the FFT frequencies to this non-linear function [26]. This is done through passing signal through the Mel Filter Banks in order to transform it to the Mel Spectrum [27]. The filter is realized by overlapping band-pass filters to create the required warped axis.…”
Section: Pre-emphasismentioning
confidence: 99%
“…The filter is realized by overlapping band-pass filters to create the required warped axis. Next, the logarithm of the signal is taken, which compresses the data values and makes them less sensitive to slight variations in the input signal [27]. Finally, we perform a Discrete Cosine Transform (DCT) to take the resultant signal to the cepstrum domain.…”
Section: Pre-emphasis
Confidence: 99%
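The final two steps the excerpt describes, log compression followed by a DCT, can be sketched compactly. This is a generic MFCC tail stage, not the paper's circuit; the coefficient count and the DCT-II scaling convention below are assumptions (toolkits normalize the basis differently):

```python
import numpy as np

def mfcc_from_mel_energies(mel_energies, n_ceps=13):
    """Log compresses the dynamic range of the Mel energies; a DCT-II then
    decorrelates the log energies into cepstral coefficients."""
    log_e = np.log(mel_energies + 1e-10)  # epsilon guards against log(0)
    n = len(log_e)
    k = np.arange(n)
    # Unnormalized DCT-II basis, rows = cepstral index, columns = filter index
    basis = np.cos(np.pi / n * (k + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return basis @ log_e
```

Applied per frame to the filter-bank output, this yields the MFCC vector used as the ASR feature.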
“…Our study concludes that the techniques used for sentiment analysis from speech thus far work better on larger datasets and on a single language. There is no historical evidence of emotion extraction from multilingual speech data of Indian languages [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57]. Our research is based on 8 Indian languages: Hindi, Gujarati, Marathi, Punjabi, Bangla, Tamil, Oriya, and Telugu.…”
Section: Introduction
Confidence: 99%
“…The ASR accuracy obtained in laboratory environments is quite high, but once the recognition system is placed in a real-world setting, the recognition rate drops considerably. Several embedded voice recognition systems have been reported, some implemented in Field Programmable Gate Arrays (FPGAs) [8][9][10] and others in Digital Signal Processors (DSPs) [11,12], all of them with a modest accuracy rate. ASR state-of-the-art systems tie their performance to reasonable and controlled training conditions.…”
Section: Introduction
Confidence: 99%