2013
DOI: 10.1109/jssc.2013.2258827

A 2.3 nJ/Frame Voice Activity Detector-Based Audio Front-End for Context-Aware System-On-Chip Applications in 32-nm CMOS

Abstract: Advanced human-machine interfaces require improved embedded sensors that can seamlessly interact with the user. Voice-based communication has emerged as a promising interface for next generation mobile, automotive and hands-free devices. Presented here is such an audio front-end with Voice Activity Detection (VAD) hardware targeted for low-power embedded SoCs, featuring a 512 pt FFT, programmable filters, noise floor estimator and a decision engine which has been fabricated in 32 nm CMOS. The dual-, dual-frequ…

Cited by 50 publications
(21 citation statements)
References 8 publications
“…The specifications for input-referred noise and gain strongly depend on the input signal level, which depend on the type and make of the microphones used in the system. The active microphones used by SotA VADs consume 20 -50 µW [18,19] depends on the frequency of classification which in a typical VAD system is every 10-16ms [8][9][10]. This averaging is implemented as LPF with a f -3dB of 16 Hz.…”
Section: A VAD System Architecture (mentioning)
confidence: 99%
“…VAD systems distinguish speech from non-speech in different background noise contexts for varying signal to acoustic noise ratios (SANR). SotA VAD systems [8][9][10] extract complex features like Mel-Frequency Cepstral Coefficients, DCT etc. to differentiate speech from nonspeech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Voice activity detectors based on mel-scaled mean energy features [23], Fourier coefficients [24], and zero-crossing frequency [25] have already been implemented. These voice activity detectors demonstrate good detection accuracy with at an energy cost which is orders of magnitude lower than their classical digital counterparts.…”
Section: A Choice Of Feature Enhancing Filters (mentioning)
confidence: 99%
“…Afterwards, a classification stage, such as a feed-forward neural network or a decision tree, decides whether the data correspond to the human voice or not. The use of this architecture in portable devices is restricted by the power consumption of digital circuits, which may need high-capacity batteries [9]. Because of this battery-life limitation, a second approach have been recently proposed.…”
Section: Introductionmentioning
confidence: 99%