A 2.3 nJ/Frame Voice Activity Detector-Based Audio Front-End for Context-Aware System-On-Chip Applications in 32-nm CMOS

Raychowdhury, Arijit; Tokunaga, Carlos; Beltman, W.M.; Deisher, Michael; Tschanz, James; De, Vivek

doi:10.1109/jssc.2013.2258827

Cited by 50 publications

(21 citation statements)

References 8 publications

“…The specifications for input-referred noise and gain strongly depend on the input signal level, which depends on the type and make of the microphones used in the system. The active microphones used by SotA VADs consume 20–50 µW [18,19] depends on the frequency of classification which in a typical VAD system is every 10–16 ms [8][9][10]. This averaging is implemented as an LPF with an f-3dB of 16 Hz.…”

Section: A VAD System Architecture (mentioning)

confidence: 99%

“…VAD systems distinguish speech from non-speech in different background noise contexts for varying signal to acoustic noise ratios (SANR). SotA VAD systems [8][9][10] extract complex features like Mel-Frequency Cepstral Coefficients, DCT etc. to differentiate speech from nonspeech.…”

Section: Introduction (mentioning)

confidence: 99%

“…to differentiate speech from nonspeech. The high computational complexity of such features results in large power consumption, typically about 50–100 µW [8][9][10][11], in addition to the power consumption of the required active microphone. Such a continuous large power consumption is unacceptable for battery-powered always-on sensor frontends.…”

Section: Introduction (mentioning)

confidence: 99%

A 90 nm CMOS, Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection

Badami, Lauwereins, Meert, et al. 2016. IEEE J. Solid-State Circuits

This work presents a sub-6 µW acoustic front-end for speech/non-speech classification in a voice activity detection (VAD) system in 90 nm CMOS. Power consumption of the VAD system is minimized by architectural design around a new Power-Proportional sensing paradigm and the use of machine-learning assisted moderate-precision analog analytics for classification. Power-Proportional sensing allows for hierarchical and context-aware scaling of the frontend's power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features' usefulness in a particular context. The proposed VAD system reduces the power consumption by 10X compared to state-of-the-art systems and yet achieves an 89% average hit rate for a 12 dB signal to acoustic noise ratio in babble context, which is on par with software-based VAD systems.

“…Voice activity detectors based on mel-scaled mean energy features [23], Fourier coefficients [24], and zero-crossing frequency [25] have already been implemented. These voice activity detectors demonstrate good detection accuracy at an energy cost that is orders of magnitude lower than their classical digital counterparts.…”

Section: A Choice Of Feature Enhancing Filters (mentioning)

confidence: 99%

Where Analog Meets Digital: Analog-to-Information Conversion and Beyond

Verhelst, Bahai 2015. IEEE Solid-State Circuits Mag.

Energy efficiency, long battery life and low latency are some of the key attributes of many emerging ultra-low power sensing and monitoring systems. Applications such as always-on reactive sensor systems for natural human-device interfaces and IoT for consumer and industrial applications require ultra-low power designs beyond the promises of state-of-the-art data converters. These devices demand a new approach to analog-digital system partitioning with the goal of significant overall reduction in energy consumption. Many IoT applications, unlike most multimedia systems, require signal information extraction or signature extraction, rather than full reconstruction of the original sensed waveforms. Under these conditions, Nyquist rate sampling may no longer offer the optimal digitization scheme. Recent work on alternative sensor digitization strategies targets drastic sampling rate reduction in the ADC, while preserving the valuable relevant information (knowledge) present in the sensed signal. This paper aims to give an overview of this emerging field of analog-to-information conversion in light of various sub-Nyquist sampling techniques recently appearing in literature, as well as highlight new opportunities, challenges and applications enabled by such converters.

I. Nyquist rate vs. Information rate: Over the last several decades, a growing number of signal processing architects have embraced intensive digital signal processing preceded by a standard analog frontend and analog-to-digital converter. This trend has been exacerbated by the exponential rate of miniaturization in silicon, growing complexity of signal processing algorithms, and more systematic digital design and technology porting compared to analog design in deep submicron technology nodes. The interface between analog and digital signals has as such generally been governed by sampling at or above the Nyquist sampling rate of the analog waveforms. The dimensionality of a bandlimited signal f(t) with a physical bandwidth W over a period of T is 2WT, indicating the number of samples sufficient for perfect digital signal reconstruction. Such sampling at the Nyquist rate of 2W ensures the integrity of the signal represented by samples, which are Fourier series coefficients in a Fourier series expansion of the function F(ω) over the fundamental interval [-W, W] [1]. The original signal can subsequently be reconstructed by superimposing a set of orthogonal basis functions (sinc functions) weighted by the samples f(nT). Sampling the incoming signal at this rate hence guarantees that no information about the incoming signal is lost, without taking into account any heuristic or a priori side information about the signal or its information content other than the physical bandwidth. While sampling at or above the Nyquist rate offers a classic and straightforward approach, it can compromise overall power efficiency.
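As a rough illustration of the reconstruction argument in the abstract above, the following Python sketch samples a bandlimited test signal at the Nyquist rate 2W and rebuilds an off-grid value from a truncated sum of sinc functions weighted by the samples. The test signal (a squared sinc), the 4 kHz bandwidth, the helper names and the truncation length are assumptions chosen for illustration; none of them come from the paper.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def signal(t, bandwidth_hz):
    """Hypothetical test signal: sinc(W*t)^2 has a triangular spectrum
    confined to [-W, W], so it is bandlimited to W."""
    return sinc(bandwidth_hz * t) ** 2


def reconstruct(t, bandwidth_hz, n_terms):
    """Approximate f(t) from samples taken every Ts = 1/(2W) seconds.

    The exact interpolation formula sums over all integers n; here the
    sum is truncated to n in [-n_terms, n_terms], so the result is only
    an approximation.
    """
    ts = 1.0 / (2.0 * bandwidth_hz)  # Nyquist-rate sample spacing
    total = 0.0
    for n in range(-n_terms, n_terms + 1):
        total += signal(n * ts, bandwidth_hz) * sinc((t - n * ts) / ts)
    return total


if __name__ == "__main__":
    W = 4000.0                   # assumed bandwidth in Hz
    t = 0.3 / (2.0 * W)          # a point between two sample instants
    print("true value   :", signal(t, W))
    print("reconstructed:", reconstruct(t, W, 200))
```

The truncation is the only source of error in this sketch; because the assumed test signal decays quickly, a few hundred samples already reproduce the off-grid value closely.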

“…Afterwards, a classification stage, such as a feed-forward neural network or a decision tree, decides whether the data correspond to the human voice or not. The use of this architecture in portable devices is restricted by the power consumption of digital circuits, which may need high-capacity batteries [9]. Because of this battery-life limitation, a second approach has recently been proposed.…”

Section: Introduction (mentioning)

confidence: 99%

Time-Encoding-Based Ultra-Low Power Features Extraction Circuit for Speech Recognition Tasks

et al. 2020

Current trends towards on-edge computing on smart portable devices require ultra-low power circuits that can perform feature extraction and pattern classification. This manuscript proposes a novel approach for feature extraction operations in speech recognition/voice activity detection tasks suitable for portable devices. Whereas conventional approaches are based on either completely analog or digital structures, we propose a “hybrid” approach by means of voltage-controlled oscillators. Our proposal makes use of a bank of band-pass filters implemented with ring-oscillators to extract the features (energy within different frequency bands) of input audio signals and digitize them. Afterwards, these data are fed to a digital classification stage such as a neural network. Ring-oscillators are structures with a digital nature, which makes them highly scalable, with the possibility of designing them with minimum-length devices. Additionally, due to their inherent phase integration, low-frequency band-pass filters can be implemented without large capacitors. Consequently, we strongly benefit from power consumption and area savings. Finally, our proposal may incorporate the analog-to-digital converter into the structure of the feature extractor circuit itself to perform the full conversion of the raw data when triggered. This is a unique advantage with respect to other approaches. The architecture is described and proposed at system level, along with behavioral simulations made to check whether the performance is as expected. Then the structure is designed in a 65-nm CMOS process to estimate the power consumption and area of a silicon implementation. The results show that our solution is very promising in terms of occupied area, with a competitive power consumption in comparison to other state-of-the-art solutions.
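The feature type described above (energy within several frequency bands, handed to a digital classifier) can be sketched in software. The Python below is only a stand-in: it uses a plain DFT rather than the ring-oscillator band-pass filters the abstract proposes, and the sample rate, frame length, band edges and function names are hypothetical.

```python
import cmath
import math


def band_energies(frame, sample_rate_hz, bands_hz):
    """Return one energy value per (low, high) band for a single frame.

    Energy is the sum of squared DFT magnitudes over the bins whose
    centre frequency f satisfies low <= f < high. Only bins strictly
    between DC and the Nyquist frequency are considered.
    """
    n = len(frame)
    # Squared magnitude of each DFT bin, computed directly (O(n^2)).
    power = {}
    for k in range(1, n // 2):
        acc = 0j
        for i, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        power[k] = abs(acc) ** 2

    energies = []
    for low, high in bands_hz:
        energy = 0.0
        for k, p in power.items():
            f = k * sample_rate_hz / n  # centre frequency of bin k
            if low <= f < high:
                energy += p
        energies.append(energy)
    return energies


if __name__ == "__main__":
    fs = 8000                      # assumed sample rate in Hz
    n = 160                        # assumed 20 ms frame
    tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
    bands = [(100, 500), (500, 1500), (1500, 3500)]  # hypothetical bands
    print(band_energies(tone, fs, bands))
```

In a complete system the resulting vector of band energies would be the input to the classification stage; that stage is not shown here.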
