Improved voice activity detection based on a smoothed statistical likelihood ratio

Cho, Y.D.; Al-Naimi, K.; Kondoz, A.M.

doi:10.1109/icassp.2001.941020

Cited by 47 publications

(31 citation statements)

References 4 publications

“…There are many voice activity detectors (VADs), silence detectors, and turn-taking options in the literature [16]-[19]. We used a combination of volume, spectral energy, fundamental frequency (F0), and spectral flatness for creating a predictor for speech segments.…”

Section: Segmentation (mentioning)

confidence: 99%

RESONATE: Reverberation environment simulation for improved classification of speech models

Dickerson

Hoque

Asare

et al. 2014

IPSN-14 Proceedings of the 13th International Symposium on Information Processing in Sensor Networks

Abstract: Home monitoring systems currently gather information about people's activities of daily living and information regarding emergencies; however, they currently lack the ability to track speech. Practical speech analysis solutions are needed to help monitor ongoing conditions such as depression, as the amount of social interaction and vocal affect is important for assessing mood and well-being. Although there are existing solutions that classify the identity and the mood of a speaker, they perform poorly when the acoustic signals are captured in reverberant environments. In this paper, we present a practical reverberation compensation method called RESONATE, which uses simulated room impulse responses to adapt a training corpus for use in multiple real reverberant rooms. We demonstrate that the system creates robust classifiers that perform within 5-10% of baseline accuracy of non-reverberant environments. We demonstrate and evaluate the performance of this matched condition strategy using a public dataset, and also in controlled experiments with six rooms, and two long-term and uncontrolled real deployments. We offer a practical implementation that performs collection, feature extraction, and classification on-node, and training and simulation of training sets on a base station or cloud service.
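
The citation statement for this paper lists spectral flatness among its segmentation features but gives no formula. As a rough illustration only, the sketch below uses the standard definition (geometric mean of the power spectrum divided by its arithmetic mean). The function name `spectral_flatness`, the `eps` guard, and the toy spectra are assumptions made here, not taken from the cited paper.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum (range 0..1).

    eps is an assumed guard against log(0) and division by zero.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arith_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / (arith_mean + eps)

# Toy spectra (hypothetical values): a flat, noise-like spectrum and a
# spectrum dominated by one strong peak, as voiced speech tends to be.
flat = spectral_flatness([1.0] * 8)
peaky = spectral_flatness([100.0] + [0.01] * 7)
print(flat, peaky)
```

A value near 1 suggests a noise-like frame and a value near 0 suggests a tonal or harmonic frame, which is why the feature is useful alongside energy and F0.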

“…Note that this method inherently incorporates the interframe correlation in determining the detection thresholds so that it can be combined with other hangover schemes such as the HMM [7] and the smoothing of the LRs [6].…”

Section: [garbled equation] (mentioning)

confidence: 99%

Statistical Model-Based Voice Activity Detection Based on Second-Order Conditional MAP with Soft Decision

Chang

2012

ETRI J

In this paper, we propose a novel approach to statistical model-based voice activity detection (VAD) that incorporates a second-order conditional maximum a posteriori (CMAP) criterion. As a technical improvement for the first-order CMAP criterion in [1], we consider both the current observation and the voice activity decision in the previous two frames to take full consideration of the interframe correlation of voice activity. This is clearly different from the previous approach [1] in that we employ the voice activity decisions in the second-order (previous two frames) CMAP, which has quadruple thresholds with an additional degree of freedom, rather than the first-order (previous single frame). Also, a soft-decision scheme is incorporated, resulting in time-varying thresholds for further performance improvement. Experimental results show that the proposed algorithm outperforms the conventional CMAP-based VAD technique under various experimental conditions.
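
The abstract describes a decision rule whose threshold depends on the voice activity decisions of the previous two frames, giving four thresholds. The sketch below illustrates only that lookup idea. The function name `cmap2_decisions`, the threshold values, and the initial non-speech state are hypothetical, and the paper's soft-decision (time-varying threshold) step is not modeled.

```python
def cmap2_decisions(log_lrs, thresholds):
    """Threshold each log likelihood ratio using the two previous decisions.

    thresholds maps (decision at n-2, decision at n-1) to a log-LR threshold.
    """
    prev2, prev1 = 0, 0  # assumption: non-speech before the first frame
    decisions = []
    for llr in log_lrs:
        d = 1 if llr > thresholds[(prev2, prev1)] else 0
        decisions.append(d)
        prev2, prev1 = prev1, d
    return decisions

# Hypothetical thresholds: lower when recent frames were judged speech.
thresholds = {(0, 0): 1.0, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.1}
decisions = cmap2_decisions([0.2, 1.5, 0.4, 0.2, 0.05, 0.5, 0.2], thresholds)
print(decisions)
```

With these illustrative values, frames with weak evidence that directly follow speech are still marked as speech, which is the interframe-correlation effect the abstract refers to.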

“…The noise estimation procedure used in the speech enhancement algorithms is similar to that described in Section 3. The noise power spectrum is initialized with the first frame's data (11) and then recursively updated (10) during non-speech frames determined by the VAD. The frame size and shift used for speech enhancement are 25 and 10 ms respectively, same as those used for extracting the 39-element MFCC_D_A_E feature vector [15] from the post-enhanced signal.…”

Section: ASR Experiments (mentioning)

confidence: 99%
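
The statement above describes initializing the noise power spectrum from the first frame and updating it recursively during frames the VAD marks as non-speech. Equations (10) and (11) of the citing paper are not reproduced on this page, so the sketch below assumes a common first-order recursive average. The function name `track_noise`, the smoothing factor `alpha`, and the sample data are illustrative assumptions.

```python
def track_noise(power_frames, vad_flags, alpha=0.9):
    """Return the noise power spectrum estimate after each frame.

    power_frames: list of per-frame power spectra (lists of floats).
    vad_flags: True where the VAD marks the frame as speech.
    alpha: assumed smoothing factor for the recursive update.
    """
    noise = list(power_frames[0])  # initialize with the first frame's data
    history = [list(noise)]
    for frame, is_speech in zip(power_frames[1:], vad_flags[1:]):
        if not is_speech:  # update only during non-speech frames
            noise = [alpha * n + (1 - alpha) * p for n, p in zip(noise, frame)]
        history.append(list(noise))
    return history

# Hypothetical two-bin spectra; the third frame is flagged as speech.
frames = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0], [1.0, 2.0]]
flags = [False, False, True, False]
history = track_noise(frames, flags, alpha=0.5)
print(history)
```

The estimate is held constant across the speech frame, so a missed speech detection would leak speech energy into the noise estimate. This is one reason the quality of the VAD matters for the enhancement stage.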

“…Variants of this noise-robust VAD to increase weak speech onset and offset detection have been proposed. For example, [11] used a smoothed LR, while [12] used multiple observations of the short-time DFT feature vector to replace the hangover scheme in [10]. For these VADs, the high LRs of strong speech frames aid the detection of weak neighboring speech frames.…”

Section: Introduction (mentioning)

confidence: 99%

Voice activity detection using harmonic frequency components in likelihood ratio test

Tan

Borgström

Alwan

2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech/non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames, as opposed to unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise robustness of ASR, its decisions are applied to preprocessing techniques such as non-linear spectral subtraction, minimum mean square error short-time spectral amplitude estimator, and frame dropping. From the ASR experiments conducted on the Aurora2 database, the proposed harmonic frequency-based LRTs give better results than conventional LRT-based VADs and the standard G.729B and ETSI AMR VADs.
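
The Introduction statement for this paper notes that a smoothed LR lets the high LRs of strong speech frames help detect weak neighboring frames. The sketch below shows one way such smoothing could work, assuming a first-order recursion in the log domain. The exact recursion of the indexed paper is not given on this page, and the function name `smooth_log_lr`, the factor `kappa`, and the threshold are illustrative assumptions.

```python
def smooth_log_lr(log_lrs, kappa=0.9):
    """First-order recursive smoothing of per-frame log likelihood ratios."""
    smoothed = []
    prev = log_lrs[0]  # assumption: start the recursion at the first value
    for llr in log_lrs:
        prev = kappa * prev + (1 - kappa) * llr
        smoothed.append(prev)
    return smoothed

# Hypothetical log-LRs: two strong speech frames, then two weak offset frames.
raw = [4.0, 4.0, 0.0, 0.0]
smoothed = smooth_log_lr(raw, kappa=0.5)

threshold = 0.5  # hypothetical decision threshold
raw_decisions = [int(x > threshold) for x in raw]
smoothed_decisions = [int(x > threshold) for x in smoothed]
print(smoothed, raw_decisions, smoothed_decisions)
```

In this toy case the raw LRs drop the two trailing frames, while the smoothed LRs keep them as speech, so the smoothing acts as a form of hangover without a separate hangover counter.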
