2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010
DOI: 10.1109/icassp.2010.5495611

Voice activity detection using harmonic frequency components in likelihood ratio test

Abstract: This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech/non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames than for unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise robustness of ASR, its decisions are applied to preprocessing techniques such as non-linear spectral subtraction,…
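
The bin-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the statistical model, so the code assumes the complex Gaussian model commonly used in LRT VADs, where the per-bin log LR is γξ/(1+ξ) − log(1+ξ), with γ the a posteriori SNR and ξ the a priori SNR. The function names, the tolerance `tol_bins`, the crude estimate ξ = max(γ − 1, 0), and the assumption that the fundamental frequency `f0_hz` is already known are all hypothetical choices made for illustration.

```python
import math


def bin_log_lr(gamma, xi):
    """Per-bin log likelihood ratio under an assumed complex Gaussian model."""
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def harmonic_bins(f0_hz, sample_rate, n_fft, tol_bins=1):
    """DFT bin indices within tol_bins of each harmonic of f0, up to Nyquist."""
    n_bins = n_fft // 2 + 1
    selected = set()
    h = 1
    while h * f0_hz <= sample_rate / 2:
        centre = round(h * f0_hz * n_fft / sample_rate)
        for k in range(centre - tol_bins, centre + tol_bins + 1):
            if 0 <= k < n_bins:
                selected.add(k)
        h += 1
    return sorted(selected)


def frame_log_lr(power, noise_power, bins=None):
    """Mean log LR over the chosen bins (all bins when bins is None)."""
    if bins is None:
        bins = range(len(power))
    total, count = 0.0, 0
    for k in bins:
        gamma = power[k] / noise_power[k]  # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)         # crude a priori SNR estimate (assumption)
        total += bin_log_lr(gamma, xi)
        count += 1
    return total / count if count else 0.0


# Toy voiced frame: flat noise floor with extra energy at the harmonics of 200 Hz.
n_fft, sample_rate = 256, 8000
noise = [1.0] * (n_fft // 2 + 1)
bins = harmonic_bins(200.0, sample_rate, n_fft, tol_bins=0)
frame = list(noise)
for k in bins:
    frame[k] = 10.0

full_band = frame_log_lr(frame, noise)           # averaged over every bin
harmonic_only = frame_log_lr(frame, noise, bins)  # averaged over harmonic bins only
```

In this toy frame the harmonic-only score is larger than the full-band score because the bins between harmonics carry no speech energy and pull the full-band average toward zero. A speech/non-speech decision would then compare the score against a threshold, which the abstract does not specify.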

Cited by 34 publications (20 citation statements)
References 14 publications
“…Spectral domain VNV detection methods exploit the spectral harmonicity in voiced regions. Harmonic peaks in the amplitude spectrum of voiced speech regions are usually preserved in noisy speech (Tan et al., 2010). Spectral harmonicity is exploited in developing the robust automatic speech recognition (Beh & Ko, 2003) and speech enhancement (Plapous et al., 2005)…”
Section: Introduction (mentioning)
confidence: 99%
“…Statistical models are superior to rule-based classification algorithms when the segments are not clearly demarcated. There are several popular statistical models in VAD systems such as the likelihood ratio test (LRT) [15]. Tan et al. modified the LRT-based model by selecting discrete Fourier transform bins that consist of harmonic spectral peaks to determine the likelihood ratio.…”
Section: Discriminative Features and Classification (mentioning)
confidence: 99%
“…Furthermore, it has been shown that the spectro-temporal analysis of the Fourier spectrogram can capture prominent acoustic "textures", such as pitch, harmonicity, formant, amplitude modulation (AM) and frequency modulation (FM) [34]. The pitch, harmonicity and formants are spectrum-related features that have been considered in VAD algorithms [3], [21], [22]. The AM encodes the long-term variations of the envelope of the acoustic signal and was considered in [15]-[18], [22].…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, pitch (fundamental frequency) and harmonics of a voiced sound are perceptually important to human hearing [10], [19], [20]. This property leads to the approach of adopting harmonic-related features in VAD algorithms [3], [21], [22].…”
Section: Introduction (mentioning)
confidence: 99%