2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
DOI: 10.1109/icassp.2001.941020

Improved voice activity detection based on a smoothed statistical likelihood ratio

Cited by 47 publications (31 citation statements)
References 4 publications
“…There are many voice activity detectors (VADs), silence detectors, and turn-taking options in the literature [16]-[19]. We used a combination of volume, spectral energy, fundamental frequency (F0), and spectral flatness for creating a predictor for speech segments.…”
Section: Segmentation (mentioning)
confidence: 99%
“…Note that this method inherently incorporates the interframe correlation in determining the detection thresholds so that it can be combined with other hangover schemes such as the HMM [7] and the smoothing of the LRs [6].…”
Section: [heading garbled in extraction] (mentioning)
confidence: 99%
“…The noise estimation procedure used in the speech enhancement algorithms is similar to that described in Section 3. The noise power spectrum is initialized with the first frame's data (11) and then recursively updated (10) during non-speech frames determined by the VAD. The frame size and shift used for speech enhancement are 25 and 10 ms respectively, same as those used for extracting the 39-element MFCC_D_A_E feature vector [15] from the post-enhanced signal.…”
Section: ASR Experiments (mentioning)
confidence: 99%
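
The noise estimation procedure in the statement above (initialize from the first frame, then update recursively only during non-speech frames) can be sketched as follows. The quoted equations (10) and (11) are not reproduced on this page, so the first-order recursive average, the function name `track_noise_spectrum`, and the smoothing factor `alpha` are assumptions made for illustration, not the citing paper's actual formulas.

```python
def track_noise_spectrum(frames, is_speech, alpha=0.95):
    """Track a noise power spectrum over a sequence of frames.

    frames    : list of per-frame power spectra (each a list of floats)
    is_speech : list of booleans, one per frame (hypothetical VAD decisions)
    alpha     : smoothing factor of the recursive average (assumed value)
    Returns the noise estimate in effect after each frame.
    """
    # Initialize the noise estimate with the first frame's data.
    noise = list(frames[0])
    history = [list(noise)]
    for power, speech in zip(frames[1:], is_speech[1:]):
        if not speech:
            # Recursive update, applied only when the VAD reports non-speech.
            noise = [alpha * n + (1.0 - alpha) * p
                     for n, p in zip(noise, power)]
        # During speech frames the previous estimate is held unchanged.
        history.append(list(noise))
    return history
```

Holding the estimate during speech frames keeps speech energy from leaking into the noise spectrum, which is why the quality of the VAD decisions matters for the enhancement stage.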
“…Variants of this noise-robust VAD to increase weak speech onset and offset detection have been proposed. For example, [11] used a smoothed LR, while [12] used multiple observations of the short-time DFT feature vector to replace the hangover scheme in [10]. For these VADs, the high LRs of strong speech frames aid the detection of weak neighboring speech frames.…”
Section: Introduction (mentioning)
confidence: 99%
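
To illustrate how a smoothed LR lets strong speech frames aid the detection of weak neighboring frames, here is a minimal sketch. It assumes a first-order recursive smoothing of the per-frame log likelihood ratio; the smoothing parameter `kappa`, the zero initial value, the threshold, and both function names are hypothetical choices for illustration and are not taken from the cited paper.

```python
def smooth_log_lr(log_lrs, kappa=0.9):
    """Recursively smooth a sequence of per-frame log likelihood ratios.

    kappa is an assumed smoothing parameter in [0, 1); the smoothed value
    starts from 0.0 (an assumption, equivalent to a likelihood ratio of 1).
    """
    smoothed = []
    prev = 0.0
    for value in log_lrs:
        prev = kappa * prev + (1.0 - kappa) * value
        smoothed.append(prev)
    return smoothed


def detect_speech(log_lrs, threshold, kappa=None):
    """Threshold raw log LRs (kappa=None) or smoothed log LRs."""
    values = log_lrs if kappa is None else smooth_log_lr(log_lrs, kappa)
    return [v > threshold for v in values]


# One strong speech frame followed by two weak frames (made-up values).
frame_log_lrs = [10.0, 0.0, 0.0]
raw_decisions = detect_speech(frame_log_lrs, threshold=1.0)
smoothed_decisions = detect_speech(frame_log_lrs, threshold=1.0, kappa=0.5)
```

With these made-up values the raw decisions drop to non-speech right after the strong frame, while the smoothed decisions stay active for the two weak frames, which is the behavior a separate hangover scheme would otherwise provide.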