2010
DOI: 10.1016/j.specom.2009.08.003
Noise robust voice activity detection based on periodic to aperiodic component ratio

Cited by 56 publications (41 citation statements)
References 72 publications
“…The maximum value can be employed to detect periodicity [30]. Normalizing the maximum value based on an estimate of the aperiodic components increases the robustness of the feature as described in [25] and similarly in [35].…”
Section: Pitch and Harmonicity
confidence: 99%
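The statement above describes normalizing the autocorrelation maximum by an estimate of the aperiodic components to obtain a noise-robust periodicity feature. The sketch below illustrates that general idea; it is an assumption-laden simplification, not the exact periodic-to-aperiodic component ratio (PAR) defined by Ishizuka et al. (2010), and the lag range and normalization are illustrative choices.

```python
import numpy as np

def periodicity_score(frame, fs, f0_min=60.0, f0_max=400.0):
    """Normalized autocorrelation peak as a simple periodicity feature.

    A hedged sketch of the idea in the citing papers: the maximum of the
    normalized autocorrelation within a plausible pitch-lag range measures
    how periodic the frame is, and dividing the periodic part by the
    residual (aperiodic) part increases robustness. NOT the exact PAR
    method of Ishizuka et al. (2010).
    """
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return 0.0
    # Full autocorrelation, keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]  # normalize so the lag-0 value is 1
    lag_min = int(fs / f0_max)                    # shortest pitch period
    lag_max = min(int(fs / f0_min), len(ac) - 1)  # longest pitch period
    if lag_min >= lag_max:
        return 0.0
    peak = np.max(ac[lag_min:lag_max + 1])
    # Crude periodic-to-aperiodic style normalization: peak vs. residual.
    return peak / max(1.0 - peak, 1e-12)
```

A strongly periodic frame (e.g. a voiced vowel) yields a large score because the residual 1 − peak is small, while broadband noise yields a small score; thresholding this ratio gives a simple periodicity-based voicing decision.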
“…The second stage is the recognition of whether the detected voice is part of a conversational utterance or not. Numerous automatic human voice detection algorithms have been proposed, including ones based on periodicity [13], power ratio in the frequency domain [14], and frequency deviation [15]-[17]. To correctly recognize the conversation period and the end of the conversation, the voice of the talking partner, not only that of the target person of the estimation, has to be detected.…”
Section: Automatic Conversational Voice Detection
confidence: 99%
“…These methods utilize the fact that vowels exhibit strong (quasi-)periodicity and apply it to discriminate speech from silence. Periodicity-based approaches are usually more robust in noisy environments; however, they require more computational effort than energy-based ones (Ishizuka et al., 2010). Finally, in the Broadcast News field, most systems discriminate between acoustic classes like speech, music, music and speech, and silence.…”
Section: Front-end Features and Preprocessing Steps
confidence: 99%
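The statement above contrasts periodicity-based detection with cheaper energy-based detection. For context, the sketch below shows a minimal energy-based VAD baseline of the kind being compared against; the frame/hop sizes and the fixed threshold relative to the loudest frame are illustrative assumptions, and practical systems adapt the threshold to an estimated noise floor instead.

```python
import numpy as np

def energy_vad(signal, fs, frame_ms=25.0, hop_ms=10.0, thresh_db=-40.0):
    """Minimal energy-based VAD baseline (illustrative parameters).

    A frame is flagged as speech when its log energy exceeds a fixed
    threshold relative to the loudest frame in the signal. This is the
    cheap baseline that periodicity-based methods improve on in noise.
    """
    frame_len = int(fs * frame_ms / 1000.0)
    hop = int(fs * hop_ms / 1000.0)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    # Per-frame energies, with a tiny floor to keep log10 defined.
    energies = [
        float(np.dot(signal[i * hop:i * hop + frame_len],
                     signal[i * hop:i * hop + frame_len])) + 1e-12
        for i in range(n_frames)
    ]
    ref = max(energies)
    return [10.0 * np.log10(e / ref) > thresh_db for e in energies]
```

The per-frame cost here is one dot product, versus a full autocorrelation for periodicity features, which is the computational trade-off the quoted passage refers to.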