2015
DOI: 10.1186/s13634-015-0277-z

Features for voice activity detection: a comparative analysis

Abstract: In many speech signal processing applications, voice activity detection (VAD) plays an essential role for separating an audio stream into time intervals that contain speech activity and time intervals where speech is absent. Many features that reflect the presence of speech were introduced in literature. However, to our knowledge, no extensive comparison has been provided yet. In this article, we therefore present a structured overview of several established VAD features that target at different properties of …

citations
Cited by 52 publications
(28 citation statements)
references
References 49 publications
“…A batch of training data comprises 128 sequences, with each sequence consisting of 20 feature vectors. The feature vector x_t is computed from the observed signals every millisecond according to (6). The dataset consists of recordings of a desired target speaker, up to 4 simultaneously active interferers, and babble noise in the background.…”
Section: Implementation and Scenarios (mentioning)
confidence: 99%
“…In order to achieve a comparable performance both on small network sizes and small amounts of training data, the selection of feature vectors is indispensable. Classical approaches for Voice Activity Detection (VAD) are typically single-channel methods exploiting distinctive properties of speech signals like stationarity, harmonic structure and spectral envelopes in order to differentiate between speech and background noise [5,6]. These VAD methods, however, cannot be used to differentiate between a target speaker and interfering speech sources as the proposed TAD does.…”
Section: Introduction (mentioning)
confidence: 99%
“…Voice Activity Detection (VAD) is widely researched in audio signal processing and used for audio conferencing, speech encoding, speech recognition, and speaker recognition [17,26]. VAD methods detect voice activity (primarily speech) from a noisy audio signal [16,24,29]. Video content-based camera motion analysis methods make use of template matching [1] and optical flow [6].…”
Section: Focused Interaction Dataset (mentioning)
confidence: 99%
“…It is a fact that SNR can be high at a single frequency point when speech (especially voiced frame) is present, even though the overall SNR of a signal is low (such as 0 dB) [13]. To each frequency point, the entropy of the R continuous frames before the current frame will abruptly become small when speech suddenly appears in the current frame.…”
Section: Proposed Approach (mentioning)
confidence: 99%
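
The last statement above describes a per-frequency entropy that drops when speech appears in the current frame. The following is only a minimal sketch of one possible reading, not the citing paper's method: it assumes the power values of a single frequency bin over a window of R frames (here taken to include the current frame) are normalized into a probability distribution over time, and the Shannon entropy of that distribution is computed. The function name `windowed_entropy` and the toy input are hypothetical.

```python
import math


def windowed_entropy(power, R):
    """Shannon entropy over time of the last R power values of one frequency bin.

    power: non-negative power values for a single frequency bin, one per frame.
    Returns one value per frame; None where fewer than R frames are available
    or where the window carries no energy.
    Assumption: the window ends at (and includes) the current frame.
    """
    entropies = []
    for t in range(len(power)):
        if t + 1 < R:
            entropies.append(None)
            continue
        window = power[t - R + 1 : t + 1]
        total = sum(window)
        if total <= 0:
            entropies.append(None)
            continue
        # Normalize the window so it sums to one, then apply -sum(p * log p).
        probs = [v / total for v in window]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return entropies


# Toy input: a flat noise floor followed by one strong frame at this frequency.
noise_then_speech = [1.0] * 9 + [100.0]
entropies = windowed_entropy(noise_then_speech, R=5)
```

With a flat noise floor the window is uniform and the entropy sits at its maximum, log(R); once a single frame dominates the window, the distribution becomes peaked and the entropy falls, which matches the behaviour the statement describes.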