2016
DOI: 10.1109/lsp.2015.2495219

Voice Activity Detection: Merging Source and Filter-based Information

Abstract: Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from background noise. Numerous approaches have been proposed for this purpose. Some are based on features derived from the power spectral density; others exploit the periodicity of the signal. The goal of this paper is to investigate the joint use of source- and filter-based features. Interestingly, a mutual-information-based assessment shows superior discrimination power for the source-related features, especially the propo…
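The abstract ranks features by mutual information with the speech/non-speech decision. A minimal sketch of that kind of assessment, assuming a plug-in histogram estimator (equal-width bins over the feature range), is shown below. The function name `mutual_information`, the binning scheme and the `n_bins` default are assumptions for illustration; the paper's own estimator is not given on this page.

```python
import math
from collections import Counter


def mutual_information(feature, labels, n_bins=10):
    """Plug-in histogram estimate of I(F; L) in bits (assumed estimator).

    feature: per-frame scalar feature values.
    labels: per-frame speech (1) / non-speech (0) annotations.
    """
    if not feature or len(feature) != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins or 1.0  # constant feature: everything lands in bin 0
    # Equal-width binning; the maximum value is clamped into the last bin.
    bins = [min(int((f - lo) / width), n_bins - 1) for f in feature]
    n = len(feature)
    joint = Counter(zip(bins, labels))
    bin_counts, label_counts = Counter(bins), Counter(labels)
    mi = 0.0
    for (b, lab), count in joint.items():
        # p(b, l) * log2(p(b, l) / (p(b) * p(l))), written with raw counts
        mi += (count / n) * math.log2(count * n / (bin_counts[b] * label_counts[lab]))
    return mi
```

Scoring each candidate feature (for example, a spectral-envelope measure versus a periodicity measure) on the same labelled frames gives a simple ranking of discrimination power. Plug-in histogram estimates are biased upward on small samples, so the bin count should be chosen with the amount of data in mind.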

Cited by 60 publications (45 citation statements). References: 41 publications (67 reference statements).
“…The binary classification problem is described as […] Regarding the feature extraction part of this paper, it uses a relatively small number of features that are fed as input to an ensemble classification schema. We tested Clarity, that is commonly used in voice activity detection [20], [21] as a candidate feature for emotion recognition. The reason for that choice is that emotion recognition is a complex, versatile task, so alternative features may capture supplementary aspects of emotion expression.…”
Section: Algorithm 1, The Random Forest Algorithm (mentioning; confidence: 99%)
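The passage above names Clarity as a VAD feature without defining it. One formulation that appears in the VAD literature, assumed here rather than taken from this page, measures periodicity with the average magnitude difference function (AMDF) D(τ) of a frame: clarity = 1 - D_min / D_max. This sketch simplifies by taking the global minimum and maximum over a lag search range; the function names, the lag range and that simplification are all hypothetical.

```python
def amdf(frame, lag):
    """Average magnitude difference function of one frame at a single lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def clarity(frame, min_lag, max_lag):
    """Assumed definition: 1 - min(D) / max(D) over lags min_lag..max_lag."""
    if not 0 < min_lag <= max_lag < len(frame):
        raise ValueError("need 0 < min_lag <= max_lag < len(frame)")
    values = [amdf(frame, lag) for lag in range(min_lag, max_lag + 1)]
    d_max = max(values)
    if d_max == 0.0:
        return 0.0  # constant (e.g. silent) frame: no periodicity evidence
    return 1.0 - min(values) / d_max
```

A strongly periodic (voiced) frame drives the AMDF close to zero at its period, so clarity approaches 1; a noise-like frame yields a flatter AMDF and a value nearer 0. In practice the lag range would be set from the expected pitch-period range at the given sampling rate.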
“…The speaker models were then adapted from their respective gender's UBM with a relevance factor of 16. For the i-vector framework, same UBMs and the pooled training data were used to train the total variability matrix in 20 iterations, and then, 100 dimensional i-vectors were extracted from each utterance.…”
[Table 1: Male speaker verification results of GMM-UBM method in terms of percent EER (minDCF) for the proposed algorithm, Drugman's VAD method [27], and Rangachari's noise tracking method [21]. The last columns show the relative percent EER reduction rates compared to Drugman's VAD and Rangachari's method, respectively.]
Section: Clarity Level and VAD Output (mentioning; confidence: 99%)
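The quoted experiment adapts speaker models from a UBM with a relevance factor of 16. A common form of this step is mean-only MAP adaptation, where each component mean moves toward the speaker's data in proportion to α_k = n_k / (n_k + r). The sketch below assumes diagonal covariances and adapts means only; whether the citing work also adapted weights or variances is not stated in this excerpt, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal GMM-UBM (assumed recipe)."""
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp                        # zeroth-order stats n_k
    firsts = [[0.0] * dim for _ in range(n_comp)]  # first-order stats F_k
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                            # shift for numerical stability
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for k in range(n_comp):
            gamma = post[k] / total                # responsibility of component k
            counts[k] += gamma
            for d in range(dim):
                firsts[k][d] += gamma * x[d]
    # alpha_k * (F_k / n_k) + (1 - alpha_k) * mu_k with alpha_k = n_k / (n_k + r)
    # simplifies to (F_k + r * mu_k) / (n_k + r), which stays defined when n_k = 0.
    return [[(f + relevance * m) / (counts[k] + relevance)
             for f, m in zip(firsts[k], means[k])]
            for k in range(n_comp)]
```

With r = 16, a component that collects a soft count of 16 frames gets α_k = 16 / 32 = 0.5 and moves halfway from its UBM mean toward the speaker's data mean, while components that see no data keep their UBM means.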
“…One of these methods was the noise tracking algorithm proposed in [21] (called Rangachari's method from here on). This algorithm works on the frequency bins of the conventional spectrum, but, to make a fair comparison, it was modified to work on the mel spectrum, similar to the algorithm proposed in this paper.…”
[Table 2: Female speaker verification results of GMM-UBM method in terms of percent EER (minDCF) for the proposed algorithm, Drugman's VAD method [27], and Rangachari's noise tracking method [21]. The last columns show the relative percent EER reduction rates compared to Drugman…]
Section: Clarity Level and VAD Output (mentioning; confidence: 99%)
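The passage describes a noise tracker that runs per frequency bin and was moved to mel bands for the comparison. The sketch below shows a pattern used by many speech-presence-based noise trackers (recursive power smoothing, minimum tracking, a smoothed speech-presence probability, and a presence-dependent smoothing factor), applied to per-band powers such as mel filterbank energies. It is a simplified illustration, not Rangachari's published algorithm: the leaky-minimum rule, the function name `track_noise`, the noise-only first frame, and every constant (`eta`, `gamma`, `delta`, `alpha_p`, `alpha_d`) are assumptions.

```python
def track_noise(power_frames, eta=0.7, gamma=0.998, delta=2.0,
                alpha_p=0.2, alpha_d=0.85):
    """Return a per-band noise power estimate for every frame.

    power_frames: list of frames, each a list of per-band powers (e.g. mel
    energies). The first frame is assumed to contain noise only, and all
    constants are illustrative assumptions rather than tuned values.
    """
    first = power_frames[0]
    smoothed = list(first)          # recursively smoothed band power
    p_min = list(first)             # slowly rising minimum of smoothed power
    presence = [0.0] * len(first)   # smoothed speech-presence probability
    noise = list(first)             # current noise estimate
    history = [list(noise)]
    for frame in power_frames[1:]:
        for k, power in enumerate(frame):
            smoothed[k] = eta * smoothed[k] + (1 - eta) * power
            # Leaky minimum: follow drops immediately, rise only slowly.
            if smoothed[k] < p_min[k]:
                p_min[k] = smoothed[k]
            else:
                p_min[k] = gamma * p_min[k] + (1 - gamma) * smoothed[k]
            # Hard speech decision from the power-to-minimum ratio.
            speech = 1.0 if smoothed[k] > delta * p_min[k] else 0.0
            presence[k] = alpha_p * presence[k] + (1 - alpha_p) * speech
            # The smoothing factor approaches 1 when speech is likely, so
            # speech energy barely leaks into the noise estimate.
            alpha_s = alpha_d + (1 - alpha_d) * presence[k]
            noise[k] = alpha_s * noise[k] + (1 - alpha_s) * power
        history.append(list(noise))
    return history
```

Because the update runs independently in each band, the same code applies whether the bands are FFT bins or mel channels; switching to mel bands only changes the input representation, which is the kind of modification the passage describes for the fair comparison.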