“…The step-size of the sliding window indicates the resolution of the system. For the purpose of VAD, we need to evaluate the following statistical hypotheses: H0: (x1 … Using the log-value of the Generalized Likelihood Ratio Test (GLRT) associated with the defined hypothesis test, the distance between the two segments in Fig. 1 is: …”
Section: Bayesian Information Criterion (mentioning)
confidence: 99%
“…First, we select a sufficiently big sliding window, model it and its adjacent sub-segments using GΓD instead of GD, and calculate the distance d R associated with the GLRT using (1). Here, as in [9], we are making the assumption that both noise and speech signals have uncorrelated components in the DCT domain.…”
Section: DISTBIC Using Generalized Gamma Distribution (mentioning)
confidence: 99%
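The GLRT distance between two adjacent sub-segments described in the excerpt above can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a univariate Gaussian model for the generalized Gamma model used in the cited work, and the function names, the penalty weight, and the sign convention of the BIC-style score (positive suggests a boundary) are hypothetical choices, not taken from the source.

```python
import math


def gaussian_max_loglik(samples):
    """Maximised log-likelihood of a segment under a univariate Gaussian
    (assumption: Gaussian stands in for the generalized Gamma model)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    var = max(var, 1e-12)  # guard against log(0) on constant segments
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def glrt_distance(x, y):
    """Log-GLRT distance: log-likelihood of modelling x and y separately
    minus the log-likelihood of modelling their concatenation jointly."""
    z = list(x) + list(y)
    return gaussian_max_loglik(x) + gaussian_max_loglik(y) - gaussian_max_loglik(z)


def delta_bic(x, y, penalty_weight=1.0, extra_params=2):
    """BIC-style score: the GLRT distance minus a complexity penalty.
    extra_params=2 counts the extra mean and variance of the two-model case.
    Hypothetical convention: a positive value suggests a boundary."""
    n = len(x) + len(y)
    return glrt_distance(x, y) - penalty_weight * 0.5 * extra_params * math.log(n)


quiet = [0.1, -0.1, 0.1, -0.1]
loud = [1.0, -1.0, 1.0, -1.0]
print(glrt_distance(quiet, loud), delta_bic(quiet, loud))
```

Sliding the pair of sub-segments along the signal and evaluating the score at each step gives a distance curve whose peaks are candidate boundaries; the step-size then sets the resolution, as the first excerpt notes.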
“…The detection principles of conventional VADs are usually energy-based approaches, which have been proved computationally efficient to such an extent that they allow real-time signal processing [1]. Moreover, these methods work relatively well in high signal to noise ratios (SNR) and for known stationary noise.…”
In this work, we model speech samples with the generalized Gamma distribution and evaluate the efficiency of such modelling for voice activity detection. Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech.
“…Infusion of pitch and duration information, use of adaptive thresholds, augmentation of zero crossover rate result in somewhat improved performance [4]. The proposed algorithms, replaces entropy of the speech as the key feature for boundary detection.…”
“…The most commonly used method of endpoint detection is the use of short-time or spectral energy [1,2,3,4]. Typically an adaptive threshold is employed based on the features of the energy profile to differentiate between the speech segments and the background noise.…”
This paper addresses the issue of automatic word/sentence boundary detection in both quiet and noisy environments. We propose to use an entropy-based contrast function between the speech segments and the background noise. A simplified data-based scheme for computing the entropy of the speech data is presented. The entropy-based contrast exhibits better-behaved characteristics than the energy-based methods. An adaptive threshold is used to determine the candidate speech segments, which are subjected to word/sentence constraints. Experimental results show that this algorithm outperforms energy-based algorithms. The improved detection accuracy of speech segments results in at least 25% improvement of recognition performance for isolated speech and more than 16% for connected speech. For continuous speech, a preprocessing stage comprising the proposed speech segment detection makes the overall HMM-based scheme more computationally efficient by rejecting silence periods.
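A minimal sketch of the entropy-plus-adaptive-threshold idea from the abstract above. The abstract does not specify its simplified entropy scheme, so everything here is an assumption for illustration: the histogram-based Shannon entropy, the frame length, the bin count, and the threshold rule (a fixed fraction of the way between the lowest and highest frame entropy) are hypothetical. The word/sentence constraints applied to candidate segments are not modelled.

```python
import math


def frame_entropy(frame, num_bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of a frame's amplitude histogram.
    Assumes samples are normalised to [lo, hi]; out-of-range values are clamped."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for s in frame:
        idx = int((s - lo) / width)
        idx = min(max(idx, 0), num_bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def detect_speech_frames(signal, frame_len=160, num_bins=16, weight=0.5):
    """Flag frames whose entropy exceeds an adaptive threshold.
    The threshold adapts to the signal: it sits a fraction `weight` of the
    way between the minimum and maximum frame entropy (hypothetical rule)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    entropies = [frame_entropy(f, num_bins) for f in frames]
    floor, ceiling = min(entropies), max(entropies)
    threshold = floor + weight * (ceiling - floor)
    return [e > threshold for e in entropies], entropies


# Toy signal: silence, a tone standing in for speech, then silence again.
silence = [0.0] * 320
tone = [0.8 * math.sin(2.0 * math.pi * k / 40.0) for k in range(320)]
flags, _ = detect_speech_frames(silence + tone + silence)
print(flags)
```

In real recordings the background is noisy rather than exactly silent, so the contrast between frames is smaller than in this toy signal and the threshold weight would need tuning.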
In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection. Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech.
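As a rough illustration of what a two-sided generalized Gamma model looks like, the sketch below evaluates its log-density and the log-likelihood of a segment. The abstract does not give a parameterisation, so the form used here is an assumption: f(x) = γ β^η / (2 Γ(η)) · |x|^(ηγ−1) · exp(−β |x|^γ), with hypothetical parameter names `gamma_`, `eta`, and `beta`. Under this form, γ = 2, η = 0.5 recovers a zero-mean Gaussian and γ = 1, η = 1 a Laplacian, which gives a convenient sanity check.

```python
import math


def two_sided_gengamma_logpdf(x, gamma_, eta, beta):
    """Log-density of the assumed two-sided generalized Gamma form:
    f(x) = gamma * beta**eta / (2 * Gamma(eta))
           * |x|**(eta*gamma - 1) * exp(-beta * |x|**gamma)."""
    ax = max(abs(x), 1e-300)  # avoid log(0) at x == 0
    return (math.log(gamma_) + eta * math.log(beta) - math.log(2.0)
            - math.lgamma(eta)
            + (eta * gamma_ - 1.0) * math.log(ax)
            - beta * ax ** gamma_)


def segment_loglik(samples, gamma_, eta, beta):
    """Log-likelihood of a segment, treating samples as independent."""
    return sum(two_sided_gengamma_logpdf(s, gamma_, eta, beta) for s in samples)


# Sanity check against the Gaussian special case (gamma=2, eta=0.5, beta=1/(2*sigma^2)).
sigma = 1.0
x = 0.3
gauss = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - x ** 2 / (2.0 * sigma ** 2)
print(two_sided_gengamma_logpdf(x, 2.0, 0.5, 1.0 / (2.0 * sigma ** 2)), gauss)
```

Maximising `segment_loglik` over the three parameters for each segment is the step a maximum likelihood approach would perform; the estimator itself is not shown because the source does not describe it.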