The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition

McCowan, Iain; McLaren, Mitchell; Vogt, Robbie; Sridharan, Sridha

doi:10.1109/tasl.2011.2109379

Cited by 48 publications

(17 citation statements)

References 37 publications


“…In few findings, we see that vector quantization and VAD have been used along with MFCC to increase the efficiency of speaker recognition [2]. Speaker recognition rates are also controlled with the help of Mel-frequency delta phase (MFDP) along with MFCC and it is found that error probability is less in MFCC, but when both MFCC and MFDP are used together, it proves to be more efficient [3]. A review of the use of phase information in speech processing, however, indicates that broadly effective phase domain features remain difficult to extract [4].…”

Section: Introduction (mentioning)

confidence: 99%

A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)

Mahalakshmi

2016

Asian J Pharm Clin Res


Objective: The objective of this review article is to give a complete review of various techniques that are used for speech recognition purposes over two decades. Methods: VAD-Voice Activity Detection, SAD-Speech Activity Detection techniques are discussed that are used to distinguish voiced from unvoiced signals and MFCC-Mel Frequency Cepstral Coefficient technique is discussed which detects specific features. Results: The review results show that research in MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: A comparison of different speaker recognition techniques that were used previously were discussed and those in current research were also discussed and a clear idea of the better technique was identified through the review of multiple literature for over two decades.


“…Nowadays, many speech-related applications are developed to facilitate our daily lives. Voice activity detection (VAD), which detects speech segments in an audio stream, is often included in the front-end of speech-related systems, such as in telecommunication systems [1], [2], robust automatic speech recognition system [3] and speaker recognition systems [4], [5]. Therefore, a robust VAD for any noise condition is greatly needed.…”

Section: Introduction (mentioning)

confidence: 99%

Robust Voice Activity Detection Algorithm Based on Feature of Frequency Modulation of Harmonics and Its DSP Implementation

Hsu, Cheong, Chi, et al.

2015

IEICE Trans. Inf. & Syst.


SUMMARY This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, which was developed to extract texture features of the audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. Then, the proposed VAD is implemented on a Texas Instruments (TI) digital signal processor (DSP) platform for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, in non-stationary noise in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.


“…According to [10], energy VAD with spectral subtraction enhancement can outperform more advanced statistical model VAD [3]. Alternative ways to tackle noise include alternative features such as periodicity [11] or phase [12].…”

Section: Introduction (mentioning)

confidence: 99%

“…Beyond the simple energy VAD, at the other extreme are methods that adopt an off-the-shelf phone recognizer or trainable models for VAD [13,14,15,16,12]. For instance, phone posterior probabilities can be merged and combined with energy measures [13].…”

Section: Introduction (mentioning)

confidence: 99%

A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data

Kinnunen, Rajan

2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing


A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative, likelihood ratio based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from enhanced energy VAD. As the speech and nonspeech models are re-trained for each utterance, minimal assumptions about the background noise are made. According to both VAD error analysis and speaker verification results using a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin. We provide an open-source implementation of the method.
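As a rough illustration of the idea in the abstract above, the following Python sketch bootstraps labels from frame energy, fits one speech and one nonspeech model per utterance, and thresholds the log-likelihood ratio. It is a simplification under stated assumptions, not the authors' implementation: band log-energies stand in for MFCCs, single diagonal Gaussians stand in for the trained models, the speech enhancement step is omitted, and all function names and parameter values (`frame_len`, `hop`, `frac`) are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def band_log_energies(frames, n_bands=8):
    """Stand-in features (NOT MFCCs): log energy in equal-width FFT bands."""
    window = np.hanning(frames.shape[1])
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(spec, n_bands, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-10)

def diag_gauss_loglik(feats, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (feats - mean) ** 2 / var, axis=1)

def self_adaptive_vad(x, frac=0.2):
    """Return a boolean array with True for frames judged to be speech."""
    frames = frame_signal(x)
    feats = band_log_energies(frames)

    # Bootstrap labels from plain frame energy (assumption: no enhancement):
    # the lowest-energy frames train the nonspeech model, the highest-energy
    # frames train the speech model.
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    order = np.argsort(log_e)
    k = max(1, int(frac * len(order)))
    nonspeech, speech = feats[order[:k]], feats[order[-k:]]

    # Models are re-estimated for every utterance; a small variance floor
    # keeps the Gaussians well conditioned.
    ll_speech = diag_gauss_loglik(feats, speech.mean(axis=0), speech.var(axis=0) + 1e-3)
    ll_nonspeech = diag_gauss_loglik(feats, nonspeech.mean(axis=0), nonspeech.var(axis=0) + 1e-3)
    return (ll_speech - ll_nonspeech) > 0
```

Because both models are fitted on the utterance being scored, the decision threshold adapts to that recording's noise level rather than relying on a fixed energy cutoff.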

