2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
DOI: 10.1109/icassp.2003.1198906
A linked-HMM model for robust voicing and speech detection

Abstract: We present a novel method for simultaneous voicing and speech detection based on a linked-HMM architecture, with robust features that are independent of the signal energy. Because this approach models the change in dynamics between speech and non-speech regions, it is robust to low sampling rates, significant levels of additive noise, and large distances from the microphone. We demonstrate the performance of our method in a variety of testing conditions and also compare it to other methods reported in the lite…

Cited by 38 publications (32 citation statements)
References 5 publications (7 reference statements)
“…During a social interaction, speech and silence operate as regulators of a conversation, emitting social signals such as consensus and rejection, and revealing interlocutors' social behaviour, including their emotions [Koudenburg et al 2011]. One of the well-known and widely used techniques to infer the existence of a conversation was presented by Basu [Basu 2003]. It specified a linked Hidden Markov Model (HMM) with three features: the non-initial maximum of the normalized noisy autocorrelation, the number of autocorrelation peaks, and the normalized spectral entropy.…”
Section: Auditory
confidence: 99%
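As an illustration, the three frame-level features named in the excerpt can be sketched as follows. This is a rough sketch, not the paper's exact recipe: the lag bounds, frame length, and normalization details are assumptions.

```python
import numpy as np

def voicing_features(frame, sr=8000):
    """Compute three energy-independent voicing features for one audio frame.

    Returns (non-initial autocorrelation maximum, autocorrelation peak count,
    normalized spectral entropy). Windowing and lag bounds are assumptions.
    """
    frame = frame - frame.mean()
    # Normalized autocorrelation: r[k] / r[0]
    full = np.correlate(frame, frame, mode="full")
    ac = full[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)

    # Feature 1: non-initial maximum of the normalized autocorrelation,
    # searched over a plausible pitch range (assumed 50-500 Hz) so the
    # trivial lag-0 peak is excluded.
    min_lag = int(sr / 500)
    max_lag = int(sr / 50)
    noninit_max = float(ac[min_lag:max_lag].max())

    # Feature 2: number of local maxima (peaks) in the autocorrelation.
    peaks = int(np.sum((ac[1:-1] > ac[:-2]) & (ac[1:-1] > ac[2:])))

    # Feature 3: spectral entropy of the frame's normalized power spectrum,
    # divided by log(#bins) so it lies in [0, 1].
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (spec.sum() + 1e-12)
    entropy = float(-np.sum(p * np.log(p + 1e-12)) / np.log(len(p)))

    return noninit_max, peaks, entropy
```

A voiced (periodic) frame should yield a high non-initial autocorrelation maximum and low spectral entropy, while unvoiced or noise frames show the opposite pattern.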
“…Because of the close placement of the microphone with respect to the speaker's mouth, we can use a simple energy threshold to segment the speech from most other speech and ambient sounds. It has been shown that one can segment speech using voiced regions (speech regions that have pitch) alone [13]. In voiced regions, energy is biased towards the low-frequency range, and hence we use a low-frequency energy threshold (2 kHz cutoff) instead of total energy.…”
Section: Data Analysis Methods
confidence: 99%
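The low-frequency energy threshold described above can be sketched as follows; the 2 kHz cutoff comes from the excerpt, while the frame length, sample rate, and threshold value are hypothetical and would need per-microphone calibration.

```python
import numpy as np

def low_band_energy(frame, sr=16000, cutoff_hz=2000.0):
    """Energy in the spectral band below `cutoff_hz` for one audio frame."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(spec[freqs < cutoff_hz].sum())

def is_speech_frame(frame, sr=16000, threshold=1.0):
    # Hypothetical threshold; the excerpt does not give a numeric value.
    return low_band_energy(frame, sr) > threshold
```

Thresholding band-limited energy rather than total energy makes the detector less sensitive to broadband and high-frequency ambient noise.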
“…A purely energy-based approach to speaker segmentation is potentially very susceptible to the noise level of the environment and to sound from the user's regular activity. In order to overcome this problem we have incorporated the robust speech features (non-initial maximum of the autocorrelation, the number of autocorrelation peaks, and the normalized spectral entropy) proposed in [13]. An HMM trained to detect voiced/unvoiced regions using these features is very reliable even in noisy environments, with less than 2% error at 10 dB SSNR.…”
Section: Data Analysis Methods
confidence: 99%
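The excerpt's HMM does more than classify frames independently: its sticky transition probabilities smooth isolated frame-level errors. A minimal sketch of that temporal smoothing, using generic two-state Viterbi decoding rather than the paper's linked-HMM architecture, and with assumed transition and observation probabilities:

```python
import numpy as np

def viterbi_smooth(loglik, log_trans, log_prior):
    """Decode the most likely state sequence for a 2-state Markov chain.

    loglik:    (T, 2) per-frame log p(observation | state)
    log_trans: (2, 2) log transition matrix, log_trans[i, j] = log p(j | i)
    log_prior: (2,) log initial state probabilities
    Returns a length-T array of states (0 = non-speech, 1 = speech).
    """
    T = len(loglik)
    delta = log_prior + loglik[0]           # best log-score ending in each state
    back = np.zeros((T, 2), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):           # backtrack
        path[t - 1] = back[t, path[t]]
    return path
```

With sticky transitions (e.g. 0.9 self-transition), a single mislabeled frame inside a speech run is overridden by its neighbors, which is the kind of robustness the citing authors report.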
“…Ambient sound and music have less phase deviation than the human voice. Finally, Relative Spectral Entropy (RSE) is simply the Kullback-Leibler (KL) divergence between the current spectrum and the local mean spectrum [Basu, 2003]. It is calculated from sound signals in order to differentiate human speech from other sounds.…”
Section: Features Used For Detecting Social Interactions
confidence: 99%
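Relative spectral entropy as described in the excerpt can be sketched as follows; the window half-width and the exact averaging scheme are assumptions, not values from the cited work.

```python
import numpy as np

def relative_spectral_entropy(spectra, t, context=10):
    """KL divergence between frame t's normalized power spectrum and the
    normalized local mean spectrum over a surrounding window of frames.

    spectra: (T, F) array of power spectra, one row per frame.
    context: assumed window half-width in frames.
    """
    lo, hi = max(0, t - context), min(len(spectra), t + context + 1)
    mean_spec = spectra[lo:hi].mean(axis=0)
    p = spectra[t] / spectra[t].sum()       # current spectrum as a distribution
    q = mean_spec / mean_spec.sum()         # local mean spectrum, normalized
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
```

A frame whose spectrum departs sharply from the local background (as voiced speech does against steady ambient noise) yields a large RSE, while frames matching the background yield values near zero.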