A comparison of front-end compensation strategies for robust LVCSR under room reverberation and increased vocal effort

Sadjadi, Seyed Omid; Bořil, Hynek; Hansen, John H. L.

doi:10.1109/icassp.2012.6288968

Cited by 8 publications

(3 citation statements)

References 20 publications


“…Mean Hilbert envelope coefficients (MHECs) were recently proposed for noise robust speech, speaker, and language recognition [61,66]. It uses the output of each filter in the filterbank.…”

Section: Mean Hilbert Envelope Coefficients (MHECs) (mentioning)

confidence: 99%

Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise

Hanilçi, Kinnunen, Sahidullah, et al. 2016

Speech Communication


Automatic speaker verification (ASV) technology is recently finding its way to end-user applications for secure access to personal data, smart services or physical facilities. Similar to other biometric technologies, speaker verification is vulnerable to spoofing attacks where an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS) or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC that represent the most flexible, high-end spoofing attacks. Most of the prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, our study provides a comparative analysis of existing state-of-the-art, off-the-shelf synthetic speech detectors under additive noise contamination with a special focus on front-end processing that has been found critical. Our comparison includes eight acoustic feature sets, five related to spectral magnitude and three to spectral phase information. All the methods contain a number of internal control parameters. Except for feature post-processing steps (deltas and cepstral mean normalization) that we optimized for each method, we fix the internal control parameters to their default values based on literature, and compare all the variants using the exact same dimensionality and back-end system. In addition to the eight feature sets, we consider two alternative classifier back-ends: Gaussian mixture model (GMM) and i-vector, the latter with both cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. Our extensive analysis on the recent ASVspoof 2015 challenge provides new insights to the robustness of the spoofing detectors. 
Firstly, unlike in most other speech processing tasks, all the compared spoofing detectors break down even at relatively high signal-to-noise ratios (SNRs) and fail to generalize to noisy conditions even if performing excellently on clean data. This indicates both difficulty of the task, as well as potential to over-fit the methods easily. Secondly, speech enhancement pre-processing is not found helpful. Thirdly, GMM back-end generally outperforms the more involved i-vector back-end. Fourthly, concerning the compared features, the Mel-frequency cepstral coefficient (MFCC) and subband spectral centroid magnitude coefficient (SCMC) features perform the best on average though the winner method depends on SNR and noise type. Finally, a study with two score fusion strategies shows that combining different feature based systems improves recognition accuracy for known and unknown attacks in both clean and noisy conditions. In particular, simple score averaging fusion, as opposed to weighted fusion with logistic loss weight optimization, was found to work better, on average. For clean speech, it provides 88% and 28% relative improvements ...



“…In recent years, researchers have been putting a great deal of effort into the development of speech processing algorithms that would maintain good performance in real world conditions. Besides speaker/channel variability and room reverberation [19,21], environmental noise represents one of the most disruptive and hard to deal with factors [22]. Successful modeling and suppression of noise effects in speech engines requires availability of noisy speech data.…”

Section: Communication In Noise (mentioning)

confidence: 99%

“…Environmental - background noise [16] (stationary, impulsive, time-varying, etc.), room acoustics [17], reverberation [18,19], distant microphone. Data quality - duration, sampling rate, recording quality, audio codec/compression [20].…”

Section: Introduction (mentioning)

confidence: 99%

Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”

Hansen, Bořil 2016

Interspeech 2016

Self Cite


In the field of speech, speaker and language recognition, significant gains have and are being made with new machine learning strategies along with the availability of new and emerging speech corpora. However, many of the core scientific principles required for effective speech processing research appear to be drifting to the sidelines with the assumptions that access to larger amounts of data can address a growing range of issues relating to new speech/speaker/language recognition scenarios. This study focuses on exploring several challenging domains in formulating effective solutions in realistic speech data, and in particular the notion of using naturalistic data to better reflect the potential effectiveness of new algorithms. Our main focus is on mismatch/speech variability issues due to (i) differences in noisy speech with and without Lombard effect and a communication factor, (ii) realistic field data in noisy/increased cognitive load conditions, and (iii) dialect identification using found data. Finally, we study speaker-noise and speaker-speaker interactions in a newly established, fully naturalistic Prof-Life-Log corpus. The specific outcomes from this study include an analysis of the strengths and weaknesses of simulated vs. actual speech data collection for research.


Robust acoustic bird recognition for habitat monitoring with wireless sensor networks

Boulmaiz

Messadeg

Doghmane

et al. 2016

Int J Speech Technol

