2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947432
Cross-Channel Spectral Subtraction for meeting speech recognition

Abstract: We propose Cross-Channel Spectral Subtraction (CCSS), a source separation method for recognizing meeting speech where one microphone is prepared for each speaker. The method quickly adapts to changes in transfer functions and uses spectral subtraction to suppress the speech of other speakers. Compared with conventional source separation methods based on independent component analysis (ICA) or binary masks, it incurs lower computational cost and the resulting speech signals have less distortion. In a…
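The abstract describes suppressing other speakers by subtracting their cross-channel contribution from each speaker's own microphone signal. As a minimal magnitude-domain sketch of that idea (the function name, the scalar `gain` standing in for the paper's adaptive transfer-function estimate, and the spectral `floor` constant are all illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def cross_channel_spectral_subtraction(target_mag, interferer_mag,
                                       gain=1.0, floor=0.01):
    """Suppress an interfering speaker in the target channel.

    target_mag     : magnitude spectrum of the target speaker's microphone
    interferer_mag : magnitude spectrum of the interfering speaker's microphone
    gain           : scalar stand-in for the cross-channel transfer-function
                     estimate (the paper adapts this quickly over time)
    floor          : spectral-flooring factor that keeps magnitudes positive
    """
    subtracted = target_mag - gain * interferer_mag
    # Floor negative results to a small fraction of the original magnitude,
    # the usual way spectral subtraction avoids negative spectra.
    return np.maximum(subtracted, floor * target_mag)

# Toy example: bin 0 is dominated by the target, bin 1 by the interferer.
out = cross_channel_spectral_subtraction(np.array([1.0, 2.0]),
                                         np.array([0.5, 3.0]))
```

In bin 1 the raw subtraction goes negative, so the floor takes over; avoiding such over-subtraction artifacts is one reason the paper reports less distortion than mask-based separation.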

Cited by 6 publications (13 citation statements). References 6 publications.
“…where c is the adaptation speed factor, which is chosen between zero and one. Here, instead of updating the noise spectra with the current noisy spectra |Y(m, k)| − |D(m, k)| as in [26], we updated them with the noise estimate from the Wiener filter (see (16)). Therefore, we could control the parts of the slow changes of noise in |D(m, k − 1)|, which was determined from the average estimate of clean speech from preceding frames, and faster changes in |D_n(m, k)|, which was computed by the Wiener filter.…”
Section: The Proposed Methods
confidence: 99%
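The recursive update this citing paper describes is a standard exponential smoothing of the noise spectrum, |D(m, k)| = c·|D(m, k − 1)| + (1 − c)·|D_n(m, k)|. A one-line sketch under that reading (the function name and default value of `c` are assumptions; the per-frame estimate would come from a Wiener filter in the citing paper's setup):

```python
import numpy as np

def update_noise_spectrum(prev_noise, frame_noise_est, c=0.9):
    """Recursive noise-spectrum update with adaptation speed factor c in (0, 1).

    prev_noise      : smoothed noise magnitude spectrum from frame k-1
    frame_noise_est : per-frame noise estimate (e.g. from a Wiener filter)
    Larger c tracks slow noise changes; (1 - c) weights fast changes.
    """
    return c * prev_noise + (1.0 - c) * frame_noise_est
```

With c = 0.9, a previous estimate of 1.0 and a new frame estimate of 2.0 yield a smoothed value of 1.1, illustrating how c trades off stability against adaptation speed.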
“…Noise is usually estimated using early frames of speech, assuming speech is not present during those periods. Later, spectral subtraction was also used for other purposes, such as source separation [16] and dereverberation [17].…”
Section: Introduction
confidence: 99%
“…Different solutions to the task of crosstalk detection have been proposed for different microphone position schemes [4]. Such crosstalk detection systems use different cross-channel features that use the signals of the microphones of the target speaker and non-target speakers [1], [5]–[9]. In this case, each non-target speaker has to have their own microphone, whose signal is compared to the operator signal, which makes it difficult to use such systems in unprepared areas.…”
Section: Introduction
confidence: 99%
“…A different solution to the problem of acoustically overlapped speech is offered by the methods of multichannel spectral subtraction [1], crosstalk cancellation, speech separation [2], and beamforming [3]. However, signal filtering leads to speech distortions, which can decrease speech recognition efficiency (it becomes necessary to retrain the recognition system).…”
Section: Introduction
confidence: 99%
“…In recent years, meeting speech recognition (Maganti et al., 2007; Nasu et al., 2011) and meeting speaker diarization (Boakye et al., 2008; Ben-Harush et al., 2009; Stolcke et al., 2010; Sun et al., 2010; Valente et al., 2010; Vijayasenan et al., 2010; Boakye et al., 2011; Stolcke, 2011; Valente et al., 2011; Yella et al., 2011; Vijayasenan et al., 2012; Zwyssig et al., 2012) have been effectively utilized to transcribe and browse meeting proceedings. However, their performance is usually low at the overlapped speech segments where more than one speaker is speaking.…”
Section: Introduction
confidence: 99%