2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947432
Cross-Channel Spectral Subtraction for meeting speech recognition

Abstract: We propose Cross-Channel Spectral Subtraction (CCSS), a source separation method for recognizing meeting speech where one microphone is prepared for each speaker. The method quickly adapts to changes in transfer functions and uses spectral subtraction to suppress the speech of other speakers. Compared with conventional source separation methods based on independent component analysis (ICA) or binary masks, it incurs lower computational cost and the resulting speech signals have less distortion. In a…
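The abstract describes suppressing other speakers by subtracting their cross-channel contribution from each speaker's own microphone signal. As a minimal magnitude-domain sketch of that idea (the function name, the scalar `gain` standing in for the paper's adaptive transfer-function estimate, and the spectral `floor` constant are all illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def cross_channel_spectral_subtraction(target_mag, interferer_mag,
                                       gain=1.0, floor=0.01):
    """Suppress an interfering speaker in the target channel.

    target_mag     : magnitude spectrum of the target speaker's microphone
    interferer_mag : magnitude spectrum of the interfering speaker's microphone
    gain           : scalar stand-in for the cross-channel transfer-function
                     estimate (the paper adapts this quickly over time)
    floor          : spectral-flooring factor that keeps magnitudes positive
    """
    subtracted = target_mag - gain * interferer_mag
    # Floor negative results to a small fraction of the original magnitude,
    # the usual way spectral subtraction avoids negative spectra.
    return np.maximum(subtracted, floor * target_mag)

# Toy example: bin 0 is dominated by the target, bin 1 by the interferer.
out = cross_channel_spectral_subtraction(np.array([1.0, 2.0]),
                                         np.array([0.5, 3.0]))
```

In bin 1 the raw subtraction goes negative, so the floor takes over; avoiding such over-subtraction artifacts is one reason the paper reports less distortion than mask-based separation.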

Cited by 6 publications (13 citation statements). References 6 publications.
“…where c is the adaptation speed factor, which is chosen between zero and one. Here, instead of updating the noise spectra with the current noisy spectra |Y(m, k)| − |D(m, k)| as in [26], we updated them with the noise estimate from the Wiener filter (see (16)). Therefore, we could control the parts of the slow changes of noise in |D(m, k − 1)|, which was determined from the average estimate of clean speech from preceding frames, and faster changes in |D_n(m, k)|, which was computed by the Wiener filter.…”
Section: The Proposed Methods
confidence: 99%
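The recursive update this citing paper describes is a standard exponential smoothing of the noise spectrum, |D(m, k)| = c·|D(m, k − 1)| + (1 − c)·|D_n(m, k)|. A one-line sketch under that reading (the function name and default value of `c` are assumptions; the per-frame estimate would come from a Wiener filter in the citing paper's setup):

```python
import numpy as np

def update_noise_spectrum(prev_noise, frame_noise_est, c=0.9):
    """Recursive noise-spectrum update with adaptation speed factor c in (0, 1).

    prev_noise      : smoothed noise magnitude spectrum from frame k-1
    frame_noise_est : per-frame noise estimate (e.g. from a Wiener filter)
    Larger c tracks slow noise changes; (1 - c) weights fast changes.
    """
    return c * prev_noise + (1.0 - c) * frame_noise_est
```

With c = 0.9, a previous estimate of 1.0 and a new frame estimate of 2.0 yield a smoothed value of 1.1, illustrating how c trades off stability against adaptation speed.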
“…Noise is usually estimated using early frames of speech, assuming speech is not present during those periods. Later, spectral subtraction was also used for other purposes, such as source separation [16] and dereverberation [17].…”
Section: Introduction
confidence: 99%
“…Different solutions to the task of crosstalk detection have been proposed for different microphone position schemes [4]. Such crosstalk detection systems use different cross-channel features that use the signals of the microphones of the target speaker and non-target speakers [1], [5]–[9]. In this case, each non-target speaker has to have their own microphone, whose signal is compared to the operator signal, which makes it difficult to use such systems in unprepared areas.…”
Section: Introduction
confidence: 99%
“…A different solution to the problem of acoustically overlapped speech is offered by the methods of multichannel spectral subtraction [1], crosstalk cancellation, speech separation [2], and beamforming [3]. However, signal filtering leads to speech distortions, which can decrease speech recognition efficiency (it becomes necessary to retrain the recognition system).…”
Section: Introduction
confidence: 99%
“…In recent years, meeting speech recognition (Maganti et al., 2007; Nasu et al., 2011) and meeting speaker diarization (Boakye et al., 2008; Ben-Harush et al., 2009; Stolcke et al., 2010; Sun et al., 2010; Valente et al., 2010; Vijayasenan et al., 2010; Boakye et al., 2011; Stolcke, 2011; Valente et al., 2011; Yella et al., 2011; Vijayasenan et al., 2012; Zwyssig et al., 2012) have been effectively utilized to transcribe and browse meeting proceedings. However, their performance is usually low at the overlapped speech segments where more than one speaker is speaking.…”
Section: Introduction
confidence: 99%