2009
DOI: 10.1109/tasl.2008.2010633

A Supervised Learning Approach to Monaural Segregation of Reverberant Speech

Abstract: A major source of signal degradation in real environments is room reverberation. Monaural speech segregation in reverberant environments is a particularly challenging problem. Although inverse filtering has been proposed to partially restore the harmonicity of reverberant speech before segregation, this approach is sensitive to specific source/receiver and room configurations. This paper proposes a supervised learning approach to monaural segregation of reverberant voiced speech, which learns to map f…

Cited by 60 publications (4 citation statements)
References: 40 publications
“…They used a discriminative training criterion for the neural networks to further enhance the monaural source separation performance. In [29], Jin and Wang proposed a supervised learning approach to monaural segregation of reverberant speech using the multiresolution cochleagram (MRCG) features [30].…”
Section: Relation To Prior Work (mentioning)
confidence: 99%
“…Compared with the monaural segregation of reverberant speech in [29], the stereo speech separation in [31] tends to be more robust due to the use of spatial information. In [7], GMM is used to model the MV and IPD/ILD cues that contain spatial information and the EM algorithm is used to estimate the model parameters and to derive the T-F mask.…”
Section: Relation To Prior Work (mentioning)
confidence: 99%
“…These algorithms successfully suppress the background noise, which is an inevitable problem in real world scenarios. Although speech enhancement techniques are found to be very helpful in tasks such as speech recognition, recognition performance is severely degraded in noisy conditions [3,4]. It is hence crucial that the speech recognition exhibits a robust performance across ambient noises and benefit many speech technology applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…It is hence crucial that the speech recognition exhibits a robust performance across ambient noises and benefit many speech technology applications. In [4], the supervised learning approach is proposed to map a set of harmonic features into a pitch based grouping cue for time-frequency (T-F) unit.…”
Section: Introduction (mentioning)
confidence: 99%