Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise

IEEE/ACM Trans. Audio Speech Lang. Process.

2018

Self Cite

Abstract: We present a speech enhancement algorithm that performs modulation-domain Kalman filtering to track the speech phase using circular statistics, along with the log-spectra of speech and noise. In the proposed algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for the signal reconstruction of speech. The Kalman filter prediction step separately models the temporal inter-frame correlation of the speech and noise spectral log-amplitudes and of the speech phase, while the Kalman filter update step models their nonlinear relations under the assumption that speech and noise add in the complex short-time Fourier transform domain. The phase-sensitive enhancement algorithm is evaluated with speech quality and intelligibility metrics, using a variety of noise types over a range of SNRs. Instrumental measures predict that tracking the speech log-spectrum and phase with modulation-domain Kalman filtering leads to consistent improvements in speech quality, over both conventional enhancement algorithms and other algorithms that perform modulation-domain Kalman filtering.
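To illustrate why phase tracking calls for circular statistics, the following is a minimal sketch and not the paper's estimator. It computes the circular mean and the mean resultant length of a set of phase values and compares them with the ordinary arithmetic mean. The function names `wrap_phase` and `circular_mean_and_resultant` and the sample values are assumptions chosen for illustration.

```python
import numpy as np


def wrap_phase(theta):
    """Wrap angles in radians into the interval [-pi, pi)."""
    return (np.asarray(theta, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi


def circular_mean_and_resultant(theta):
    """Return the circular mean direction and the mean resultant length.

    The mean resultant length lies in [0, 1]; values near 1 indicate
    tightly clustered phases, values near 0 indicate widely spread phases.
    """
    z = np.mean(np.exp(1j * np.asarray(theta, dtype=float)))
    return np.angle(z), np.abs(z)


# Two hypothetical phase samples that sit close together on the circle,
# on either side of the +/- pi wrap-around point.
phases = np.array([np.pi - 0.1, -np.pi + 0.1])

arithmetic_mean = np.mean(phases)  # about 0: points the wrong way
mean_direction, resultant = circular_mean_and_resultant(phases)

print("arithmetic mean:", arithmetic_mean)
print("circular mean:  ", mean_direction)
print("resultant length:", resultant)
```

The arithmetic mean lands near 0, opposite to both samples, while the circular mean lands at the wrap-around point where the samples actually cluster. This is the kind of failure that linear Gaussian statistics on raw phase values would produce.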

Section: Additional Literature Review

Citation type: mentioning

confidence: 99%

“…Estimating frequency-dependent reverberation parameters is beneficial. Reverberation is frequency dependent and obtaining a T60 estimate for each individual frequency bin, or for every Mel-spaced frequency band as in [34], [35], is advantageous.…”


Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering

IEEE/ACM Trans. Audio Speech Lang. Process.

2018

Self Cite

“…2.1, we do KF noise tracking in the log-power spectral domain based on AR(r) modeling and on the estimated SNR in the modulation frame [3]. After the noise KF prediction step, we decorrelate the joint KF state and, then, we multiply the noise log-power Gaussian with the Gaussian that is obtained from external noise estimation and log-normal noise power modeling [20] [21].…”

Section: KF Noise Tracking and the Joint Speech-Noise KF State

Citation type: mentioning

confidence: 99%
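The quoted step multiplies two Gaussians over the noise log-power: one from the KF prediction and one from an external noise estimate. A minimal scalar sketch of that product follows; it uses the standard result that precisions add and the mean is the precision-weighted average. The function name `multiply_gaussians` and all numeric values are hypothetical and not taken from the cited paper.

```python
def multiply_gaussians(mean_a, var_a, mean_b, var_b):
    """Return mean and variance of the normalized product of two 1-D Gaussians."""
    precision = 1.0 / var_a + 1.0 / var_b      # precisions add
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)  # precision-weighted mean
    return mean, var


# Hypothetical values in dB for one time-frequency cell.
kf_mean, kf_var = -20.0, 9.0    # KF-predicted noise log-power (assumed)
ext_mean, ext_var = -26.0, 3.0  # external noise estimate (assumed)

fused_mean, fused_var = multiply_gaussians(kf_mean, kf_var, ext_mean, ext_var)
print(fused_mean, fused_var)
```

The fused variance is smaller than either input variance, and the fused mean sits closer to the more certain of the two estimates.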

Modulation-domain speech enhancement using a Kalman filter with a Bayesian update of speech and noise in the log-spectral domain

2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA)

2017

Self Cite

We present a Bayesian estimator that performs log-spectrum estimation of both speech and noise, and is used as a Bayesian Kalman filter update step for single-channel speech enhancement in the modulation domain. We use Kalman filtering in the log-power spectral domain rather than in the amplitude or power spectral domains. In the Bayesian Kalman filter update step, we define the posterior distribution of the clean speech and noise log-power spectra as a two-dimensional multivariate Gaussian distribution. We utilize a Kalman filter observation constraint surface in the three-dimensional space, where the third dimension is the phase factor. We evaluate the results of the phase-sensitive log-spectrum Kalman filter by comparing them with the results obtained by traditional noise suppression techniques and by an alternative Kalman filtering technique that assumes additivity of speech and noise in the power spectral domain.
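The observation constraint with a phase factor can be checked numerically. The sketch below assumes the usual definition of the phase factor as the cosine of the phase difference between the speech and noise STFT coefficients, so that |Y|^2 = |S|^2 + |N|^2 + 2 alpha |S||N|. Natural-log powers and the function name `noisy_log_power` are assumptions made for illustration; the paper may use a different scaling such as dB.

```python
import numpy as np


def noisy_log_power(s, n, alpha):
    """Noisy log-power given speech log-power s, noise log-power n and
    phase factor alpha in [-1, 1] (natural-log powers assumed)."""
    return np.log(np.exp(s) + np.exp(n) + 2.0 * alpha * np.exp(0.5 * (s + n)))


# Hypothetical complex STFT coefficients for one time-frequency cell.
rng = np.random.default_rng(0)
S = rng.normal() + 1j * rng.normal()   # speech
N = rng.normal() + 1j * rng.normal()   # noise
Y = S + N                              # additivity in the complex STFT domain

alpha = np.cos(np.angle(S) - np.angle(N))  # phase factor
s = np.log(np.abs(S) ** 2)
n = np.log(np.abs(N) ** 2)

y_direct = np.log(np.abs(Y) ** 2)
y_model = noisy_log_power(s, n, alpha)
print(y_direct, y_model)
```

For a fixed observation, the set of (s, n, alpha) triples that satisfy this relation forms a surface in three dimensions. Setting alpha to 0 recovers the simpler assumption that speech and noise add in the power spectral domain.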

“…After the noise KF prediction, we decorrelate the noise KF state and, then, we multiply the noise log-power Gaussian with the Gaussian that is obtained from external noise estimation and log-normal noise power modeling [25] [26]. As in (1) that describes the speech KF prediction, for the noise, (n), KF prediction:…”

Section: Noise Tracking and the Speech-Noise KF

Citation type: mentioning

confidence: 99%

Speech enhancement using modulation-domain Kalman filtering with active speech level normalized log-spectrum global priors

2017 25th European Signal Processing Conference (EUSIPCO)

2017

Self Cite

Abstract: We describe a single-channel speech enhancement algorithm that is based on modulation-domain Kalman filtering that tracks the inter-frame time evolution of the speech log-power spectrum in combination with the long-term average speech log-spectrum. We use offline-trained log-power spectrum global priors incorporated in the Kalman filter prediction and update steps for enhancing noise suppression. In particular, we train and utilize Gaussian mixture model priors for speech in the log-spectral domain that are normalized with respect to the active speech level. The Kalman filter update step uses the log-power spectrum global priors together with the local priors obtained from the Kalman filter prediction step. The log-spectrum Kalman filtering algorithm, which uses the theoretical phase factor distribution and improves the modeling of the modulation features, is evaluated in terms of speech quality. Different algorithm configurations, dependent on whether global priors and/or Kalman filter noise tracking are used, are compared in various noise types.
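The abstract does not state how the global and local priors are combined. One plausible reading, sketched here purely as an assumption, is to multiply the Gaussian local prior from the KF prediction with the Gaussian mixture global prior. The product is again a Gaussian mixture with re-weighted components. The function names, the additive `active_level` offset used to undo the level normalization in the log domain, and all numbers are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fuse_local_with_gmm(local_mean, local_var, weights, means, variances,
                        active_level=0.0):
    """Multiply a Gaussian local prior with a GMM global prior (1-D).

    The GMM means are assumed to be relative to the active speech level,
    so the level (log domain) is added back before fusing.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float) + active_level
    variances = np.asarray(variances, dtype=float)

    # Each component is re-weighted by how well it agrees with the local prior.
    new_weights = weights * gaussian_pdf(local_mean, means, local_var + variances)
    new_weights = new_weights / new_weights.sum()

    # Per-component Gaussian product: precisions add.
    new_vars = 1.0 / (1.0 / local_var + 1.0 / variances)
    new_means = new_vars * (local_mean / local_var + means / variances)
    return new_weights, new_means, new_vars


# Hypothetical two-component global prior and a local prior near component 2.
w, m, v = fuse_local_with_gmm(
    local_mean=10.0, local_var=4.0,
    weights=[0.5, 0.5], means=[-10.0, 10.0], variances=[4.0, 4.0],
)
posterior_mean = float(np.sum(w * m))  # moment-matched mean of the fused mixture
print(w, m, v, posterior_mean)
```

In this example the local prior agrees with the second mixture component, so almost all of the fused weight moves to that component.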