Dual Microphone Speech Enhancement Based on Statistical Modeling of Interchannel Phase Difference

Hwang, Soojoong; Kim, Minseung; Shin, Jong Won

doi:10.1109/taslp.2022.3202121

Cited by 4 publications

(4 citation statements)

References 48 publications

“…The proposed dual channel noise PSD estimator based on coherence in (20) shows different characteristics from the single-channel SPP-based noise PSD estimator in (7). Figure 3 shows one example of the noise power spectrum in the beamformer output and the estimates of it for Cafeteria noise at 5 dB SNR.…”

Section: Combining Noise PSD Estimates and Gain Calculation (mentioning)

confidence: 99%

“…Over the past decades, there has been a growing demand for speech enhancement using microphone arrays in speech processing applications such as automatic speech recognition, mobile communications, and hearing aids [1, 2, 3, 4]. Multichannel speech enhancement aims to reduce the additive noise and improve the quality of the speech signals obtained by multiple microphones placed in a variety of acoustic environments [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. In many multichannel speech enhancement systems, beamforming algorithms, such as the minimum-variance distortionless-response (MVDR) beamformer [11] and the general transfer function generalized sidelobe canceler (TF-GSC) [12, 13], have been employed to extract a desired signal, exploiting spatial information on the location of the sound sources.…”

Section: Introduction (mentioning)

confidence: 99%

Postfilter for Dual Channel Speech Enhancement Using Coherence and Statistical Model-Based Noise Estimation

Cheong, Kim, Shin (2024), Sensors. Self-citation.

A multichannel speech enhancement system usually consists of spatial filters, such as adaptive beamformers, followed by postfilters that suppress the remaining noise. Accurate estimation of the power spectral density (PSD) of the residual noise is crucial for successful noise reduction in the postfilters. In this paper, we propose a postfilter that uses new a posteriori speech presence probability (SPP) and noise PSD estimators, both based on coherence and on statistical models. We model the coherence-based a posteriori SPP as a simple function of the magnitude of coherence between two microphone signals and combine it with a single-channel SPP based on statistical models. The coherence-based estimator for the PSD of the noise remaining in the beamformer output in the presence of speech is derived using the pseudo-coherence, taking the effect of the beamformers into account, and is used to construct the coherence-based noise PSD estimator. The final noise PSD estimator is then obtained by combining the coherence-based and statistical model-based noise PSD estimators with the proposed SPP. The spectral gain function is also modified to incorporate the proposed SPP. Experimental results demonstrate that the proposed method yielded more accurate noise PSD estimates and improved perceptual evaluation of speech quality (PESQ) scores in various diffuse noise environments, and did not degrade speech quality in the presence of directional interference, even though it relies on coherence information.
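The abstract above builds its SPP on the magnitude of the coherence between the two microphone signals. As a rough illustration only, the sketch below computes that magnitude per frame and frequency bin from two sequences of STFT frames. It is not the estimator from the cited paper: the function name `coherence_magnitude`, the first-order recursive smoothing of the auto- and cross-PSDs, and the values of `alpha` and `eps` are all assumptions made for this example.

```python
import math

def coherence_magnitude(frames1, frames2, alpha=0.9, eps=1e-12):
    """Per-frame, per-bin magnitude of the coherence between two channels.

    frames1, frames2: sequences of STFT frames, each a list of complex bins.
    alpha: smoothing factor for the recursive PSD estimates (assumed value).
    eps: small constant that guards against division by zero (assumed value).
    """
    n_bins = len(frames1[0])
    p11 = [0.0] * n_bins  # smoothed auto-PSD of channel 1
    p22 = [0.0] * n_bins  # smoothed auto-PSD of channel 2
    p12 = [0j] * n_bins   # smoothed cross-PSD between the channels
    result = []
    for f1, f2 in zip(frames1, frames2):
        row = []
        for k in range(n_bins):
            p11[k] = alpha * p11[k] + (1 - alpha) * abs(f1[k]) ** 2
            p22[k] = alpha * p22[k] + (1 - alpha) * abs(f2[k]) ** 2
            p12[k] = alpha * p12[k] + (1 - alpha) * f1[k] * f2[k].conjugate()
            # |cross-PSD| normalized by the geometric mean of the auto-PSDs
            row.append(abs(p12[k]) / math.sqrt(p11[k] * p22[k] + eps))
        result.append(row)
    return result
```

Under these assumptions, two identical channels give values close to 1 in every bin, while statistically independent channels give clearly lower values once the smoothed PSDs have settled. The very first frame always yields a value near 1, because a single-frame estimate carries no averaging.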

“…Additionally, we carried out an ablation study to analyze how much each module in the proposed system contributed to the performance improvement. We propose the speech PSD estimator, φ tcs,s s in (31), and the RTF estimator, g tdoa,s in (29). The previous approaches were the speech PSD estimator using recursive smoothing, φ ts s in (23), and the ML estimator of the RTF g ml in (25).…”

Section: Ablation Study (mentioning)

confidence: 99%

“…Speech enhancement is essential to ensure the satisfactory perceptual quality and intelligibility of speech signals in many speech applications, such as hearing aids and speech communication with mobile phones and hands-free systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Currently, devices with multiple microphones are popular, which has enabled multi-microphone speech enhancement, exploiting spatial information as well as spectro-temporal characteristics of the input signals [6, 7, 8, 9, 10, 11, 12, 13, 14,…”

Section: Introduction (mentioning)

confidence: 99%

Improved Speech Spatial Covariance Matrix Estimation for Online Multi-Microphone Speech Enhancement

Kim, Song, Shin (2022), Sensors. Self-citation.

Online multi-microphone speech enhancement aims to extract target speech from multiple noisy inputs with low latency by exploiting spatial information as well as spectro-temporal characteristics. Acoustic parameters such as the acoustic transfer function and the speech and noise spatial covariance matrices (SCMs) should be estimated in a causal manner to enable the online estimation of the clean speech spectra. In this paper, we propose an improved estimator for the speech SCM, which can be parameterized with the speech power spectral density (PSD) and relative transfer function (RTF). Specifically, we adopt the temporal cepstrum smoothing (TCS) scheme to estimate the speech PSD, which is conventionally estimated with temporal smoothing. Furthermore, we propose a novel RTF estimator based on a time difference of arrival (TDoA) estimate obtained by the cross-correlation method. In addition, we propose refining the initial estimate of the speech SCM by utilizing the estimates for the clean speech spectrum and clean speech power spectrum. The proposed approach showed superior performance in terms of the perceptual evaluation of speech quality (PESQ) scores, extended short-time objective intelligibility (eSTOI), and scale-invariant signal-to-distortion ratio (SISDR) in our experiments on the CHiME-4 database.
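The abstract above mentions an RTF estimator built on a TDoA estimate from the cross-correlation method. The sketch below shows only the generic idea, not the cited paper's estimator: it picks the integer lag at the cross-correlation peak and then forms a pure-delay RTF from that lag. The function names `tdoa_cross_correlation` and `delay_rtf`, the integer-sample resolution, and the pure-delay (anechoic, equal-gain) model are assumptions made for this illustration.

```python
import cmath

def tdoa_cross_correlation(ref, other, max_lag):
    """Lag in samples by which `other` trails `ref`.

    The lag is taken at the peak of the time-domain cross-correlation,
    searched over integer lags in [-max_lag, max_lag].
    """
    def xcorr(lag):
        # Sum of other[n + lag] * ref[n] over the indices valid for both signals
        return sum(other[n + lag] * ref[n]
                   for n in range(len(ref))
                   if 0 <= n + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def delay_rtf(tdoa_samples, n_fft):
    """Pure-delay RTF on bins 0..n_fft/2: exp(-j * 2*pi * k * tau / n_fft).

    Assumes the second channel is a delayed copy of the reference with
    unit gain, which ignores reverberation and level differences.
    """
    return [cmath.exp(-2j * cmath.pi * k * tdoa_samples / n_fft)
            for k in range(n_fft // 2 + 1)]
```

For example, if `other` is `ref` delayed by three samples, `tdoa_cross_correlation(ref, other, 8)` returns 3, and `delay_rtf(3, n_fft)` gives unit-magnitude coefficients whose phase falls linearly with the bin index. A practical system would typically need sub-sample resolution and some weighting of the cross-spectrum, both of which this sketch leaves out.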

Coherent Signal DOA Estimation With Coprime Array: Exploiting Signal Subspace Reconstructing Strategy

Ma, Li, Pan et al. (2024), IEEE/ACM Trans. Audio Speech Lang. Process.
