2019
DOI: 10.1109/jstsp.2019.2909193

Unsupervised Low Latency Speech Enhancement With RT-GCC-NMF

Abstract: In this paper, we present RT-GCC-NMF: a real-time (RT), two-channel blind speech enhancement algorithm that combines the non-negative matrix factorization (NMF) dictionary learning algorithm with the generalized cross-correlation (GCC) spatial localization method. Using a pre-learned universal NMF dictionary, RT-GCC-NMF operates frame by frame, associating individual dictionary atoms with either the target speech or the background interference based on their estimated time delays of arrival (TDOA). We evaluate RT…
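The frame-wise atom assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The least-squares activation cost, the atom-weighted GCC-PHAT score used to estimate each atom's TDOA, the TDOA tolerance, and the soft mask are all assumptions, and the function names (`frame_activations`, `atom_tdoas`, `rt_gcc_nmf_frame`) are hypothetical. The sketch processes one frame given the two channels' complex spectra `X1` and `X2`, the bin frequencies `freqs` in Hz, and a fixed non-negative dictionary `W`, whose rows are frequency bins and whose columns are atoms.

```python
import cmath
import math


def frame_activations(W, v, n_iter=50, eps=1e-12):
    """Activations h >= 0 for one frame, with the pre-learned dictionary W held fixed.

    Assumption: Lee-Seung multiplicative updates for the squared error ||v - W h||^2.
    """
    F, K = len(W), len(W[0])
    h = [1.0] * K
    for _ in range(n_iter):
        wh = [sum(W[f][k] * h[k] for k in range(K)) for f in range(F)]
        h = [h[k] * sum(W[f][k] * v[f] for f in range(F))
             / (sum(W[f][k] * wh[f] for f in range(F)) + eps) for k in range(K)]
    return h


def atom_tdoas(W, X1, X2, freqs, tau_grid, eps=1e-12):
    """Assumed per-atom TDOA: argmax over tau_grid of a GCC-PHAT weighted by the atom."""
    cross = [a.conjugate() * b for a, b in zip(X1, X2)]
    phat = [c / (abs(c) + eps) for c in cross]  # PHAT: keep phase only
    tdoas = []
    for k in range(len(W[0])):
        scores = [sum(W[f][k] * (phat[f] * cmath.exp(2j * math.pi * freqs[f] * tau)).real
                      for f in range(len(freqs))) for tau in tau_grid]
        tdoas.append(tau_grid[scores.index(max(scores))])
    return tdoas


def rt_gcc_nmf_frame(W, X1, X2, freqs, tau_target, tol, tau_grid, eps=1e-12):
    """Enhance one frame: keep the atoms whose TDOA lies within tol of tau_target."""
    v = [(abs(a) + abs(b)) / 2 for a, b in zip(X1, X2)]  # assumed mono magnitude
    h = frame_activations(W, v)
    tdoas = atom_tdoas(W, X1, X2, freqs, tau_grid)
    target = [k for k, t in enumerate(tdoas) if abs(t - tau_target) <= tol]
    mask = []
    for f in range(len(freqs)):
        total = sum(W[f][k] * h[k] for k in range(len(h)))
        mask.append(sum(W[f][k] * h[k] for k in target) / (total + eps))
    return [m * x for m, x in zip(mask, X1)], mask, target


# Toy frame: atom 0 covers the two low bins, atom 1 the two high bins.
freqs = [500.0, 1000.0, 1500.0, 2000.0]
W = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
delays = [0.0, 0.0, 0.5e-3, 0.5e-3]  # target at 0 s, interferer at 0.5 ms
X1 = [1 + 0j] * 4
X2 = [cmath.exp(-2j * math.pi * f * d) for f, d in zip(freqs, delays)]
tau_grid = [i * 0.125e-3 for i in range(-8, 9)]
Y, mask, target = rt_gcc_nmf_frame(W, X1, X2, freqs, 0.0, 0.1e-3, tau_grid)
print(target)  # [0]
```

In the toy frame, atom 0's weighted GCC-PHAT peaks at 0 s and atom 1's at 0.5 ms. Only atom 0 is therefore kept, and the mask passes the low bins while zeroing the high bins. Because the dictionary is fixed, each frame needs only an activation solve and a TDOA search. That per-frame structure is what makes the frame-by-frame operation in the abstract possible.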

Cited by 15 publications (14 citation statements)
References: 45 publications
“…In GCC, estimating the GCC function $R_{x_1x_2}$ is the key step in estimating the time delay. The GCC function $R_{x_1x_2}$ is expressed mathematically in (2) [21, 34]:

$$R_{x_1x_2}(\tau) = \int_{-\infty}^{\infty} \psi(f)\, G_{x_1x_2}(f)\, e^{i 2\pi f \tau}\, df$$

where $\psi(f)$ is a pre-filter and $G_{x_1x_2}(f)$ is the cross-spectral density, i.e., the spectrum of the cross-correlation. The cross-correlation of the two time series can be computed in either the time domain or the frequency domain.…”
Section: Time Delay Estimation (TDE); citation type: mentioning
confidence: 99%
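For discrete signals, the GCC function can be evaluated with DFTs. The sketch below uses the phase transform (PHAT) pre-filter, $\psi(f) = 1/|G(f)|$. This is an assumption, because the quoted passage leaves $\psi(f)$ unspecified. The sketch also uses a circular, DFT-based correlation and a naive DFT so that it needs only the Python standard library. `dft` and `gcc_phat_delay` are hypothetical helper names.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT (the inverse includes the 1/N factor); fine for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_delay(x1, x2, eps=1e-12):
    """Delay of x2 relative to x1, in samples, via GCC with the PHAT pre-filter.

    Discrete, circular analogue of R(tau) = integral psi(f) G(f) exp(i 2 pi f tau) df
    with psi(f) = 1 / |G(f)|.
    """
    X1, X2 = dft(x1), dft(x2)
    # Cross-spectrum convention used here: conj(X1) * X2, so a positive lag means x2 lags x1.
    G = [a.conjugate() * b for a, b in zip(X1, X2)]
    r = [v.real for v in dft([g / (abs(g) + eps) for g in G], inverse=True)]
    n = len(r)
    peak = r.index(max(r))
    # Circular lags above n/2 correspond to negative delays.
    return peak - n if peak > n // 2 else peak


random.seed(0)
x1 = [random.gauss(0.0, 1.0) for _ in range(32)]
x2 = x1[-3:] + x1[:-3]  # x2[n] = x1[n - 3], circularly
print(gcc_phat_delay(x1, x2))  # 3
```

Texts differ on whether the cross-spectrum is taken as $X_1 X_2^*$ or $X_1^* X_2$. The two choices only flip the sign of the estimated lag. PHAT whitens the cross-spectrum, so the correlation peak depends on phase alone and comes out sharp.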
“…In this study, we show that our STFT-based system can also achieve comparable or better enhancement performance at an algorithmic latency as low as 4 or 2 ms. This is partially achieved by combining STFT-domain, deep-learning-based speech enhancement with a conventional dual-window approach [36], [37], which uses a regular-length window for the STFT and a shorter window for overlap-add. This approach is illustrated in Fig.…”
Section: Introduction; citation type: mentioning
confidence: 99%
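The dual-window idea can be illustrated with a minimal analysis/synthesis loop. This is a sketch under simplifying assumptions, not the cited method. The analysis window is rectangular over a long frame. The synthesis window is a periodic Hann window that covers only the last `synth_len` samples of each frame, and the hop is `synth_len // 2`. The `process` hook stands in for STFT-domain enhancement, and `dual_window_ola` and `periodic_hann` are hypothetical names.

```python
import math


def periodic_hann(n):
    """Periodic Hann window; it sums to 1 when overlap-added at hop n // 2 (even n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dual_window_ola(x, analysis_len, synth_len, process=lambda frame: frame):
    """Frame-wise processing with a long analysis frame and a short synthesis window.

    Assumptions of this sketch: rectangular analysis window of analysis_len samples,
    a periodic Hann synthesis window covering only the last synth_len samples of each
    frame, and hop = synth_len // 2. `process` stands in for STFT-domain enhancement.
    """
    hop = synth_len // 2
    synth = [0.0] * (analysis_len - synth_len) + periodic_hann(synth_len)
    y = [0.0] * len(x)
    for end in range(analysis_len, len(x) + 1, hop):  # frame covers x[end - analysis_len:end]
        frame = process(x[end - analysis_len:end])
        for i, w in enumerate(synth):
            y[end - analysis_len + i] += w * frame[i]
    return y


# 16 kHz: 32 ms analysis frame (512 samples), 4 ms synthesis window (64 samples).
fs = 16000
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(2048)]
y = dual_window_ola(x, analysis_len=512, synth_len=64)
err = max(abs(a - b) for a, b in zip(x[480:2016], y[480:2016]))
print(err < 1e-12)  # True
```

In this sketch, the frame ending at sample `end` writes only to `y[end - synth_len:end]`. Each output sample is complete once the second frame covering it has been processed, which happens at most `synth_len` samples after the sample arrives. The buffering delay is therefore set by the 4 ms synthesis window rather than the 32 ms analysis frame, ignoring computation time. Practical designs typically use a tapered analysis window rather than the rectangular one assumed here.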
“…Our study makes three major contributions. First, we adapt a conventional dual-window-size approach [36], [37] to reduce the algorithmic latency of STFT-domain, deep-learning-based speech enhancement. Second, we use the outputs of the first DNN for frequency-domain, frame-online beamforming, and the beamforming result is fed to a second DNN for better enhancement.…”
Section: Introduction; citation type: mentioning
confidence: 99%
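The quoted study states only that the first DNN's outputs drive frequency-domain, frame-online beamforming, and this excerpt does not name the beamformer. One common choice is an MVDR beamformer with recursively updated covariance matrices. The sketch below illustrates it for two microphones and a single frequency bin. The forgetting factor `alpha`, the reference microphone, the rank-1 steering-vector estimate, and the function names are assumptions. The "DNN estimates" in the usage lines are hypothetical numbers.

```python
import cmath


def outer_update(phi, v, alpha):
    """Recursive (frame-online) covariance update: alpha * phi + (1 - alpha) * v v^H."""
    return [[alpha * phi[i][j] + (1 - alpha) * v[i] * v[j].conjugate() for j in range(2)]
            for i in range(2)]


def mvdr_weights(phi_s, phi_n, ref=0):
    """Two-microphone MVDR weights for one frequency bin.

    Steering vector: relative transfer function taken from the speech covariance
    column phi_s[:, ref] (exact when phi_s is rank 1).
    """
    d = [phi_s[m][ref] / phi_s[ref][ref] for m in range(2)]
    (a, b), (c, e) = phi_n
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]
    u = [inv[i][0] * d[0] + inv[i][1] * d[1] for i in range(2)]  # phi_n^{-1} d
    denom = d[0].conjugate() * u[0] + d[1].conjugate() * u[1]  # d^H phi_n^{-1} d
    return [ui / denom for ui in u], d


def beamform(w, x):
    """Beamformer output y = w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


h = [1.0 + 0j, cmath.exp(-0.7j)]  # assumed true relative transfer function
s = 0.8 - 0.3j  # hypothetical DNN speech estimate at the reference mic (one bin, one frame)
noise = [0.3 + 0.1j, -0.2 + 0.4j]  # hypothetical DNN noise estimate
phi_s = outer_update([[0j, 0j], [0j, 0j]], [s * hm for hm in h], alpha=0.9)
phi_n = outer_update([[1 + 0j, 0j], [0j, 1 + 0j]], noise, alpha=0.9)
w, d = mvdr_weights(phi_s, phi_n)
y = beamform(w, [s * hm for hm in h])
print(abs(y - s) < 1e-9)  # True: speech along d passes undistorted
```

Because the covariances are updated one frame at a time, the weights can be recomputed at every frame without looking ahead. That property is what "frame-online" requires.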
“…In this method, a non-negative matrix factorization (NMF) dictionary is combined with the generalized cross-correlation (GCC) spatial localization approach. RT-GCC-NMF operates frame by frame, associating individual dictionary atoms with either the desired speech signal or the interfering noise based on their time delays of arrival [30].…”
Section: Introduction; citation type: mentioning
confidence: 99%