A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

Pan, Zexu; Meng, Guang; Li, Haizhou

doi:10.48550/arxiv.2203.16843

Cited by 1 publication

(1 citation statement)

References 32 publications

(51 reference statements)

“…We use the standard form of SI-SDR where the target signal is scaled to match the scale of the estimated signal. Also, we scale the magnitude loss by the L1 norm of the magnitude of the target signal in the STFT domain similar to [24]. These loss functions are defined below…”

Section: G Loss Functions (mentioning)

confidence: 99%
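
The two losses named in the statement above can be illustrated with a minimal NumPy sketch. It is not the citing authors' code. The function names, the Hann window, the frame length and hop size, and the choice of an L1 distance between magnitudes are all assumptions made for illustration, and "scale by the L1 norm" is read here as dividing by it. Mean removal, which some SI-SDR implementations apply first, is omitted.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB; the target is rescaled to match the estimate."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Optimal scaling factor: projection of the estimate onto the target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    scaled_target = alpha * target
    residual = estimate - scaled_target
    ratio = (np.sum(scaled_target ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude of a simple Hann-windowed STFT (frame sizes are assumed)."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def normalized_magnitude_loss(estimate, target, eps=1e-8):
    """L1 magnitude error divided by the L1 norm of the target magnitude."""
    est_mag = stft_magnitude(estimate)
    tgt_mag = stft_magnitude(target)
    return np.sum(np.abs(est_mag - tgt_mag)) / (np.sum(tgt_mag) + eps)
```

With this normalization, the magnitude term does not change when the estimate and target are scaled by the same factor, and SI-SDR does not change when the estimate alone is rescaled.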

Time-domain Transformer-based Audiovisual Speaker Separation

Kalkhorani, Kumar, Tan, et al. 2023

Interspeech 2023


We introduce CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, a narrow-band module, and an output layer. CrossNet captures global, cross-band, and narrow-band correlations in the time-frequency domain. To address performance degradation in long utterances, we introduce a random chunk positional encoding. Experimental results on multiple datasets demonstrate the effectiveness and robustness of CrossNet, achieving state-of-the-art performance in tasks including reverberant and noisy-reverberant speaker separation. Furthermore, CrossNet exhibits faster and more stable training in comparison to recent baselines. Additionally, CrossNet's high performance extends to multi-microphone conditions, demonstrating its versatility in various acoustic scenarios.
