2022
DOI: 10.1109/taslp.2022.3202129

Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training

Citations: cited by 15 publications (7 citation statements)
References: 46 publications
“…Other comparison methods, i.e. MISO1-BF-MISO3 [17], Convolutional Prediction [60], MC-CSM with LBT [61] and TFGridNet [30], all perform neural beamforming plus neural post-processing, and achieve much better ASR performance than the time-domain end-to-end networks. This demonstrates the advantage of combining beamforming and deep learning techniques.…”
Section: Results On SMS-WSJ (mentioning)
confidence: 99%
“…In dynamically changing scenes, models trained with permutation-invariant training in static scenarios could mix up signals from speakers, i.e., the output assigned to a speaker could switch. To avoid such switching, our approach could be combined with location-based training as in [36] or online clustering of frame-wise speaker embeddings as proposed in [37].…”
Section: Effect Of Model Size and Grouping (mentioning)
confidence: 99%
“…Recently LBT was proposed to resolve the permutation ambiguity problem in multi-channel talker-independent speaker separation [11]. LBT leverages distinct spatial locations of multiple speakers in physical space and produces superior separation performance compared to PIT.…”
Section: Location-Based Training For CSS (mentioning)
confidence: 99%
“…In our previous study, we introduced a new training criterion, named location-based training (LBT), to assign DNN outputs according to speaker locations in physical space [11]. We showed that LBT performs better than PIT for fully overlapped utterances in simulated and matched reverberant conditions.…”
Section: Introduction (mentioning)
confidence: 99%
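
The statements above contrast permutation-invariant training (PIT) with location-based training (LBT). The following is a minimal sketch of that contrast, not the paper's implementation: PIT searches over all output-to-speaker assignments, while a location-based criterion fixes the assignment from speaker positions. Ordering speakers by ascending azimuth is an assumption made here for illustration; the quoted statements say only that outputs are assigned according to speaker locations. The function names, the MSE loss, and the toy signals are all hypothetical.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs, targets):
    """PIT: try every output-to-speaker assignment and keep the cheapest one."""
    n = len(outputs)
    best = None
    for perm in permutations(range(n)):
        loss = sum(mse(outputs[i], targets[p]) for i, p in enumerate(perm)) / n
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


def lbt_loss(outputs, targets, azimuths_deg):
    """Location-based assignment (assumed criterion: ascending azimuth).

    Output i is always compared with the speaker that has the i-th smallest
    azimuth, so no search over permutations is needed.
    """
    n = len(outputs)
    order = sorted(range(n), key=lambda k: azimuths_deg[k])
    loss = sum(mse(outputs[i], targets[k]) for i, k in enumerate(order)) / n
    return loss, tuple(order)


# Toy example: two speakers, two-sample "signals".
targets = [[1.0, 1.0], [0.0, 0.0]]   # speaker 0, speaker 1
azimuths = [120.0, 30.0]             # speaker 1 has the smaller azimuth
outputs = [[0.0, 0.0], [1.0, 1.0]]   # output 0 matches speaker 1

pit_result = pit_loss(outputs, targets)
lbt_result = lbt_loss(outputs, targets, azimuths)
```

In this toy case both criteria pick the same assignment, but only the location-based one does so without a search, and it gives the same output order in every training example, which is why the citing papers present it as a way to avoid output switching.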