2021
DOI: 10.48550/arxiv.2110.10739
Preprint

Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

Abstract: The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlapping reverberant and noisy speech data from the AMI Corpus. The models are tested on real AMI recordings containing overlapping speech, and are evaluated subjectively by human listeners. To objectively eval…

Cited by 1 publication (1 citation statement). References 17 publications.
“…To this end, some latest works start to separate speech from unsupervised or semi-supervised perspectives. In [28]- [30], a mixture invariant training (MixIT) that requires only single-channel real acoustic mixtures was proposed. MixIT uses mixtures of mixtures (MoMs) as input, and sums over estimated sources to match the target mixtures instead of the single-source references.…”
Section: Introduction (mentioning)
confidence: 99%
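
The remixing idea in the citation statement above can be illustrated with a small sketch. This is not the cited authors' implementation: it assumes a plain mean-squared-error objective (the published work may use a different signal-level loss), an exhaustive search over assignments, and NumPy arrays in place of a trained separation model. The names `mixit_loss`, `mixtures`, and `est_sources` are hypothetical. Two reference mixtures are summed into a mixture of mixtures (MoM), a model would estimate several sources from the MoM, and the loss picks the assignment of estimated sources to the two mixtures that reconstructs them best, so no isolated single-source references are needed.

```python
import itertools

import numpy as np


def mixit_loss(mixtures, est_sources):
    """Illustrative MixIT-style loss (assumption: MSE objective).

    mixtures:    array of shape (n_mix, T), the reference mixtures.
    est_sources: array of shape (n_src, T), sources estimated from
                 the sum of the mixtures (the mixture of mixtures).
    Returns (best_loss, best_assignment), where best_assignment[j]
    is the index of the mixture that estimated source j is added to.
    """
    n_mix = mixtures.shape[0]
    n_src = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    # Each estimated source is assigned to exactly one mixture, so
    # there are n_mix ** n_src candidate binary mixing matrices.
    for assign in itertools.product(range(n_mix), repeat=n_src):
        mixing = np.zeros((n_mix, n_src))
        mixing[list(assign), np.arange(n_src)] = 1.0
        remixed = mixing @ est_sources  # sum the sources per mixture
        loss = np.mean((mixtures - remixed) ** 2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage: random signals stand in for speech, and the "model
# output" is taken to be the true sources, purely for illustration.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 1000))
mixtures = np.stack([sources[0] + sources[2], sources[1] + sources[3]])
mom = mixtures.sum(axis=0)  # what a separation model would receive
loss, assignment = mixit_loss(mixtures, sources)
```

The exhaustive search grows exponentially with the number of estimated sources, which is acceptable for the handful of outputs in this toy setup but is a design choice of this sketch rather than a claim about how the cited works compute the loss.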