ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747855
Adapting Speech Separation to Real-World Meetings using Mixture Invariant Training

Cited by 11 publications (4 citation statements). References 14 publications.
“…(ii) show that exploiting out-of-domain clean speech as well as in-domain real noisy data during the training of the enhancement network yields significant recognition gains for real test samples; (iii) use the MixIT framework for both types of data (instead of switching to supervised training when using out-of-domain clean speech as described in [9]) by modifying the remixing matrix A such that the reference speech can be reconstructed by the first output channel alone and the non-speech signal is reconstructed by a sum of channels 2 and 3; (iv) exploit speaker reinforcement post-processing to mask processing artifacts and further improve ASR accuracy.…”
Section: Main Contributions
Mentioning confidence: 99%
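Contribution (iii) above constrains the MixIT remixing matrix A so that the first output channel alone reconstructs the reference (clean) speech while channels 2 and 3 together reconstruct the non-speech signal. The sketch below illustrates that idea in PyTorch under stated assumptions; the function and variable names (mixit_loss, neg_snr, est_sources) are illustrative and not taken from the cited work.

```python
import itertools
import torch


def neg_snr(est, ref, eps=1e-8):
    """Negative SNR between an estimate and a reference, per (batch, channel)."""
    noise = est - ref
    snr = 10 * torch.log10(ref.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)
    return -snr


def mixit_loss(est_sources, mix1, mix2, constrained=False):
    """MixIT loss over two input mixtures.

    est_sources: (batch, num_src, time) separator outputs.
    mix1, mix2:  (batch, time) input mixtures; with constrained=True, mix1 is the
                 out-of-domain reference speech and mix2 the non-speech signal.
    """
    batch, num_src, _ = est_sources.shape
    mixes = torch.stack([mix1, mix2], dim=1)  # (batch, 2, time)

    if constrained:
        # Fixed remixing matrix A: reference speech <- channel 1 alone,
        # non-speech <- channels 2 + 3 (assumes num_src == 3).
        candidates = [torch.tensor([[1.0, 0.0, 0.0],
                                    [0.0, 1.0, 1.0]])]
    else:
        # Standard MixIT: search every binary assignment of sources to mixtures.
        candidates = []
        for bits in itertools.product([0, 1], repeat=num_src):
            a = torch.zeros(2, num_src)
            a[list(bits), list(range(num_src))] = 1.0
            candidates.append(a)

    # Remix the estimated sources with each candidate A and keep the best one.
    per_candidate = []
    for a in candidates:
        remixed = torch.einsum("cm,bmt->bct", a, est_sources)
        per_candidate.append(neg_snr(remixed, mixes).mean(dim=1))  # (batch,)
    return torch.stack(per_candidate).min(dim=0).values.mean()
```

With constrained=False this reduces to the usual MixIT search over all binary assignments; the constrained branch applies the single fixed A described in the quotation.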
“…In real situations, where paired noisy and clean signals are not available, we may instead look to use unpaired noisy and clean speech data. Several training strategies have been developed for using such data based on adversarial learning [7,8] and transfer learning [9][10][11]. For the adversarial training, discriminator networks are used to distinguish the enhanced and noised features from the clean and noisy ones, respectively [7].…”
Section: Introduction
Mentioning confidence: 99%
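The adversarial strategy referenced as [7] trains discriminator networks to separate real clean (or noisy) features from generated enhanced (or noised) ones, while the enhancement network learns to fool them. The sketch below shows only the clean-versus-enhanced half of that setup and is an assumption-laden illustration: the feature dimension, network sizes, and function names are hypothetical, not taken from the cited papers.

```python
import torch
import torch.nn as nn

FEATURE_DIM = 257  # e.g. magnitude-spectrum bins; an assumed value

# Discriminator that scores a feature frame as "clean-like" (logit output).
disc = nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.LeakyReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()


def discriminator_loss(clean_feats, enhanced_feats):
    """Train the discriminator: clean features -> 1, enhanced features -> 0."""
    real = bce(disc(clean_feats), torch.ones(clean_feats.size(0), 1))
    fake = bce(disc(enhanced_feats.detach()), torch.zeros(enhanced_feats.size(0), 1))
    return real + fake


def enhancer_adversarial_loss(enhanced_feats):
    """Train the enhancer to make its outputs look clean to the discriminator."""
    return bce(disc(enhanced_feats), torch.ones(enhanced_feats.size(0), 1))
```

A symmetric pair of losses for the noised-versus-noisy features would complete the two-discriminator setup the quotation describes.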