2022
DOI: 10.1109/jstsp.2022.3181782
Efficient Personalized Speech Enhancement Through Self-Supervised Learning

Cited by 11 publications (4 citation statements)
References 47 publications
“…Third, when we compare the two model sizes, more significant performance improvement is observed when smaller models are in comparison (PLPCNet-S vs. LPCNet-BL-S) than the larger models (PLPCNet-L vs. LPCNet-BL-L). This trend aligns well with the personalized speech enhancement literature: model personalization benefits compressed model architectures more than the larger ones [15,16,17]. Finally, it is also worth noting that each test sequence is handled by a selected personalized decoder, where the choice is based on the estimated speaker class.…”
Section: Results (supporting)
Confidence: 76%
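
The decoder-selection mechanism mentioned in the excerpt above (each test sequence routed to a personalized decoder chosen from an estimated speaker class) can be pictured with a minimal sketch. Everything below is illustrative: the GRU encoder, module sizes, and hard argmax class selection are assumptions, not the cited paper's actual architecture.

```python
import torch
import torch.nn as nn

class ClassConditionedEnhancer(nn.Module):
    """Route each utterance to one of K personalized decoders based on an
    estimated speaker class. All sizes and modules are illustrative."""

    def __init__(self, num_classes: int = 4, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=feat_dim, batch_first=True)
        # Lightweight speaker-class head on the time-pooled encoder state.
        self.classifier = nn.Linear(feat_dim, num_classes)
        # One small "personalized" decoder per speaker class (placeholders).
        self.decoders = nn.ModuleList(nn.Linear(feat_dim, 1) for _ in range(num_classes))

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        # noisy: (batch, time, 1) noisy waveform frames
        features, _ = self.encoder(noisy)                 # (batch, time, feat_dim)
        class_logits = self.classifier(features.mean(1))  # (batch, num_classes)
        class_ids = class_logits.argmax(dim=-1)           # hard speaker-class estimate
        # Each utterance is handled by the decoder selected for its class.
        enhanced = torch.stack(
            [self.decoders[int(c)](f) for c, f in zip(class_ids, features)]
        )
        return enhanced                                    # (batch, time, 1)

# Toy usage on random input standing in for two noisy utterances.
model = ClassConditionedEnhancer()
out = model(torch.randn(2, 1600, 1))
```

The hard selection here only mirrors the "selected personalized decoder" phrasing in the excerpt; a soft mixture over decoders would be an equally plausible reading.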
“…Personalization has shown promising results in model compression tasks for speech enhancement [15,16,17,18]. A personalized model adapts to the target speaker group's speech traits, narrowing the training task down to a smaller subtask, i.e., one defined by a smaller speaker group than the entire set of speakers in the corpus.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Moreover, Tao et al [42] presented a method called Neighbor2Neighbor to train an effective image-denoising model with only noisy images. Aswin et al [43] proposed self-supervised learning methods as a solution to both zero- and few-shot personalization tasks. Sonining et al [37] investigated the performance of such a time-domain network (Conv-TasNet) for speech denoising in a real-time setting, comparing various parameter settings.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Self-supervised learning can be used to train on individual data to build patient-specific models. For data that is already split at subject level, we can apply self-supervised learning directly [459]. When the data is not split, we can apply clustering to find subgroups in the data to apply self-supervised learning on [460,461].…”
Section: Personalized Models (mentioning)
Confidence: 99%
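
The last excerpt describes clustering unlabeled data into subgroups before applying self-supervised learning per group. A minimal sketch of that grouping step is given below, assuming k-means over precomputed speaker (or subject) embeddings; the embedding source and the number of clusters are placeholders, not the setups of the cited works [460,461].

```python
import numpy as np
from sklearn.cluster import KMeans

def group_utterances(embeddings: np.ndarray, num_groups: int = 4) -> np.ndarray:
    """Assign each utterance (or subject) to a pseudo-group via k-means.

    embeddings: (num_items, embed_dim) speaker/subject embeddings from any
    pretrained encoder; the encoder choice is an assumption of this sketch.
    Each resulting group would then get its own self-supervised or
    personalized training run.
    """
    kmeans = KMeans(n_clusters=num_groups, n_init=10, random_state=0)
    return kmeans.fit_predict(embeddings)

# Toy usage with random vectors standing in for real embeddings.
group_ids = group_utterances(np.random.randn(100, 192))
```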