ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746962

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Citations: cited by 45 publications (55 citation statements)
References: 24 publications
“…The model was tested based on speech communication, and the ASR accuracy was not considered. Eskimez et al. [4] proposed two PSE models, an evaluation metric called target speaker oversuppression (TSOS), and test sets to cover various scenarios. TSOS measures the degree of removal of the target speaker's speech segments and is critical for PSE since removing the target speech hampers effective conversations and degrades the transcription quality, as reported in [8].…”
Section: Related Work (mentioning)
confidence: 99%
“…TSOS measures the degree of removal of the target speaker's speech segments and is critical for PSE since removing the target speech hampers effective conversations and degrades the transcription quality, as reported in [8]. Furthermore, Taherian et al. [5] extended [4] to multi-channel scenarios by proposing a model that works with any microphone numbers and array geometries. Although the models of [4] can run on PCs in real time, the computational cost was still too high for real usage as the audio processing can use only a tiny fraction of the available resources on devices.…”
Section: Related Work (mentioning)
confidence: 99%