2022
DOI: 10.1016/j.apacoust.2021.108539

Enhancing the correlation between the quality and intelligibility objective metrics with the subjective scores by shallow feed forward neural network for time–frequency masking speech separation algorithms

Cited by 5 publications (4 citation statements)
References: 31 publications
“…SRMR is a non-intrusive metric, so no reference signal is required for its estimation [80], whereas the rest of the metrics are intrusive metrics and thus require the clean speech sample as a reference for the performance evaluation [80]. Among these metrics, PESQ and STOI are known to correlate well with the human perception of quality and intelligibility [81]. SRMR metric is commonly used to evaluate speech dereverberation algorithms and reflect the quality and intelligibility of the reverberant speech [57].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…SRMR metric is commonly used to evaluate speech dereverberation algorithms and reflect the quality and intelligibility of the reverberant speech [57]. SDR shows the estimated speech quality by comparing the estimated signal energy with all kinds of distortions [81]. CD measures the similarity between short-time spectra of the estimated and clean speech [81].…”
Section: Methods
Citation type: mentioning
confidence: 99%
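
The statement above describes SDR as a ratio of signal energy to distortion energy. As a rough illustration only, the sketch below computes a simplified SDR in decibels, assuming that everything in the estimate that differs from the clean reference counts as distortion. This is an assumption made for the example: the citing paper may use a different definition (for instance one that decomposes the error into interference and artifact terms), and the function name `simple_sdr_db` is hypothetical.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB.

    Assumption: all of (reference - estimate) is treated as distortion.
    This is not necessarily the SDR definition used in the cited work.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Energy of the clean reference signal.
    signal_energy = sum(r * r for r in reference)
    # Energy of the residual between reference and estimate.
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))

    if distortion_energy == 0.0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference, nonzero error
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Toy example: the estimate is the reference scaled by 0.9.
clean = [1.0, 0.0, -1.0, 0.0]
separated = [0.9, 0.0, -0.9, 0.0]
sdr = simple_sdr_db(clean, separated)
```

Like the other intrusive metrics mentioned in the statement, this computation needs the clean reference signal; a higher value means less residual energy relative to the reference.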
“…Also, this model is restricted for anechoic conditions. This problem was resolved in [113] by using SONET with Expectation Maximization (EM) (a machine learning algorithm), which outperforms its constituent systems, both under anechoic and reverberant conditions, as indicated by the results of subjective listening tests in [114]. The most interesting fact about the SSS model in [113] is that it uses the anechoic pre-trained model 'SONET', without any need for retraining, to tackle the echoes.…”
Section: Viiiiiii Speech Source Separation (SSS)
Citation type: mentioning
confidence: 99%