2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
DOI: 10.1109/waspaa.2019.8937192

A Classification-Aided Framework for Non-Intrusive Speech Quality Assessment

Cited by 14 publications
(9 citation statements)
References 28 publications
“…It was also noticed in [22] that by minimizing the MSE the regression task may lead to prediction outliers and cause the model to overfit. By doing a very coarse quantization on real value PESQ labels, a classification loss was added to the MSE based regression loss of PESQ for punishing samples with large estimation errors.…”
Section: Recap Of Related Methods (mentioning)
confidence: 99%
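
The combined loss described in the statement above can be sketched as follows. This is only an illustration under stated assumptions, not the cited paper's exact formulation: the bin edges, the weighting factor `weight`, and the idea that the model emits both a score and per-class logits are all hypothetical choices made for the example.

```python
import math

# Hypothetical bin edges for a very coarse quantization of PESQ labels
# (PESQ spans roughly -0.5 to 4.5); three edges give four classes.
BIN_EDGES = [1.5, 2.5, 3.5]


def quantize(score, edges=BIN_EDGES):
    """Map a real-valued score to a coarse class index (0 .. len(edges))."""
    return sum(score >= e for e in edges)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite_loss(pred_scores, class_logits, true_scores, weight=1.0):
    """MSE on the real-valued scores plus a weighted cross-entropy term
    computed on the coarsely quantized labels (weight is an assumption)."""
    n = len(true_scores)
    mse = sum((p - t) ** 2 for p, t in zip(pred_scores, true_scores)) / n
    ce = 0.0
    for logits, t in zip(class_logits, true_scores):
        probs = softmax(logits)
        ce -= math.log(probs[quantize(t)])
    ce /= n
    return mse + weight * ce
```

With this setup, a sample whose predicted class distribution puts little mass on the true coarse bin is penalized by the cross-entropy term even when its squared error is moderate, which is the "punishing samples with large estimation errors" effect the statement describes.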
“…Meanwhile, such metric models are not differentiable, which limits their capability to work with other systems in a joint optimization manner. Inspired by the great success of deep learning, the deep neural networks have been developed to address the non-intrusive speech evaluation problem recently [5, 15-23]. In [17], an end-to-end and non-intrusive speech quality evaluation model, termed Quality-Net, was proposed based on bidirectional long short-term memory (BLSTM).…”
Section: Introduction (mentioning)
confidence: 99%
“…The kernel sizes of the maxpooling layers are set to 2 × 2. Note that this architecture is based on [16], since it performed well for a similar but different speech assessment task. We did, however, make modifications as discussed next.…”
Section: Feature Extraction and T60 Estimation (mentioning)
confidence: 99%
“…Since other speech-related tasks have shown that the MSE is sub-optimal, alternative loss functions for reverberation time estimation should be explored. Likewise, work in ASR [15] and speech assessment [16] have shown that treating speech-tasks as classification, rather than regression problems is beneficial. In this paper, we propose a composite classification and regression based loss function for estimating reverberation time for a variety of seen and unseen reverberant conditions.…”
Section: Introduction (mentioning)
confidence: 99%
“…In most cases, the primary approach is to train a model to predict objective (e.g., PESQ) and/or subjective (e.g., MOS) scores. However, generalization to unseen perturbations and tasks remains a concern [21], and most methods have not found wide-spread uses for SQA. Given that matching human subjective ratings is the ultimate motivation, some recent works try to train neural networks directly on MOS scores [20,14,1].…”
Section: Introduction (mentioning)
confidence: 99%