2022
DOI: 10.36227/techrxiv.20154785
Preprint

Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Abstract: Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require “reference” (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication…

Citations: cited by 1 publication (2 citation statements)
References: 0 publications
“…Other NR tools produce estimates of objective values including FR speech quality values [23], [30], [32], [38], [44], [51], [54], [56], [57], FR speech intelligibility values [30], [32], [38], [44], [52], [54], [56], [57], speech transmission index [22], codec bit-rate [46], and detection of specific impairments, artifacts, or noise types [34], [39], [41], [52]. Some of these tools perform a single task and others perform multiple tasks.…”
Section: A Existing Machine Learning Approaches
mentioning
confidence: 99%
“…We then trained WAWEnets to estimate these FR values using only the impaired segments. As in [30] and [32], we performed inverse phase augmentation (IPA) to augment all datasets in order to train WAWEnet to learn invariance to waveform phase inversion. This augmentation increased the amount of data available to just over 500 hours of total speech data.…”
Section: Hours
mentioning
confidence: 99%
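
The inverse phase augmentation (IPA) mentioned in the citation statement above can be illustrated with a minimal sketch. It assumes that phase inversion means multiplying every sample by -1 and that the training target of a segment is unchanged by inversion. The function names, the list-based waveform representation, and the example values are hypothetical and are not taken from the WAWEnet code.

```python
from typing import List, Sequence, Tuple

Waveform = List[float]
Example = Tuple[Waveform, float]  # (audio samples, target quality value)


def invert_phase(waveform: Sequence[float]) -> Waveform:
    """Return the phase-inverted waveform (assumed here: each sample times -1)."""
    return [-sample for sample in waveform]


def inverse_phase_augment(dataset: Sequence[Example]) -> List[Example]:
    """Pair every example with a phase-inverted copy that keeps the same target."""
    augmented: List[Example] = []
    for waveform, target in dataset:
        augmented.append((list(waveform), target))          # original segment
        augmented.append((invert_phase(waveform), target))  # inverted copy
    return augmented


# Hypothetical toy data: two short "segments" with made-up target values.
dataset = [([0.0, 0.25, -0.5], 3.1), ([0.1, -0.1, 0.2], 4.2)]
augmented = inverse_phase_augment(dataset)
```

In this sketch each segment yields two training examples with identical targets, which is one way a network could be pushed toward giving the same estimate for a waveform and its inverted copy.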