A Study on the Relationship between the Intelligibility and Quality of Algorithmically-Modified Speech for Normal Hearing Listeners

Tang, Yan; Arnold, Christopher; Cox, Trevor J.

doi:10.3390/ohbm1010005

Cited by 16 publications

(23 citation statements)

References 27 publications

“…Although statistically significant, the improvement in the speech intelligibility metric (i.e., STOI) was not as prominent as in the two speech quality metrics (i.e., SDR and PESQ). This is probably because of ceiling effects; the SNR tested was high overall (starting from 1 dB SNR) and speech intelligibility was not a significant issue in these models of normal hearing (Tang et al, 2017). For the SepFormer model, the acoustic evaluation scores for the unprocessed noisy mixtures (dashed lines) remained the same since the test materials did not change, as shown in Figure 2d–2f.…”

Section: Results (mentioning)

confidence: 98%

Deep Learning Restores Speech Intelligibility in Multi-Talker Interference for Cochlear Implant Users

Borjigin

Kokkinakis

Bharadwaj

et al. 2022

Preprint

Despite excellent performance in quiet, cochlear implants (CIs) only partially restore normal levels of intelligibility in noisy settings. Recent developments in machine learning have resulted in deep neural network (DNN) models that achieve noteworthy performance in speech enhancement and separation tasks. However, there are no commercially available CI audio processors that utilize DNN models for noise reduction. We implemented two DNN models intended for applications in CIs: (1) a recurrent neural network (RNN), which is a lightweight template model, and (2) SepFormer, which is the current top-performing speech separation model in the literature. The models were trained with a custom training dataset (30 hours) that included four configurations: speech in non-speech noise and speech in 1-talker, 2-talker, and 4-talker speech babble backgrounds. The enhancement of the target speech (or the suppression of the noise) by the models was evaluated by commonly used acoustic evaluation metrics of quality and intelligibility, including (1) signal-to-distortion ratio, (2) “perceptual” evaluation of speech quality, and (3) short-time objective intelligibility. Both DNN models yielded significant improvements in all acoustic metrics tested. The two DNN models were also evaluated with thirteen CI users using two types of background noise: (1) CCITT noise (speech-shaped stationary noise) and (2) 2-talker babble. Significant improvements in speech intelligibility were observed when the noisy speech was processed by the models, compared to the unprocessed conditions. This work serves as a proof of concept for the application of DNN technology in CIs for improved listening experience and speech comprehension in noisy environments.
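For readers unfamiliar with the first metric named in this abstract, the following is a minimal sketch of a plain signal-to-distortion ratio. It assumes the basic energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²), which may differ from the SDR variant the authors actually computed; the function name `sdr_db` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB between a clean reference and an estimate.

    Assumed definition: 10 * log10(sum(s^2) / sum((s - s_hat)^2)).
    This is a simplified form, not necessarily the one used in the study.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if distortion_energy == 0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Hypothetical samples: the estimate is the reference scaled by 0.9,
# so the distortion has 1% of the signal energy (about 20 dB SDR).
clean = [1.0, 0.0, -1.0, 0.0]
processed = [0.9, 0.0, -0.9, 0.0]
example_sdr = sdr_db(clean, processed)
```

A higher value means less residual distortion relative to the target speech, which is the sense in which the abstract treats SDR as a quality-oriented metric.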

“…It was shown also in [18] that when listening in noise, modification performance on improving intelligibility is more important than its potential negative impact on speech quality. However, when listening in quiet or at SNRs in which intelligibility is no longer an issue to listeners, the impact on speech quality due to modification becomes a concern.…”

Section: Discussion (mentioning)

confidence: 99%

Comparison of Speech Quality and Intelligibility Assessments in University Classrooms

Prodeus

Kukharicheva

Didkovska

2021

Int. J. Archit. Eng. Technol.

Estimates of speech quality and intelligibility for three university classrooms of small, medium and large sizes are presented. The quality and intelligibility of speech were assessed by objective methods using binaural room impulse responses, measured at 5-6 points of the premises. The measures of speech quality were log-spectral distortion (LSD), bark spectral distortion (BSD) and perceptual evaluation of speech quality (PESQ), and the objective measure of speech intelligibility was the speech transmission index (STI). Among the quality measures considered, only BSD is shown to be highly correlated with STI measures for all three classrooms. In this case, correlation coefficient R varies from minus 0.6 for a small room to minus 0.98 for a large room. The close relationship between PESQ and STI is observed only in the case of a large classroom (R = 0.96-0.99), and the LSD measure was found to be uncorrelated with STI for premises of all sizes. The obtained results can serve as a justification for the use of BSD instead of STI, and vice versa, in the acoustic examination of classrooms of different sizes.
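The correlation coefficient R reported in this abstract can be illustrated with a short sketch of the Pearson correlation between per-position quality and intelligibility estimates. This assumes R is the ordinary Pearson coefficient, which the abstract does not state explicitly; the function name `pearson_r` and the measurement values below are hypothetical placeholders, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical placeholder values for five measurement points in one room:
# a distortion measure (lower is better) and an STI estimate (higher is better).
bsd_values = [0.30, 0.26, 0.22, 0.19, 0.15]
sti_values = [0.52, 0.55, 0.60, 0.63, 0.68]
r = pearson_r(bsd_values, sti_values)  # negative: distortion falls as STI rises
```

A distortion measure such as BSD decreases as speech quality improves, so a strong relationship with STI shows up as a coefficient close to minus one, consistent with the negative values quoted in the abstract.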

“…Another example is the preliminary high-frequency filtering of signals, which allows increasing the efficiency of automatic speech recognition systems [14]. To increase the intelligibility of speech masked by intense noise, it is possible to use algorithms for intentional distortion of speech signals in the time or spectral domain, or in both domains at once [15]. Decreased intelligibility and quality of speech when using speech enhancement algorithms is a known fact [16].…”

Section: Problem Statement (mentioning)

confidence: 99%

Impact of University Classroom Size on the Relationship between Speech Quality and Intelligibility

Prodeus

Didkovska

Kukharicheva

2022

IJC

In this paper, five objective measures of the quality of speech signals distorted by reverberation are compared with the Speech Transmission Index (STI). The main aim of the comparison is to further test and explain the reasons for the previously discovered phenomenon of an increase in the speech quality and intelligibility with increasing room size. The comparison is performed for three university classrooms of small, medium and large sizes. The correlation coefficients between the quality and intelligibility estimates of speech obtained for 5-6 points of each room are estimated. Speech signal quality is assessed using intrusive measures such as segmental signal-to-noise ratio (SSNR), log-spectral distortion (LSD), frequency-weighted segmental signal-to-noise ratio (FWSNR), bark spectral distortion (BSD), and perceptual evaluation of speech quality (PESQ). For BSD, high correlation coefficients (0.57-0.99) are determined for rooms of all sizes and an increase in the correlation coefficient with the room size increase is found, which can be explained by a decrease in the density of early sound reflections. For FWSNR, high correlation (0.65-0.98) is determined for medium and large rooms. For PESQ, high correlation (0.96-0.99) is obtained for large classroom. SSNR and LSD are found to be uncorrelated with STI for rooms of all sizes.
