Speech enhancement with noise estimation and filtration using deep learning models

Kantamaneni, Sravanthi; Charles, A.; Babu, T. Ranga

doi:10.1016/j.tcs.2022.08.017

Cited by 10 publications

(5 citation statements)

References 31 publications

“…From a more practical viewpoint, it is important to achieve highly accurate classification even for noisy speech recorded by low-cost devices. For this purpose, a noise reduction process [ 30 ] must be introduced in the preprocessing of the speech data. The third is model compression.…”

Section: Discussion (mentioning)

confidence: 99%

Classification of Depression and Its Severity Based on Multiple Audio Features Using a Graphical Convolutional Neural Network

Ishimaru, Okada, Uchiyama et al. 2023. IJERPH

Audio features are physical features that reflect single or complex coordinated movements in the vocal organs. Hence, in speech-based automatic depression classification, it is critical to consider the relationship among audio features. Here, we propose a deep learning-based classification model for discriminating depression and its severity using correlation among audio features. This model represents the correlation between audio features as graph structures and learns speech characteristics using a graph convolutional neural network. We conducted classification experiments in which the same subjects were allowed to be included in both the training and test data (Setting 1) and the subjects in the training and test data were completely separated (Setting 2). The results showed that the classification accuracy in Setting 1 significantly outperformed existing state-of-the-art methods, whereas that in Setting 2, which has not been presented in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for discriminating recurring patients and their severities, but it is difficult to detect new depressed patients. For practical application of the model, depression-specific speech regions appearing locally rather than the entire speech of depressed patients should be detected and assigned the appropriate class labels.
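The abstract above describes representing the correlation between audio features as a graph. A minimal sketch of that idea follows, assuming Pearson correlation between per-frame feature tracks and a hypothetical `threshold` for keeping an edge; the abstract does not say how the authors build their graphs, so the function names, feature names, and the threshold value here are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlation_graph(features, threshold=0.5):
    """Return {(feature_a, feature_b): r} for pairs with |r| >= threshold.

    `features` maps a feature name to its per-frame values.
    The threshold is an assumption made for this sketch, not a value from the paper.
    """
    names = list(features)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                edges[(a, b)] = r
    return edges


# Toy feature tracks (hypothetical names and values).
toy_features = {
    "pitch": [1, 2, 3, 4],
    "energy": [2, 4, 6, 8],
    "jitter": [1, -1, 1, -1],
}
toy_edges = correlation_graph(toy_features)
```

In this toy input only the pitch and energy tracks are strongly correlated, so the resulting graph has a single edge; a graph convolutional network would then operate on such an edge set together with node features.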

“…To make the proposed model practical in a wider range of applications, it is important to achieve accurate severity prediction using noisy speech recorded with inexpensive devices or via telephone or video calls. To achieve this objective, it is necessary to introduce a noise reduction process [27] in the preprocessing of speech data. The introduction of a noise reduction process is expected to enable noise-robust depression diagnosis support not only in a face-to-face format but also in a remote format.…”

Section: Discussion (mentioning)

confidence: 99%

A New Regression Model for Depression Severity Prediction Based on Correlation among Audio Features Using a Graph Convolutional Neural Network

et al. 2023

Recent studies have revealed mutually correlated audio features in the voices of depressed patients. Thus, the voices of these patients can be characterized based on the combinatorial relationships among the audio features. To date, many deep learning–based methods have been proposed to predict the depression severity using audio data. However, existing methods have assumed that the individual audio features are independent. Hence, in this paper, we propose a new deep learning–based regression model that allows for the prediction of depression severity on the basis of the correlation among audio features. The proposed model was developed using a graph convolutional neural network. This model trains the voice characteristics using graph-structured data generated to express the correlation among audio features. We conducted prediction experiments on depression severity using the DAIC-WOZ dataset employed in several previous studies. The experimental results showed that the proposed model achieved a root mean square error (RMSE) of 2.15, a mean absolute error (MAE) of 1.25, and a symmetric mean absolute percentage error of 50.96%. Notably, RMSE and MAE significantly outperformed the existing state-of-the-art prediction methods. From these results, we conclude that the proposed model can be a promising tool for depression diagnosis.
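The abstract above reports RMSE, MAE, and symmetric mean absolute percentage error. As a rough illustration of what these three metrics measure, here is a small sketch; the SMAPE variant used (mean of |t - p| divided by the average of |t| and |p|, in percent, with zero-denominator terms counted as 0) is one common definition and is an assumption, since the abstract does not state which form the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom > 0:  # a pair with t == p == 0 contributes no error
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


# Toy severity scores (made-up values, not data from the paper).
toy_true = [10, 5, 0]
toy_pred = [8, 5, 0]
```

Because SMAPE divides by the magnitude of the scores, it can be large when many true scores are near zero even if RMSE and MAE are small.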

“…To achieve a similar objective, ref. [15] addresses the field of deep learning-based speech enhancement techniques, focusing on their real-time applications. Evaluating three popular models in terms of signal processing metrics, such as a signal-to-interference ratio, response time, and memory usage, the research offers valuable insights into the online viability of these methods.…”

Section: State of the Art (mentioning)

confidence: 99%

“…The motivation for this research arises from the critical need to enhance the clarity and intelligibility of speech in various communication settings, where background noise often compromises the quality of the transmitted audio. While existing noise reduction techniques have made strides in mitigating this issue, our work aims to develop a deep learning model specifically tailored to suppress background noise across a range of simulated scenarios.…”

Section: Introduction (mentioning)

confidence: 99%

Analyzing the Influence of Diverse Background Noises on Voice Transmission: A Deep Learning Approach to Noise Suppression

Nogales, Caracuel-Cayuela, García-Tejedor 2024. Applied Sciences

This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Utilizing deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset has been obtained from public sources (TED-LIUM 3 dataset, which includes audio recordings from the popular TED-TALK series) combined with these background noises. The audio signals were transformed into 2D power spectrograms, upon which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences in noise types were observed, it was challenging to definitively conclude which background noise most adversely affects speech quality. The results have been assessed with objective (mathematical metrics) and subjective (listening to a set of audios by humans) methods. Notably, wind noise showed the smallest deviation between the noisy and cleaned audio, perceived subjectively as the most improved scenario. Future work should involve refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Additionally, practical applications of the model in real-time streaming audio are envisaged. This research contributes significantly to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality.
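The abstract above mentions transforming audio signals into 2D power spectrograms before training. The sketch below shows one way such a spectrogram can be computed: a short-time Fourier transform with a Hann window, keeping the squared magnitude of each bin. The frame length, hop size, and window choice are assumptions for illustration (the abstract gives no such parameters), and the plain-Python DFT is only for readability; a real pipeline would use an FFT library.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=8, hop=4):
    """Return a list of frames, each a list of power values for bins 0..frame_len//2.

    frame_len and hop are hypothetical defaults chosen for this toy example.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(coeff) ** 2)  # power = squared magnitude
        frames.append(spectrum)
    return frames


# Toy input: a cosine with exactly 2 cycles per 8-sample frame.
toy_signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
toy_spec = power_spectrogram(toy_signal)
```

The result is a time-by-frequency grid of non-negative values, which is the kind of 2D input that convolutional filters can process. Note that the power spectrogram discards phase, which is consistent with the abstract listing phase calculation of the cleaned audio as future work.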
