2020
DOI: 10.1109/access.2020.3032226

Environment Sound Classification Based on Visual Multi-Feature Fusion and GRU-AWS

Abstract: There are two major questions regarding Environmental Sound Classification (ESC): what is the best audio recognition framework, and what is the most robust audio feature? To investigate these problems, a Gated Recurrent Unit (GRU) network was used in this paper to analyze the effect of single features such as the Mel Scale Spectrogram (Mel), Log-Mel Scale Spectrogram (LM), and Mel Frequency Cepstral Coefficients (MFCC), as well as the multi-feature combinations Mel-MFCC, LM-MFCC, and Mel-LM-MFCC (T-M). The experiment resu…
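
The abstract's recognition back-end is a GRU network fed with spectrogram-style features. The sketch below is a minimal, generic GRU classifier in PyTorch; the layer sizes, the bidirectional setting, the temporal mean pooling, and the class count are illustrative assumptions and do not reproduce the paper's GRU-AWS architecture.

```python
# Minimal GRU audio-classifier sketch (PyTorch).
# Hyperparameters and the mean-pooling readout are illustrative assumptions;
# this is not the paper's GRU-AWS model.
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 128, num_classes: int = 50):
        super().__init__()
        # Bidirectional GRU runs over the time axis of the (fused) feature matrix.
        self.gru = nn.GRU(n_features, hidden_size, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.gru(x)          # out: (batch, time, 2 * hidden_size)
        pooled = out.mean(dim=1)      # simple temporal average pooling
        return self.fc(pooled)        # class logits

# Example: a batch of 8 clips, 431 frames, 128-dimensional fused features.
logits = GRUClassifier(n_features=128)(torch.randn(8, 431, 128))
print(logits.shape)                   # torch.Size([8, 50])
```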

Cited by 18 publications (6 citation statements)
References 42 publications
“…The performance of the model is measured with seven evaluators: accuracy (14), sensitivity (15), specificity (16), precision (17), the F1-score (18), Cohen's kappa (19), and the Matthews correlation coefficient (MCC) (20). The model was assessed using these evaluation indices.…”
Section: Model Evaluation
confidence: 99%
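
The seven evaluators quoted above are all standard confusion-matrix-based metrics. A minimal sketch of computing them with scikit-learn, assuming a binary labelling and toy labels (not data from the cited paper):

```python
# Sketch: the seven evaluation metrics listed above, computed with scikit-learn.
# The toy labels are illustrative; a multi-class task would use averaged variants.
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, cohen_kappa_score, matthews_corrcoef,
                             confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (toy example)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (toy example)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "accuracy":    accuracy_score(y_true, y_pred),
    "sensitivity": recall_score(y_true, y_pred),     # recall = TP / (TP + FN)
    "specificity": tn / (tn + fp),                   # no direct sklearn helper
    "precision":   precision_score(y_true, y_pred),
    "f1":          f1_score(y_true, y_pred),
    "kappa":       cohen_kappa_score(y_true, y_pred),
    "mcc":         matthews_corrcoef(y_true, y_pred),
}
print(metrics)
```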
“…Our feature engineering process was derived from reference [31]. Fusing multi-spectrogram features into one new feature has been proposed to improve sound recognition accuracy [31]. A total of three features were extracted.…”
Section: Methods
confidence: 99%
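
The fused multi-spectrogram feature referred to here (three features extracted and combined into one input) can be illustrated with librosa. The frame parameters, the number of Mel bands, and the simple channel stacking below are assumptions, not the cited paper's exact recipe:

```python
# Sketch: extract Mel, Log-Mel, and MFCC features and stack them as channels
# of one fused input. "clip.wav" is a hypothetical file; n_fft, hop_length,
# and n_mels are assumed values.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050)

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=60)
log_mel = librosa.power_to_db(mel)                        # Log-Mel spectrogram
mfcc = librosa.feature.mfcc(S=log_mel, sr=sr, n_mfcc=60)  # MFCCs from the same frames

# Fuse the three time-aligned feature maps into one 3-channel tensor.
fused = np.stack([mel, log_mel, mfcc], axis=0)            # shape: (3, 60, n_frames)
print(fused.shape)
```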
“…In the task of fusing the enhancement front-end with the recognition back-end, the features fed to the back-end are selected from either the enhancement front-end's output or the clean sound sequence according to a certain probability distribution. At the initial stage of training, because the performance of the enhancement model has not yet improved, the features fed to the back-end recognizer may not represent the audio information well, leading to difficulties in model convergence [23]. Using the features of the clean sequence can correct the model, reduce its divergence, and speed up convergence.…”
Section: Algorithm Design
confidence: 99%
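
The probabilistic choice between enhanced and clean features described above resembles scheduled sampling. Below is a minimal sketch of such a selection rule; the linear decay schedule, the variable `p_clean`, and the function name are assumptions rather than the cited paper's exact scheme:

```python
# Sketch: during joint training, feed the recognition back-end either the
# enhancement front-end's output or the clean feature sequence, chosen at
# random. The linear decay of p_clean over training is an assumed schedule.
import random

def pick_backend_input(enhanced_feats, clean_feats, epoch, total_epochs):
    """Return the clean features with probability p_clean, else the enhanced ones."""
    # Rely mostly on clean features early on, then shift to the enhancer's
    # output as it improves (p_clean decays linearly from 1.0 to 0.0).
    p_clean = max(0.0, 1.0 - epoch / total_epochs)
    return clean_feats if random.random() < p_clean else enhanced_feats

# Hypothetical use inside a training loop:
# feats = pick_backend_input(enhancer(noisy), clean, epoch, total_epochs=100)
# loss = criterion(recognizer(feats), labels)
```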