Deep learning techniques have recently been used to tackle the audio source separation problem. In this work, we propose to use deep fully convolutional denoising autoencoders (CDAEs) for monaural audio source separation. We use as many CDAEs as there are sources to be separated from the mixed signal. Each CDAE is trained to separate one source and treats the other sources as background noise. The main idea is to allow each CDAE to learn spectral-temporal filters and features suited to its corresponding source. Our experimental results show that CDAEs perform source separation slightly better than deep feedforward neural networks (FNNs), even with fewer parameters than the FNNs.
Index Terms: Fully convolutional denoising autoencoders, single-channel audio source separation, stacked convolutional autoencoders, deep convolutional neural networks, deep learning.
[Figure: CDAE architecture. Encoder: Conv. + ReLU, max pooling, Conv. + ReLU, max pooling. Decoder: upsampling, Conv. + ReLU, upsampling, Conv. + ReLU.]
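The encoder/decoder layer order named in the figure can be traced with a minimal pure-Python sketch. It is only an illustration of how the layers fit together, not the authors' implementation: it uses a single channel, untrained fixed kernels, 2x2 pooling, and nearest-neighbour upsampling, all of which are assumptions. The function names (`conv2d_same_relu`, `max_pool_2x2`, `upsample_2x`, `cdae_forward`) are hypothetical.

```python
def conv2d_same_relu(x, kernel):
    """Single-channel 2D convolution (zero padding, stride 1) followed by ReLU.

    As in most deep learning frameworks, the kernel is applied without
    flipping (cross-correlation).
    """
    rows, cols = len(x), len(x[0])
    kr, kc = len(kernel), len(kernel[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kr):
                for b in range(kc):
                    ii, jj = i + a - pr, j + b - pc
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = max(0.0, acc)  # ReLU
    return out


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def upsample_2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def cdae_forward(x, kernels):
    """Encoder (conv, pool, conv, pool) then decoder (up, conv, up, conv)."""
    k1, k2, k3, k4 = kernels
    h = max_pool_2x2(conv2d_same_relu(x, k1))
    h = max_pool_2x2(conv2d_same_relu(h, k2))
    h = conv2d_same_relu(upsample_2x(h), k3)
    h = conv2d_same_relu(upsample_2x(h), k4)
    return h


# Toy 8x8 "spectrogram" patch and an untrained 3x3 averaging kernel (made up).
toy_input = [[float((i * j) % 5) for j in range(8)] for i in range(8)]
avg_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
toy_output = cdae_forward(toy_input, [avg_kernel] * 4)
print(len(toy_output), len(toy_output[0]))  # 8 8
```

The two pooling steps halve each dimension twice and the two upsampling steps restore it, so the output has the same shape as the input, which is what lets a network of this form map a mixture spectrogram to a source estimate of equal size.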
Objectives: This study investigated the usefulness and performance of a two-stage attention-aware convolutional neural network (CNN) for the automated diagnosis of otitis media from tympanic membrane (TM) images.
Design: A classification model development and validation study in ears with otitis media based on otoscopic TM images. Two commonly used CNNs were trained and evaluated on the dataset. On the basis of a Class Activation Map (CAM), a two-stage classification pipeline was developed to improve accuracy and reliability, and simulate an expert reading the TM images.
Setting and participants: This is a retrospective study using otoendoscopic images obtained from the Department of Otorhinolaryngology in China. A dataset was generated with 6066 otoscopic images from 2022 participants comprising four kinds of TM images, that is, normal eardrum, otitis media with effusion (OME) and two stages of chronic suppurative otitis media (CSOM).
Results: The proposed method achieved an overall accuracy of 93.4% using ResNet50 as the backbone network in a threefold cross-validation. The F1 Score of classification for normal images was 94.3%, and 96.8% for OME. There was a small difference between the active and inactive status of CSOM, achieving 91.7% and 82.4% F1 scores, respectively. The results demonstrate a classification performance equivalent to the diagnosis level of an associate professor in otolaryngology.
Conclusions: CNNs provide a useful and effective tool for the automated classification of TM images. In addition, having a weakly supervised method such as CAM can help the network focus on discriminative parts of the image and improve performance with a relatively small database. This two-stage method is beneficial to improve the accuracy of diagnosis of otitis media for junior otolaryngologists and physicians in other disciplines.
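The overall accuracy and per-class F1 scores reported above are standard quantities that can be derived from a confusion matrix. The sketch below shows that arithmetic, using F1 = 2TP / (2TP + FP + FN). The confusion matrix counts are made up for illustration and are not the study's data, and the function names (`per_class_f1`, `overall_accuracy`) are hypothetical.

```python
def per_class_f1(confusion):
    """Per-class F1 from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Illustrative counts only (NOT the study's results), 10 images per class.
class_names = ["normal", "OME", "CSOM active", "CSOM inactive"]
toy_confusion = [
    [9, 1, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 8, 2],
    [1, 0, 1, 8],
]
toy_f1 = per_class_f1(toy_confusion)
toy_accuracy = overall_accuracy(toy_confusion)
for name, score in zip(class_names, toy_f1):
    print(f"{name}: F1 = {score:.3f}")
print(f"overall accuracy = {toy_accuracy:.3f}")
```

In a threefold cross-validation, one would typically compute these values on each held-out fold and then average them or pool the confusion matrices; the abstract does not state which of the two was used.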
Deep neural networks (DNNs) are usually used for single-channel source separation to predict either soft or binary time-frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using a DNN achieves lower distortion than using each estimate individually and achieves interference as low as that of the binary mask.
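The difference between soft and binary time-frequency masks can be illustrated with a small pure-Python sketch. Here the masks are computed directly from known source magnitudes (so-called oracle masks), whereas in the paper they are predicted by DNNs. The DNN that combines the two estimates is not shown. The toy magnitudes, the approximation of the mixture as the sum of source magnitudes, and the function names are all assumptions made for illustration.

```python
def soft_mask(target_mag, other_mag, eps=1e-12):
    """Ratio mask in [0, 1]: target energy share in each time-frequency bin."""
    return [
        [t / (t + o + eps) for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_mag, other_mag)
    ]


def binary_mask(target_mag, other_mag):
    """Hard mask: 1 where the target dominates the bin, 0 elsewhere."""
    return [
        [1.0 if t > o else 0.0 for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_mag, other_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Element-wise product of a mask with the mixture magnitude spectrogram."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Toy 2x2 magnitude "spectrograms" (made-up numbers).
target = [[3.0, 1.0], [0.0, 2.0]]
other = [[1.0, 3.0], [2.0, 2.0]]
# Assumption: mixture magnitude is approximated by the sum of source magnitudes.
mixture = [[t + o for t, o in zip(tr, orow)] for tr, orow in zip(target, other)]

soft_estimate = apply_mask(soft_mask(target, other), mixture)
binary_estimate = apply_mask(binary_mask(target, other), mixture)
print(soft_estimate)    # close to the target magnitudes in every bin
print(binary_estimate)  # keeps or discards each mixture bin entirely
```

In this toy case the binary estimate removes every bin where the other source is at least as strong, which suppresses interference but also discards target energy in those bins and keeps the full mixture energy in the rest. That is the distortion and interference trade-off the abstract describes.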
Wideband Absorbance Immittance (WAI) has been available for more than a decade; however, its clinical use still faces the challenges of limited understanding and poor interpretation of WAI results. This study aimed to develop Machine Learning (ML) tools to identify the WAI absorbance characteristics across different frequency-pressure regions in the normal middle ear and in ears with otitis media with effusion (OME), to enable automatic diagnosis of middle ear conditions. Data analysis included pre-processing of the WAI data, statistical analysis, classification model development, and extraction of key regions from the 2D frequency-pressure WAI images. The experimental results show that ML tools appear to hold great potential for the automated diagnosis of middle ear diseases from WAI data. The identified key regions in the WAI provide guidance to practitioners to better understand and interpret WAI data and offer the prospect of quick and accurate diagnostic decisions.
Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multiresolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multiresolution features for separating the singing voice from stereo music. Our experimental results show that the proposed method can achieve multi-channel audio source separation without the need for hand-crafted features or any pre- or post-processing.