Many signal-processing methods for sound source direction-of-arrival estimation produce a spatial pseudospectrum whose local maxima strongly indicate the source directions. Due to varying levels of noise and reverberation and varying numbers of overlapping sources, these spatial pseudospectra remain noisy even after smoothing. In addition, the number of sources is often unknown. As a result, selecting the peaks from these spectra is error-prone. Convolutional neural networks have been successfully applied to many image processing problems in general and to direction-of-arrival estimation in particular. In addition, deep learning-based methods for direction-of-arrival estimation generalize well to different environments. We propose to use a 2D convolutional neural network with multi-task learning to robustly estimate both the number of sources and their directions-of-arrival from short-time spatial pseudospectra, which carry useful directional information extracted from the audio input signals. This approach reduces the tendency of the neural network to learn unwanted associations between sound classes and directional information, and helps the network generalize to unseen sound classes. Simulation and experimental results show that the proposed method outperforms other direction-of-arrival estimation methods across different levels of noise and reverberation and different numbers of sources.
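To illustrate the baseline this abstract argues against, here is a minimal numpy sketch of naive peak picking on a 1-D azimuth pseudospectrum: local maxima are found and the strongest ones are returned as source directions. The function name and the toy two-source spectrum are illustrative, not from the paper; in practice noise and reverberation make such peak selection unreliable, which motivates the learned approach.

```python
import numpy as np

def pick_peaks(pseudospectrum, num_sources):
    """Naive peak picker: return the indices of the `num_sources`
    strongest local maxima of a 1-D azimuth pseudospectrum."""
    p = np.asarray(pseudospectrum, dtype=float)
    # interior local maxima (strictly greater than both neighbours)
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    peak_idx = np.where(is_peak)[0] + 1
    # keep the strongest `num_sources` peaks, sorted by azimuth
    order = np.argsort(p[peak_idx])[::-1]
    return sorted(peak_idx[order[:num_sources]].tolist())

# Toy pseudospectrum over 0..180 degrees: two Gaussian lobes at 40 and 120
az = np.arange(181)
spec = (np.exp(-0.5 * ((az - 40) / 5.0) ** 2)
        + 0.8 * np.exp(-0.5 * ((az - 120) / 5.0) ** 2))
print(pick_peaks(spec, 2))  # → [40, 120]
```

Note that this picker needs the number of sources as an input, which is exactly what is often unknown; the proposed multi-task network estimates the source count and the directions jointly instead.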
Augmented reality (AR), which combines virtual and real-world environments, is becoming a major topic of research interest due to the advent of wearable devices. Today, AR is commonly used as an assistive display to enhance the perception of reality in education, gaming, navigation, sports, entertainment, simulators, etc. However, most past work has concentrated mainly on the visual aspects of AR. Auditory events are an essential component of everyday human perception, but augmented reality solutions have so far lagged behind in this regard compared to the visual aspects. There is therefore a need for natural listening in AR systems to give the user a holistic experience. A new headphone configuration is presented in this work, with two pairs of binaural microphones attached to the headphones (one internal and one external microphone on each side). This paper focuses on enabling natural listening with open headphones by employing adaptive filtering techniques to equalize the headset, such that virtual sources are perceived as close as possible to sounds emanating from the physical sources. This also requires superposition of the virtual sources with the physical sound sources, as well as with the ambience. Modified versions of the filtered-x normalized least mean square algorithm (FxNLMS) are proposed in the paper that converge to the optimum solution faster than the conventional FxNLMS. Measurements are carried out with open-structure headphones to evaluate their performance. A subjective test was conducted using individualized binaural room impulse responses (BRIRs) to evaluate the perceptual similarity between real and virtual sounds. Index Terms—Adaptive filtering, augmented reality (AR), head-related transfer function (HRTF), natural listening, spatial audio.
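For readers unfamiliar with the conventional FxNLMS that the paper's modified variants improve upon, below is a minimal single-channel numpy sketch: the reference signal is filtered through an estimate of the secondary path before being used in the normalized LMS weight update. The simulation setup (a 2-tap secondary path and target filter) is an assumption for illustration, not the paper's headphone measurement configuration.

```python
import numpy as np

def fxnlms(x, d, s_hat, num_taps, mu=0.1, eps=1e-8):
    """Single-channel FxNLMS: adapt `w` so that (w * x), after passing
    through the secondary path (modelled by `s_hat`), tracks target `d`."""
    w = np.zeros(num_taps)
    x_buf = np.zeros(num_taps)    # reference-signal buffer
    xf_buf = np.zeros(num_taps)   # filtered-reference buffer
    s_buf = np.zeros(len(s_hat))  # buffer for filtering the reference
    y_buf = np.zeros(len(s_hat))  # buffer for the secondary path on y
    errors = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1); x_buf[0] = x[n]
        y = w @ x_buf                          # adaptive filter output
        y_buf = np.roll(y_buf, 1); y_buf[0] = y
        e = d[n] - s_hat @ y_buf               # error after secondary path
        s_buf = np.roll(s_buf, 1); s_buf[0] = x[n]
        xf = s_hat @ s_buf                     # filtered reference x'
        xf_buf = np.roll(xf_buf, 1); xf_buf[0] = xf
        w += mu * e * xf_buf / (xf_buf @ xf_buf + eps)  # NLMS update
        errors[n] = e
    return w, errors

# Toy check: recover a known 2-tap filter through a known secondary path
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
s = np.array([1.0, 0.5])                  # secondary path (assumed known)
w_opt = np.array([0.8, -0.3])             # filter we hope to recover
d = np.convolve(x, np.convolve(w_opt, s))[:len(x)]
w, errors = fxnlms(x, d, s, num_taps=2)
```

The step-size normalization by the filtered-reference power is what distinguishes FxNLMS from plain FxLMS; the paper's modified versions aim to accelerate exactly this convergence.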
To mitigate the outbreak of the highly contagious COVID-19, a sensitive, robust automated diagnostic tool is needed. This paper proposes a three-level approach to separate COVID-19 and pneumonia cases from normal patients using chest CT scans. At the first level, we fine-tune a multi-scale ResNet50 model to extract features from all slices of each patient's CT scan. The multi-scale residual network can learn infections of different sizes, making detection possible at early stages as well. At the second level, the extracted features are used to train patient-level classifiers; four different classifiers are trained at this stage. Finally, the predictions of the patient-level classifiers are combined by training an ensemble classifier. We test the proposed method on three sets of data released by the ICASSP COVID-19 Signal Processing Grand Challenge (SPGC). The proposed method classifies the three classes with a validation accuracy of 94.9% and a testing accuracy of 88.89%.
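The third level described above is a stacking step: the class-probability outputs of the patient-level classifiers become features for a trained meta-classifier. A minimal numpy sketch of such a meta-classifier, implemented here as multinomial logistic regression trained by gradient descent, is shown below; the function names and the synthetic base-classifier outputs are illustrative assumptions, not the paper's actual four classifiers.

```python
import numpy as np

def train_meta_classifier(base_probs, labels, num_classes=3,
                          lr=0.5, epochs=300):
    """Train a multinomial logistic-regression meta-classifier on the
    stacked class probabilities produced by the base classifiers."""
    X = np.hstack(base_probs)                   # (patients, bases * classes)
    X = np.hstack([X, np.ones((len(X), 1))])    # bias column
    Y = np.eye(num_classes)[labels]             # one-hot targets
    W = np.zeros((X.shape[1], num_classes))
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # softmax
        W -= lr * X.T @ (P - Y) / len(X)              # cross-entropy gradient
    return W

def meta_predict(W, base_probs):
    X = np.hstack(base_probs)
    X = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(X @ W, axis=1)

# Synthetic example: two noisy base classifiers over 3 classes
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, 200)
onehot = np.eye(3)[labels]
b1 = onehot + 0.3 * rng.random((200, 3)); b1 /= b1.sum(1, keepdims=True)
b2 = onehot + 0.5 * rng.random((200, 3)); b2 /= b2.sum(1, keepdims=True)
W = train_meta_classifier([b1, b2], labels)
preds = meta_predict(W, [b1, b2])
```

Training the combiner, rather than simply averaging base predictions, lets the ensemble learn how much to trust each patient-level classifier per class.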