Randomly Weighted CNNs for (Music) Audio Classification

Pons, Jordi; Serra, Xavier

doi:10.1109/icassp.2019.8682912

Cited by 77 publications

(58 citation statements)

References 39 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In this section, we describe the deep encoder/decoder architectures we used to explore the impact of increasing the Conv-TasNet encoder/decoder's capacity to represent more complex signal transformations. The core architecture we employ is motivated by recent research in audio classification in which waveform-based architectures built on a deep stack of small filters deliver very competitive results [20][21][22]. This research highlights the potential for these architectures to learn generalized patterns via hierarchically combining small-context representations [20].…”

Section: Deep Encoder / Decodermentioning

confidence: 99%

“…The core architecture we employ is motivated by recent research in audio classification in which waveform-based architectures built on a deep stack of small filters deliver very competitive results [20][21][22]. This research highlights the potential for these architectures to learn generalized patterns via hierarchically combining small-context representations [20]. For this reason, we investigate the possibilities of a deep encoder/decoder that is based on a stack of small filters with nonlinear activation functions.…”

Section: Deep Encoder / Decodermentioning

confidence: 99%

See 1 more Smart Citation

An Empirical Study of Conv-Tasnet

Kadıoğlu

Horgan

Liu

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

View full text Add to dashboard Cite

Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate the generalization capabilities of the studied models when trained on a much larger dataset. We propose cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, Lib-riTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.

show abstract

Section: Deep Encoder / Decodermentioning

confidence: 99%

Section: Deep Encoder / Decodermentioning

confidence: 99%

An Empirical Study of Conv-Tasnet

Kadıoğlu

Horgan

Liu

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

View full text Add to dashboard Cite

show abstract

“…The highest level of representation is then used for classifying the input signal by means of three fully connected layers. Experimental results on UrbanSound8k dataset, which contains 8,732 environmental sounds from 10 classes, have shown that the proposed approach outperforms other approaches based on 2D representations such as spectrograms (Piczak, 2015a;Pons & Serra, 2018;Salamon & Bello, 2015) by between 11.24% (SB-CNN) and 27.14% (VGG) in terms of mean accuracy. Furthermore, the proposed approach does not require data augmentation or any signal pre-processing for extracting features.…”

Section: Introductionmentioning

confidence: 97%

“…Recent works explore CNN-based approaches given the significant improvements over hand-crafted feature-based methods (Piczak, 2015a;Pons & Serra, 2018;Simonyan & Zisserman, 2014;. However, most of these approaches first convert the audio signal into a 2D representation (spectrogram) and use 2D CNN architectures that were originally designed for object recognition such as AlexNet and VGG (Simonyan & Zisserman, 2014).…”

Section: Introductionmentioning

confidence: 99%

“…They reported the classification accuracy of 79% on a dataset of environmental sounds (Salamon et al, 2014). Pons & Serra (2018) used randomly weighted 2D CNNs (non-trained) for extracting features from audio spectrograms and raw audio samples for sound classification. Several experiments have been conducted to find the best architectures for this method.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

End-to-end environmental sound classification using a 1D convolutional neural network

Abdoli

Cardinal

Koerich

2019

Expert Systems with Applications

247

128

View full text Add to dashboard Cite

In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolution Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to capture the signal's fine time structure and learn diverse filters that are relevant to the classification task. The proposed approach can deal with audio signals of any length as it splits the signal into overlapped frames using a sliding window. Different architectures considering several input sizes are evaluated, including the initialization of the first convolutional layer with a Gammatone filterbank that models the human auditory filter response in the cochlea. The performance of the proposed end-to-end approach in classifying environmental sounds was assessed on the UrbanSound8k dataset and the experimental results have shown that it achieves 89% of mean accuracy. Therefore, the propose approach outperforms most of the state-of-the-art approaches that use handcrafted features or 2D representations as input. Furthermore, the proposed approach has a small number of parameters compared to other architectures found in the literature, which reduces the amount of data required for training.

show abstract

Randomly initialized convolutional neural network for the recognition of COVID‐19 using X‐ray images

Atitallah

Driss

Boulila

et al. 2021

Int J Imaging Syst Tech

View full text Add to dashboard Cite

By the start of 2020, the novel coronavirus (COVID‐19) had been declared a worldwide pandemic, and because of its infectiousness and severity, several strands of research have focused on combatting its ongoing spread. One potential solution to detecting COVID‐19 rapidly and effectively is by analyzing chest X‐ray images using Deep Learning (DL) models. Convolutional Neural Networks (CNNs) have been presented as particularly efficient techniques for early diagnosis, but most still include limitations. In this study, we propose a novel randomly initialized CNN (RND‐CNN) architecture for the recognition of COVID‐19. This network consists of a set of differently‐sized hidden layers all created from scratch. The performance of this RND‐CNN is evaluated using two public datasets: the COVIDx and the enhanced COVID‐19 datasets. Each of these datasets consists of medical images (X‐rays) in one of three different classes: chests with COVID‐19, with pneumonia, or in a normal state. The proposed RND‐CNN model yields encouraging results for its accuracy in detecting COVID‐19 results, achieving 94% accuracy for the COVIDx dataset and 99% accuracy on the enhanced COVID‐19 dataset.

show abstract

Randomly Weighted CNNs for (Music) Audio Classification

Cited by 77 publications

References 39 publications

An Empirical Study of Conv-Tasnet

An Empirical Study of Conv-Tasnet

End-to-end environmental sound classification using a 1D convolutional neural network

Randomly initialized convolutional neural network for the recognition of COVID‐19 using X‐ray images

Contact Info

Product

Resources

About