Polyphonic Sound Event Detection by Using Capsule Neural Networks

Vesperini, Fabio; Gabrielli, Leonardo; Principi, Emanuele; Squartini, Stefano

doi:10.1109/jstsp.2019.2902305

Cited by 52 publications

(40 citation statements)

References 34 publications

(46 reference statements)


“…To overcome the shortcomings of traditional deep learning networks, Hinton group (Sabour et al, 2017) proposed new deep learning architectures known as capsule networks (CapsNets), which introduced a novel building block that is used in deep learning to improve the model hierarchical relationships inside the internal knowledge representation of a neural network. CapsNets have shown great potential in some fields (Xi et al, 2017; Afshar et al, 2018; Lalonde and Bagci, 2018; Qiao et al, 2018; Vesperini et al, 2018; Zhao et al, 2018; Wang et al, 2019b; Peng et al, 2019). However, CapsNets have not yet been applied to drug discovery-related studies.…”

Section: Introduction (mentioning)

confidence: 99%

Capsule Networks Showed Excellent Performance in the Classification of hERG Blockers/Nonblockers

et al. 2020


Capsule networks (CapsNets), a new class of deep neural network architectures proposed recently by Hinton et al., have shown a great performance in many fields, particularly in image recognition and natural language processing. However, CapsNets have not yet been applied to drug discovery-related studies. As the first attempt, we in this investigation adopted CapsNets to develop classification models of hERG blockers/nonblockers; drugs with hERG blockade activity are thought to have a potential risk of cardiotoxicity. Two capsule network architectures were established: convolution-capsule network (Conv-CapsNet) and restricted Boltzmann machine-capsule networks (RBM-CapsNet), in which convolution and a restricted Boltzmann machine (RBM) were used as feature extractors, respectively. Two prediction models of hERG blockers/nonblockers were then developed by Conv-CapsNet and RBM-CapsNet with the Doddareddy's training set composed of 2,389 compounds. The established models showed excellent performance in an independent test set comprising 255 compounds, with prediction accuracies of 91.8 and 92.2% for Conv-CapsNet and RBM-CapsNet models, respectively. Various comparisons were also made between our models and those developed by other machine learning methods including deep belief network (DBN), convolutional neural network (CNN), multilayer perceptron (MLP), support vector machine (SVM), k-nearest neighbors (kNN), logistic regression (LR), and LightGBM, and with different training sets. All the results showed that the models by Conv-CapsNet and RBM-CapsNet are among the best classification models. Overall, the excellent performance of capsule networks achieved in this investigation highlights their potential in drug discovery-related studies.


“…Vesperini et al [80] proposed Capsule Neural Network (CapsNet) for polyphonic SED. The introduction of CapsNet is to overcome some limitations of CNN, in particular, the loss of information due to max-pooling operator [81].…”

Section: Figure 9 Difference Between Multi-label and Combined Single (mentioning)

confidence: 99%
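The information loss attributed to max-pooling in the statement above can be illustrated with a minimal sketch. It is not taken from Vesperini et al.; the function name `max_pool_1d` and the activation values are hypothetical, chosen only to show that two inputs with peaks at different positions inside a pooling window produce the same pooled output.

```python
def max_pool_1d(values, width):
    """Non-overlapping max pooling: keep only the largest value per window."""
    return [max(values[i:i + width])
            for i in range(0, len(values) - width + 1, width)]


# Two hypothetical activation rows. The peaks sit at different positions
# inside each window of width 2, so the rows are not identical.
early_peak = [0.9, 0.1, 0.0, 0.7]
late_peak = [0.1, 0.9, 0.7, 0.0]

pooled_early = max_pool_1d(early_peak, 2)
pooled_late = max_pool_1d(late_peak, 2)

# After pooling, the position of each peak within its window is gone:
# both rows collapse to the same pooled vector.
print(pooled_early == pooled_late)
```

Because only the maximum of each window survives, the layer after the pooling step cannot tell the two inputs apart. This is the kind of lost positional detail that capsule architectures are intended to retain.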

“…Using CapsNet, Vesperini et al [80] achieved an ER of 0.36 on TUT-SED 2016 and TUT-SED 2017 development dataset using a binaural spectrogram as the input. On the other hand, results for TUT-SED 2017 evaluation dataset show that CapsNet using log mel energies achieved the lowest ER of 0.58 instead of using a binaural spectrogram as input.…”

Section: Figure 9 Difference Between Multi-label and Combined Single (mentioning)

confidence: 99%
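The ER values quoted above are error rates. The statement does not say which variant or segment length was used, so the following is only a sketch of the segment-based error rate as it is commonly defined for polyphonic SED evaluation: per segment, substitutions are min(FN, FP), deletions are max(0, FN - FP), insertions are max(0, FP - FN), and the total is divided by the number of active reference events. The function name and the toy labels are hypothetical.

```python
def segment_error_rate(reference, predicted):
    """Segment-based error rate.

    reference, predicted: lists of sets of active event labels,
    one set per time segment, with equal length.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must cover the same segments")

    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, predicted):
        fn = len(ref - pred)          # active in reference, missed by the system
        fp = len(pred - ref)          # output by the system, not in reference
        subs += min(fn, fp)           # a miss paired with a false alarm
        dels += max(0, fn - fp)       # remaining misses
        ins += max(0, fp - fn)        # remaining false alarms
        n_ref += len(ref)

    if n_ref == 0:
        raise ValueError("error rate is undefined without active reference events")
    return (subs + dels + ins) / n_ref


# Toy example with four segments (labels are illustrative only).
reference = [{"car", "speech"}, {"car"}, set(), {"dog"}]
predicted = [{"car"}, {"speech"}, {"dog"}, {"dog"}]
print(segment_error_rate(reference, predicted))  # 0.75
```

In the toy example there is one deletion, one substitution and one insertion against four active reference events, giving 3/4. Lower values are better, and the metric can exceed 1 when a system produces many insertions.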

A Comprehensive Review of Polyphonic Sound Event Detection

Chan; Chin, 2020, IEEE Access


One of the most amazing functions of the human auditory system is the ability to detect all kinds of sound events in the environment. With the technologies and hardware advances, polyphonic Sound Event Detection (SED) can be developed to mimic the ability of the human auditory system. However, the development of a SED system is no trivial task, and several different factors often hinder accuracy. Although there are several overview papers available, most of them only provide a theoretical overview of algorithms used with little discussion. Thus, to the best of the authors' knowledge, there is no comprehensive review that covers this particular domain. Therefore, this paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations. Additional information on possible trends is also discussed that can be useful for future development works.


“…With CNN's powerful capabilities of learning features and the property of equivariance of CapsNet, Conv-Caps has an advanced performance. Otherwise, CapsNet has been successfully applied to many fields, such as tumors classification [14], sound event detection [15], and remote sensing image classification [16]. The main idea of the CapsNet is that vector capsules are utilized to represent internal attributes, and replace the neuron in the traditional neural network with a set of neurons as a capsule to solve the problem of spatial hierarchies between features effectively.…”

Section: Introduction (mentioning)

confidence: 99%
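The idea of a capsule as a vector of neurons, mentioned in the statement above, can be sketched with the squashing nonlinearity from Sabour et al. (2017), which rescales a capsule's vector so that its length lies in [0, 1) while its direction is kept. The statement itself does not describe this function, so its use here is an assumption, and the helper names `squash` and `length` are hypothetical.

```python
import math


def length(vector):
    """Euclidean length of a capsule's output vector."""
    return math.sqrt(sum(x * x for x in vector))


def squash(vector):
    """Squashing nonlinearity: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]   # avoid dividing by zero
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / norm
    return [scale * x for x in vector]


# A long input vector is squashed to a length just below 1,
# a short one to a length near 0; the direction is unchanged.
strong = squash([3.0, 4.0])
weak = squash([0.03, 0.04])
print(length(strong), length(weak))
```

In this formulation the length of the output vector is read as the probability that the entity represented by the capsule is present, while the vector's orientation encodes its attributes.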

Hyperspectral Image Classification With CapsNet and Markov Random Fields

et al. 2020


Hyperspectral image (HSI) classification is one of the most challenging problems in understanding HSI. Convolutional neural network (CNN), with the strong ability to extract features using the hidden layers in the network, has been introduced to solve this problem. However, several fully connected layers are always appended at the end of CNN, which dramatically reduces the efficiency of space utilization and makes the classification algorithm hard to converge. Recently, a new network architecture called capsule network (CapsNet) was presented to improve the CNN. It uses groups of neurons as capsules to replace the neurons in traditional neural networks. Since the capsule can provide superior spectral features and spatial information extracted, its performance is better than the most advanced CNN in some fields. Motivated by this idea, a new remote sensing hyperspectral image classification algorithm called Conv-Caps is proposed to make full use of the advantages of both. We integrate spectral and spatial information into the proposed framework and combine Conv-Caps with Markov Random Field (MRF), which uses the graph cut expansion method to solve the classification task. The Caps-MRF method is further proposed. First, select an initial feature extractor, which is a CNN without fully connected layers. Then, the initial recognition feature map is put into the newly designed CapsNet to obtain the probability map. Finally, the MRF model is used to calculate the subdivision labels. The presented method is trained with three real HSI datasets and is compared with the latest methods. We find the framework can produce competitive classification performance.
