Masked Conditional Neural Networks for Audio Classification

Medhat, Fady; Chesmore, David

doi:10.1007/978-3-319-68612-7_40

Cited by 14 publications

(25 citation statements)

References 19 publications

“…Probabilistic techniques combined with deep learning have been explored too. Medhat et al (2017) propose a type of neural networks designed for temporal signal recognition, the Conditional Neural Network and the Masked Conditional Neural Network achieving accuracy levels between 85% and 86% on the GTZAN and ISMIR2004 datasets.…”

Section: Content-based Classification (mentioning)

confidence: 99%

Machine learning for music genre: multifaceted review and experimentation with audioset

Ramírez

Flores

2019

J Intell Inf Syst


Music genre classification is one of the sub-disciplines of music information retrieval (MIR) with growing popularity among researchers, mainly due to the already open challenges. Although research has been prolific in terms of number of published works, the topic still suffers from a problem in its foundations: there is no clear and formal definition of what genre is. Music categorizations are vague and unclear, suffering from human subjectivity and lack of agreement. In its first part, this paper offers a survey trying to cover the many different aspects of the matter. Its main goal is to give the reader an overview of the history and the current state-of-the-art, exploring techniques and datasets used to date, as well as identifying current challenges, such as this ambiguity of genre definitions or the introduction of human-centric approaches. The paper pays special attention to new trends in machine learning applied to the music annotation problem. Finally, we also include a music genre classification experiment that compares different machine learning models using Audioset.


“…The ConditionaL Neural Network (CLNN) [13] is a discriminative model designed for temporal signals. The CLNN extends from the visible to hidden links proposed in the CRBM.…”

Section: Conditional Neural Network (mentioning)

confidence: 99%

“…The models we discuss in this work have been considered in [13] for music genre classification with more emphasis on the influence of the data split (training set, validation set, and testing set) on the reported accuracies in the literature. In this work, we evaluate the applicability of the models to sounds of a different nature, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

Masked Conditional Neural Networks for Automatic Sound Events Recognition

Medhat

Chesmore

2017

2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Self Cite


Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multi-dimensional temporal signal recognition. The CLNN considers the inter-frame relationship, and the MCLNN enforces a systematic sparseness over the network's links to enable learning in frequency bands rather than bins allowing the network to be frequency shift invariant mimicking a filterbank. The mask also allows considering several combinations of features concurrently, which is usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. MCLNN achieved competitive performance, using 12% of the parameters and without augmentation, compared to state-of-the-art Convolutional Neural Networks.


“…Weight sharing makes the CNN translation invariant, which does not preserve the spatial locality of the learned features. The ConditionaL Neural Networks (CLNN) [17] and its variant the Masked ConditionaL Neural Network (MCLNN) [17] are developed from the ground up exploiting the nature of the sound signal. The CLNN considers the interframes relation in a temporal signal and the MCLNN embeds a filterbank-like behavior that enables individual bands and suppresses others through an enforced systematic sparseness.…”

Section: Introduction (mentioning)

confidence: 99%

Masked Conditional Neural Networks for Environmental Sound Classification

Medhat

Chesmore

2017

Lecture Notes in Computer Science

Self Cite


The ConditionaL Neural Network (CLNN) exploits the nature of the temporal sequencing of the sound signal represented in a spectrogram, and its variant the Masked ConditionaL Neural Network (MCLNN) induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network's links using a binary mask. Additionally, the masking automates the exploration of different feature combinations concurrently analogous to handcrafting the optimum combination of features for a recognition task. We have evaluated the MCLNN performance using the Urbansound8k dataset of environmental sounds. Additionally, we present a collection of manually recorded sounds for rail and road traffic, YorNoise, to investigate the confusion rates among machine generated sounds possessing low-frequency components. MCLNN has achieved competitive results without augmentation and using 12% of the trainable parameters utilized by an equivalent model based on state-of-the-art Convolutional Neural Networks on the Urbansound8k. We extended the Urbansound8k dataset with YorNoise, where experiments have shown that common tonal properties affect the classification performance.
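As a rough illustration of the binary-mask idea described in this abstract, the sketch below builds a band-limited 0/1 mask and multiplies it element-wise into a dense weight matrix, so that each hidden unit only sees a contiguous band of frequency bins. This is not the authors' mask construction: the function name `band_mask`, the sliding-band layout, and the toy sizes are all assumptions chosen for illustration.

```python
import numpy as np


def band_mask(n_features: int, n_hidden: int, bandwidth: int) -> np.ndarray:
    """Hypothetical helper: 0/1 mask of shape (n_features, n_hidden).

    Column j is 1 on a contiguous run of `bandwidth` frequency bins and 0
    elsewhere. The band slides down by one bin per hidden unit and wraps
    around once it reaches the last valid start position.
    """
    if not 1 <= bandwidth <= n_features:
        raise ValueError("bandwidth must be between 1 and n_features")
    mask = np.zeros((n_features, n_hidden), dtype=np.float32)
    n_starts = n_features - bandwidth + 1  # number of valid band positions
    for j in range(n_hidden):
        start = j % n_starts
        mask[start:start + bandwidth, j] = 1.0
    return mask


# Toy sizes (assumed): 8 frequency bins, 6 hidden units, bands of 3 bins.
rng = np.random.default_rng(0)
n_features, n_hidden, bandwidth = 8, 6, 3

W = rng.normal(size=(n_features, n_hidden)).astype(np.float32)  # dense weights
M = band_mask(n_features, n_hidden, bandwidth)                  # binary mask

x = rng.normal(size=(1, n_features)).astype(np.float32)  # one spectrogram frame
h = x @ (W * M)  # masked links: weights outside each band contribute nothing
```

In a trainable model the same element-wise product would be applied at every forward pass, so the masked-out links stay inactive throughout training.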
