2015 23rd European Signal Processing Conference (EUSIPCO) 2015
DOI: 10.1109/eusipco.2015.7362358

Deep neural networks for audio scene recognition

Abstract: In recent years, artificial neural networks (ANNs) have attracted renewed interest since efficient training procedures emerged for learning so-called deep neural networks (DNNs), i.e. ANNs with at least two hidden layers. At the same time, the computational auditory scene recognition (CASR) problem, which consists of estimating the environment around a device from the received audio signal, has been investigated. Most works dealing with the CASR problem have tried to find well-adapted features for this pr…

Citations: cited by 32 publications (21 citation statements)
References: 14 publications (28 reference statements)
“…Since the results on the LITIS dataset were reported with different metrics, i.e. average class-wise precision [23,19], average class-wise F1-score [4,5], and overall accuracy [4,28], we provide our performance on all of these metrics to make a proper comparison. We would also like to note that although there exist other works on the DCASE dataset after the challenge, we only mention here those with performance equivalent to or higher than that of the best submission in the challenge.…”
Section: Results (mentioning)
confidence: 99%
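The three metrics named in the statement above can rank systems differently, which is why the citing authors report all of them. The following is a minimal sketch, not code from any of the cited papers: the function name `scene_metrics` and the toy labels are hypothetical, and "average class-wise" is assumed to mean an unweighted (macro) mean over classes.

```python
from collections import Counter


def scene_metrics(y_true, y_pred):
    """Return (overall accuracy, macro precision, macro F1) for two label lists.

    Hypothetical helper: "average class-wise" is assumed to be the
    unweighted mean of the per-class scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # how often each class actually occurs

    precisions, f1_scores = [], []
    for label in labels:
        # A class that is never predicted gets precision 0 by convention here.
        prec = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        rec = true_pos[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        f1_scores.append(f1)

    accuracy = sum(true_pos.values()) / len(y_true)
    macro_precision = sum(precisions) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_precision, macro_f1


# Toy example with made-up scene labels (not data from any cited work).
y_true = ["bus", "bus", "bus", "cafe"]
y_pred = ["bus", "bus", "cafe", "cafe"]
accuracy, macro_precision, macro_f1 = scene_metrics(y_true, y_pred)
```

On imbalanced data the class-wise averages weight every scene equally, whereas overall accuracy is dominated by the most frequent scenes.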
“…Due to its complex sound composition, it is challenging to obtain a good representation for classification. Different features adapted from the related problems, such as speech recognition and audio event classification, have been used to characterize an acoustic scene, for instance MFCC [19,24] and Gammatone filters [25]. Some hand-crafted features tailored for the task have also been proposed and demonstrated good performance, like Histogram of Oriented Gradients (HOG)…”
Section: Introduction (mentioning)
confidence: 99%
“…linear prediction coefficients (LPC)), and cepstral features (e.g. MFCCs, Gammatone cepstral coefficients) have been prevalent in the literature [3], [8], [15], [20]. As an improvement, Roma et al. utilized recurrence quantification analysis (RQA) to analyze the recurrent behaviour in the MFCC coefficients over time [30].…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to automatically recognize a scene, a proper feature representation is needed, which, unfortunately, is not easily obtained due to the complexity of the content. Different low-level features have been proposed in prior works, such as Mel frequency cepstral coefficients (MFCCs) [8], [9] and Gammatone filterbank coefficients [10]. These features are usually borrowed from related problems like speech recognition and audio event classification.…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep neural networks (DNN) have been successfully applied to single modalities such as text [28]-[30], images [31]-[33] and audio [34], [35], showing their ability to learn representations directly from raw data, and can be used to extract a set of discriminative features. The CNN is a powerful deep architecture commonly used for image classification [36]-[38].…”
Section: Related Work (mentioning)
confidence: 99%