2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00745

Abstract: Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening. The objective is to classify an audio clip into one of a set of predefined scenes using only the audio data. The problem has progressed considerably over the years across the different editions of DCASE, and it is usually divided into several subtasks that allow it to be tackled with different approaches. The subtask presented in this report corresponds to an ASC problem that is constrained by the complexity…
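To make the task concrete, the sketch below shows one common ASC pipeline under stated assumptions: an audio clip is converted to a log-mel spectrogram and scored against a set of predefined scene labels. The librosa calls, the placeholder classifier, and the file name clip.wav are illustrative assumptions, not the system described in the report.

```python
import torch
import torch.nn as nn
import librosa

def logmel(path, sr=44100, n_mels=64):
    """Load an audio clip and return a log-mel spectrogram (n_mels x frames)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

# Example scene labels; the actual label set depends on the DCASE subtask.
scenes = ["airport", "bus", "metro", "park", "street_traffic"]

# Placeholder classifier standing in for the CNN a real submission would use.
clf = nn.Sequential(nn.Flatten(), nn.LazyLinear(len(scenes)))

feats = torch.tensor(logmel("clip.wav"), dtype=torch.float32).unsqueeze(0)
scene = scenes[clf(feats).argmax(dim=1).item()]
```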

Cited by 13,752 publications (1,738 citation statements)
References 31 publications
“…In this experiment, 27 × 75 × 93 × 81 data were generated via the aforementioned preprocessing and data augmentation steps. In the first layer, we used 1 × 1 × 1 convolutional filters, which have been widely used in recent structural designs of convolutional neural networks (CNNs) because these filters increase nonlinearity without changing the receptive fields of the convolutional layer (Hu, Shen, & Sun; Iandola et al.; Simonyan & Zisserman). These filters can generate temporal descriptors for each voxel of the fMRI volume, and their weights can easily be learnt by DNNs during training.…”
Section: Methods
Mentioning, confidence: 99%
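The role of the 1 × 1 × 1 filters described in this excerpt can be illustrated with a minimal PyTorch sketch. The 27 × 75 × 93 × 81 input shape follows the excerpt, while the batch size of 1 and the 16 output channels are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# One fMRI sample shaped (batch, temporal channels, depth, height, width),
# matching the 27 x 75 x 93 x 81 volumes mentioned in the excerpt.
x = torch.randn(1, 27, 75, 93, 81)

# A 1x1x1 convolution mixes the 27 per-voxel values into new temporal
# descriptors; the kernel spans a single voxel, so the receptive field
# (and the spatial dimensions) remain unchanged.
pointwise = nn.Sequential(
    nn.Conv3d(in_channels=27, out_channels=16, kernel_size=1),
    nn.ReLU(inplace=True),  # the added nonlinearity the excerpt refers to
)

print(pointwise(x).shape)  # torch.Size([1, 16, 75, 93, 81])
```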
“…In this experiment, 27 × 75 × 93 × 81 data were generated via the aforementioned preprocessing and data augmentation steps. In the first layer, we used 1 × 1 × 1 convolutional filters, which have been widely used in recent structural designs of convolutional neural networks (CNNs) because these filters increase nonlinearity without changing the receptive fields of the convolutional layer (Hu, Shen, & Sun, ; Iandola et al, ; Simonyan & Zisserman, ). These filters can generate temporal descriptors for each voxel of the volume of the fMRI, and their weights can be easily learnt by DNNs during training.…”
Section: Methodsmentioning
confidence: 99%
“…If E_{T(W⊗X)} ≤ E_{W⊗T(X)}, the error does not dominate the benefits of augmenting features, and therefore η is set to zero. If E_{f(T(W⊗X))} < E_{W⊗T(X)}, f may learn extra information beneficial to model performance, for example the channel relationship as in [20]. In this case, η is also zero.…”
Section: The Proposed Methods
Mentioning, confidence: 99%
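Read literally, the excerpt gates a weight η on two error comparisons. The sketch below spells that rule out; the function name, argument names, and the nonzero fallback value are hypothetical, since the excerpt only states the cases in which η is zero.

```python
def choose_eta(err_aug_first, err_transform_first, err_refined):
    """Pick the weight eta from validation errors of three variants.

    err_aug_first       : error when features are augmented first, E_{T(W⊗X)}
    err_transform_first : error when the transform is applied first, E_{W⊗T(X)}
    err_refined         : error after the extra mapping f, E_{f(T(W⊗X))}
    """
    if err_aug_first <= err_transform_first:
        return 0.0  # the error does not dominate the benefit of augmenting
    if err_refined < err_transform_first:
        return 0.0  # f recovers useful information (e.g. channel relationships)
    return 1.0      # fallback value is an assumption; the excerpt leaves it open
```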
“…With higher performance requirements, the Zeiler and Fergus (ZF) [36] and Visual Geometry Group (VGG) [37] networks used in the original Faster R-CNN cannot meet the demand. Hence, CNN architectures with better performance have been proposed in recent years, such as the Residual Network (ResNet) [28], Dense Convolutional Network (DenseNet) [38], and Squeeze-and-Excitation Network (SENet) [39]. Due to the limited on-board computing and storage resources, we adopt ResNet-50 as the feature extraction network, which offers good performance and a light weight.…”
Section: Feature Extraction
Mentioning, confidence: 99%
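As a rough illustration of using ResNet-50 as a feature extraction network, the torchvision sketch below keeps everything up to the last residual stage and drops the classification head, yielding the spatial feature map a detection head such as Faster R-CNN would consume. The 224 × 224 input size and the omission of pretrained weights are assumptions for brevity, not details from the excerpt.

```python
import torch
import torch.nn as nn
from torchvision import models

# Build ResNet-50 and drop its global average pool and fully connected head,
# leaving a backbone that outputs a spatial feature map a detector can reuse.
resnet = models.resnet50()  # pretrained weights could be loaded here instead
backbone = nn.Sequential(*list(resnet.children())[:-2])

images = torch.randn(1, 3, 224, 224)   # dummy RGB batch
features = backbone(images)
print(features.shape)                  # torch.Size([1, 2048, 7, 7])
```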