A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Hu, Hu; Yang, Chao-Han Huck; Xia, Xianjun; Bai, Xue; Tang, Xin; Wang, Yajian; Niu, Shutong; Chai, Lu; Li, Juanjuan; Zhu, Hongning; Bao, Feng; Zhao, Yuanjun; Siniscalchi, Sabato Marco; Wang, Yannan; Du, Jun; Lee, Chin‐Hui

doi:10.1109/icassp39728.2021.9414835

Cited by 30 publications

(7 citation statements)

References 23 publications


“…Our baseline model was a smaller version of the ResNet model from [24]. The key elements of the Resnet structure are the Residual Blocks.…”

Section: Resnet Baseline Model (mentioning)

confidence: 99%

“…For the DCASE 2020 task 1 dataset, we extend the baseline network from the DCASE 2020 [8] to work with binaural audio as our baseline for comparison. Meanwhile, in the DCASE 2021 dataset, as our baseline we selected the much more complex Residual network solution [24] that had a high performance on the dataset. To make the trade-off clear we limited the architecture changes, and our solutions are mainly achieved by replacing 2D-convolution operations in the baseline networks with our proposed decomposition.…”

Section: Introduction (mentioning)

confidence: 99%


Low-Complexity Acoustic Scene Classification Using Time Frequency Separable Convolution

Phan

Jones

2022

Electronics


Replacing 2D-convolution operations by depth-wise separable time and frequency convolutions greatly reduces the number of parameters while maintaining nearly equivalent performances in the context of acoustic scene classification. In our experiments, the models’ sizes can be reduced by 6 to 14 times with similar performances. For a 3-class audio classification, replacing 2D-convolution in a CNN model gives roughly a 2% increase in accuracy. In a 10-class audio classification with multiple recording devices, replacing 2D-convolution in Resnet only reduces around 1.5% of the accuracy.
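The parameter saving described in this abstract can be illustrated with a simple weight count. This is a minimal sketch under stated assumptions: the decomposition below (a depth-wise k x 1 frequency convolution, a depth-wise 1 x k time convolution, then a 1 x 1 point-wise convolution) is one plausible reading of "depth-wise separable time and frequency convolutions", not necessarily the authors' exact layer, and the function names and layer sizes are hypothetical.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k 2D convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_tf_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in an assumed time-frequency separable replacement."""
    depthwise_freq = c_in * k  # one k x 1 kernel per input channel
    depthwise_time = c_in * k  # one 1 x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise_freq + depthwise_time + pointwise


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel.
standard = conv2d_params(64, 64, 3)
separable = separable_tf_params(64, 64, 3)
reduction = standard / separable
print(standard, separable, round(reduction, 2))
```

For this hypothetical layer the standard convolution has 36,864 weights and the assumed separable version has 4,480. This is a per-layer figure only; the whole-model reductions reported in the abstract depend on the full architecture.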



“…The primary goal has been to improve generalization on the underrepresented devices. Supervised machine learning algorithms have been proposed to account for the data imbalance problem and are often combined with data augmentation, regularization and fine tuning approaches [5][6][7]. As the dataset contains recordings captured…”

[Funding footnote of the citing paper: This work was made with the support of the French National Research Agency, in the framework of the project LEAUDS "Learning to understand audio scenes" (ANR-18-CE23-0020).]

Section: Introduction (mentioning)

confidence: 99%

On The Impact of Normalization Strategies in Unsupervised Adversarial Domain Adaptation for Acoustic Scene Classification

Olvera

Gasso

2022

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Acoustic scene classification systems face performance degradation when trained and tested on data recorded by different devices. Unsupervised domain adaptation methods have been studied to reduce the impact of this mismatch. While they do not assume the availability of labels at test time, they often exploit parallel data recorded by both devices, and thus are not fully blind to the target domain. In this paper, we address a more practical scenario where parallel data are not available. We thoroughly analyze the impact of normalization and moment matching strategies to compensate for the linear distortion introduced by the recording device and propose their integration with adversarial domain adaptation to handle the remaining non-linear distortion. Experiments on the DCASE Challenge 2018 Task 1B dataset show that the proposed integrated approach considerably reduces domain mismatch, reaching an accuracy in the target domain close to that obtained in the source domain.
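The role of normalization in compensating for linear device distortion can be sketched with per-band standardization. This is an illustrative example, not the procedure from the paper: it assumes log-magnitude spectrograms stored as nested lists, models the device as a fixed additive offset per frequency band (a linear channel filter is approximately additive in the log-spectral domain), and uses hypothetical names and toy values throughout.

```python
from statistics import fmean, pstdev


def standardize_per_band(spec, eps=1e-8):
    """Match the first two moments of each band: zero mean, unit variance.

    spec[b][t] is the log-magnitude of frequency band b at frame t.
    """
    out = []
    for band in spec:
        mu = fmean(band)
        sigma = pstdev(band)
        out.append([(v - mu) / (sigma + eps) for v in band])
    return out


# Toy "source device" spectrogram: 2 bands, 4 frames (hypothetical values).
source = [[0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]]

# Assumed device effect: a constant log-domain offset per band.
offsets = [0.5, -2.0]
target = [[v + off for v in band] for band, off in zip(source, offsets)]

src_norm = standardize_per_band(source)
tgt_norm = standardize_per_band(target)

# After standardization the per-band offset is gone.
max_diff = max(
    abs(a - b)
    for band_a, band_b in zip(src_norm, tgt_norm)
    for a, b in zip(band_a, band_b)
)
print(max_diff < 1e-9)
```

Standardization of this kind only removes the additive, per-band part of the mismatch, which is consistent with the abstract's point that a separate mechanism (adversarial adaptation) is proposed for the remaining non-linear distortion.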


“…Focusing on the frequency normalization, authors in [20] proposed a novel Residual Normalization method and a residual-based network architecture, which showed effective to improve the ASC performance and achieved the top-1 on DCASE 2021 Task 1A blind Test set and the top-4 on DCASE 2021 Task 1A Development set. However, to achieve the best performance, some papers from the second approach have still applied ensemble methods of multiple models [21], [22], [23], [24], which increases the model complexity.…”

Section: Introduction (mentioning)

confidence: 99%

“…To deal with the issue of large footprint models as using complex network architectures, ensemble of multiple models, or ensemble of multiple spectrogram inputs, pruning [21], [25], [22], [26] and quantization [25], [23] techniques have been widely applied. While quantization techniques feasibly help the model reduce to 1/4 of the original size (i.e, 32 bit with floating point format presenting for 1 trainable parameter is quantized to 8 bit with integer format [27]), pruning techniques prove that models can be reduced to 1/10 of the original sizes [25], [26].…”

Section: Introduction (mentioning)

confidence: 99%
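The size ratios quoted in the statement above follow from simple arithmetic, sketched here. The parameter count is a hypothetical round number chosen for illustration, and the helper name is an assumption; only the 32-bit to 8-bit quantization and the 1/10 pruning figure come from the quoted text.

```python
def model_size_bytes(n_params: int, bits_per_param: int) -> int:
    """Storage for the trainable parameters alone, in bytes."""
    return n_params * bits_per_param // 8


n_params = 1_000_000  # hypothetical model size, not taken from any paper

float32_size = model_size_bytes(n_params, 32)  # 32-bit floating point
int8_size = model_size_bytes(n_params, 8)      # 8-bit integer after quantization
quantization_ratio = int8_size / float32_size  # 8 / 32 = 1/4

# The statement quotes pruning down to 1/10 of the original size.
pruned_size = float32_size // 10

print(float32_size, int8_size, quantization_ratio, pruned_size)
```

The count covers parameter storage only; any extra data a real quantization scheme stores (for example scale factors) is ignored in this sketch.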

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Pham

Salovic

Jalali

et al. 2022

Preprint


In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low-footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of MobileNetV1,

