2019
DOI: 10.48550/arXiv.1909.02869
Preprint

Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

Abstract: Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning. We study this problem in the context of machine listening (Task 1b of the DCASE 2019 Challenge). We propose a novel approach to learn domain-invariant classifiers in an end-to-end fashion by enforcing equal hidden layer representations for domain-parallel samples, i.e. time-aligned recordings from different recording devices. No classification labels are needed…
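The abstract describes an end-to-end objective that combines a standard classification loss with a penalty forcing hidden-layer representations of time-aligned recordings from different devices to match. Below is a minimal sketch of that idea, assuming a PyTorch implementation; the `SceneCNN` architecture, the MSE distance, and the weighting `lambda_dev` are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch (assumed PyTorch) of the idea in the abstract: train a CNN classifier
# while penalizing the distance between hidden representations of
# domain-parallel samples, i.e. time-aligned spectrograms of the same scene
# captured by the labeled device A and a parallel device B (no labels needed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)   # hidden representation to be aligned
        return h, self.classifier(h)

def training_step(model, x_dev_a, y_a, x_dev_b, lambda_dev: float = 1.0):
    """x_dev_a and x_dev_b are time-aligned recordings of the same scenes."""
    h_a, logits_a = model(x_dev_a)
    h_b, _ = model(x_dev_b)               # no labels needed for device B
    cls_loss = F.cross_entropy(logits_a, y_a)
    # device-invariance penalty: equal hidden representations for parallel pairs
    inv_loss = F.mse_loss(h_a, h_b)
    return cls_loss + lambda_dev * inv_loss
```

Because the penalty only needs paired recordings, not labels, the parallel data can be collected cheaply, which is the practical appeal the abstract points to.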

Cited by 2 publications (2 citation statements) · References 7 publications (10 reference statements)
“…The huge amounts of video data that are becoming more and more available through online sources, enable and at the same time require continuously better performance in tasks like activity recognition, video saliency, scene analysis or video summarization, imposing the need to exploit not only spatial information, but also temporal [7,47,27]. Similar advances have also been achieved in audio processing areas, such as acoustic event detection [44], speech recognition [17,9], sound localization [56], by using deep learning techniques.…”
Section: Introduction · Citation type: mentioning · Confidence: 96%
“…In particular, spectrum correction [23] and channel conversion [24] build a front-end module to convert speech features from the source domain to the target domain before feeding them to the back-end classifier. Besides front-end features, mid-level feature-based transfer systems, which use bottleneck features [25] or hidden layer representations [26], are adopted to transfer knowledge from the source to the target domain. Adversarial training methods in [27,28] leverage an extra domain discriminator to solve the device mismatch problem, although their key focus is on the lack of labeled target data.…”
Section: Introduction · Citation type: mentioning · Confidence: 99%
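The adversarial methods this citing paper refers to typically attach a domain (device) discriminator behind a gradient-reversal layer, so that the feature extractor is trained to produce device-invariant features. The following is a hedged sketch of that generic DANN-style pattern, not the exact systems cited in [27,28]; `GradReverse`, `domain_adversarial_loss`, and the `alpha` weighting are standard illustrative assumptions.

```python
# Generic DANN-style sketch of adversarial device adaptation: a
# gradient-reversal layer flips gradients coming from a device discriminator,
# pushing the feature extractor toward device-invariant representations.
# Illustrative pattern only, not the cited systems' exact implementation.
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, alpha: float):
        ctx.alpha = alpha
        return x.view_as(x)          # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # reverse the gradient sign (and scale by alpha) on the way back
        return -ctx.alpha * grad_output, None

def domain_adversarial_loss(features, device_labels, discriminator, alpha=1.0):
    """features: extractor output; device_labels: 0 = source, 1 = target device."""
    reversed_feats = GradReverse.apply(features, alpha)
    domain_logits = discriminator(reversed_feats)
    return torch.nn.functional.cross_entropy(domain_logits, device_labels)
```

Unlike the parallel-recording approach of the surveyed paper, this adversarial setup needs no time-aligned pairs, only knowledge of which device produced each recording, which is why the two families of methods are usually contrasted.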