ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414551

Multi-View Audio And Music Classification

Abstract: We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The learned embeddings in the subnetworks are then concatenated to form the multi-view embedding for classification, similar to a simple concatenation network. However, apart from the joint classification b…
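The concatenation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single dense layer per subnetwork, the layer sizes, the random weights, and the names `MultiViewClassifier`, `embed`, and `predict_proba` are all assumptions made for illustration, and the four views are left unnamed because the abstract does not list them. The sketch covers only the joint classification on the concatenated embedding.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiViewClassifier:
    """Hypothetical model: one subnetwork per view, then a joint classifier."""

    def __init__(self, view_dims, embed_dim, n_classes, seed=0):
        rng = random.Random(seed)
        # One subnetwork (here a single dense layer) per input view.
        self.subnets = [init_layer(rng, embed_dim, d) for d in view_dims]
        # The joint classifier operates on the concatenated embeddings.
        self.head = init_layer(rng, n_classes, embed_dim * len(view_dims))

    def embed(self, views):
        """Concatenate the per-view embeddings into one multi-view embedding."""
        embedding = []
        for x, (weights, bias) in zip(views, self.subnets):
            embedding.extend(relu(linear(x, weights, bias)))
        return embedding

    def predict_proba(self, views):
        return softmax(linear(self.embed(views), *self.head))


# Four views with assumed (hypothetical) feature sizes.
view_dims = [8, 6, 5, 4]
model = MultiViewClassifier(view_dims, embed_dim=3, n_classes=2)
rng = random.Random(1)
views = [[rng.random() for _ in range(d)] for d in view_dims]
probs = model.predict_proba(views)
```

With four views and an embedding size of 3 per subnetwork, the multi-view embedding has 12 entries, and `probs` holds one probability per class.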

Citations: cited by 17 publications (5 citation statements)
References: 21 publications (30 reference statements)
“…As using ensemble is a rule of thumb to improve the ASC performance and shows effective to deal with the issue of mismatched recording devices [50], [16], [17], [51], [52], [53], [54], we therefore apply an ensemble of multiple spectrogram inputs in this paper. In particular, we use three spectrograms: log-Mel [36], Gammatone (Gam) [55], and Constant Q Transform (CQT) [36].…”
Section: B Further Improve ASC Performance By An Ensemble Of Multiple... (mentioning)
confidence: 99%
“…In our experimentation, we only used the text and audio modalities. We extracted two views (low-level features) from the audio modality: the raw audio signal (Raw) and the Mel-scale spectrogram (MEL), as suggested in [26].…”
Section: Data Set Descriptions (mentioning)
confidence: 99%
“…As applying an ensemble of either different types of input spectrograms [14], [15], [16], [17] or different learning models [18], [19], [20], [21], [22] has been a rule of thumb to enhance the performance of audio-based scene classification task performance, we therefore evaluate two ensemble methods, referred to as the multiple spectrogram strategy (e.g. Multiple spectrograms combines with one model) and the multiple model strategy (e.g.…”
Section: B Further Exploring Audio-based Framework (mentioning)
confidence: 99%
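
The citation statements above mention ensembles over multiple spectrogram inputs (log-Mel, Gammatone, CQT) but do not say how the per-spectrogram predictions are combined. As a hedged illustration only, the sketch below assumes simple averaging of class probabilities (late fusion); the function name `average_fusion` and the probability values are made up for the example and are not taken from any of the cited papers.

```python
def average_fusion(per_view_probs):
    """Average class probabilities across views (assumed fusion rule)."""
    n_views = len(per_view_probs)
    n_classes = len(per_view_probs[0])
    return [sum(p[c] for p in per_view_probs) / n_views for c in range(n_classes)]


# Hypothetical per-spectrogram class probabilities for one audio clip.
scores = {
    "log-Mel": [0.6, 0.3, 0.1],
    "Gammatone": [0.5, 0.4, 0.1],
    "CQT": [0.2, 0.7, 0.1],
}

fused = average_fusion(list(scores.values()))
# Index of the class with the highest averaged probability.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Averaging is only one possible choice; weighted sums or product rules are also plausible, and the statements do not indicate which the citing authors used.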