2022
DOI: 10.1186/s13636-022-00245-8

Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

Abstract: Multiple predominant instrument recognition in polyphonic music is addressed using decision-level fusion of three transformer-based architectures applied to an ensemble of visual representations: the Mel-spectrogram, modgdgram, and tempogram. Predominant instrument recognition refers to the problem of identifying the prominent instrument in a mixture of instruments being played together. We experimented with two transformer architectures, the Vision transformer (Vi-T) and the Shifted window transformer (Swin-T)…
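
The abstract describes decision-level fusion of per-representation transformer outputs. As a rough sketch of how such fusion can work, the snippet below applies soft voting (averaging class probabilities from the three representations) followed by thresholding for multi-label output; the class count, threshold, and probability values are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical softmax outputs over instrument classes from three
# transformers, one per visual representation (random placeholders,
# not real model outputs).
N_CLASSES = 11  # assumption: e.g., the 11 IRMAS instrument classes
rng = np.random.default_rng(0)
probs_mel   = rng.dirichlet(np.ones(N_CLASSES))  # Mel-spectrogram model
probs_modgd = rng.dirichlet(np.ones(N_CLASSES))  # modgdgram model
probs_tempo = rng.dirichlet(np.ones(N_CLASSES))  # tempogram model

# Decision-level fusion by soft voting: average the per-representation
# class probabilities, then threshold so several instruments can be
# flagged as predominant in the same clip.
fused = (probs_mel + probs_modgd + probs_tempo) / 3.0
THRESHOLD = 0.5  # illustrative cut-off, not a tuned value
predicted = np.flatnonzero(fused >= THRESHOLD)
print("Predicted predominant-instrument indices:", predicted)
```

A hard-voting variant (majority vote over each model's argmax) is another common decision-level rule; which rule the paper's voting scheme uses is not shown in the excerpt above.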

Cited by 8 publications (3 citation statements)
References: 46 publications (67 reference statements)

Citation statements (ordered by relevance):
“…In their work, they propose a method to recognize both pitches and instruments [16]. To augment the data, they employed a Wave Generative Adversarial Network (WaveGAN) architecture to generate audio files [7][8][9]. These approaches demonstrate the utilization of various techniques, including feature extraction, deep learning, image processing, and data augmentation, to improve instrument recognition accuracy and handle challenges such as low-quality recordings and polyphonic music.…”
Section: Instrument Recognition
Citation type: mentioning, confidence: 99%
“…
Model                            F1 Micro    F1 Macro
SVM [25]                         0.36        0.27
Bosch et al. [26]                0.50        0.43
MTF-DNN (2018) [27]              0.32        0.28
Audio DNN [28]                   0.55        0.51
ConvNet (2017) [15]              0.62        0.52
Multi-task ConvNet (2020) [10]   0.66        0.58
Kratimenos et al. (2021) [14]    0.65        0.55
WaveGAN ConvNet (2021) [7]       0.65        0.60
Voting-Swin-T (2022) [8]         0…

According to the results presented in Table 5, the proposed AEDCN model outperforms other neural network-based models as well as data augmentation-based instrument recognition models.…”
Section: Model
Citation type: mentioning, confidence: 99%
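
The comparison above reports micro- and macro-averaged F1. As a brief, self-contained illustration of the difference (the label arrays are made up, not taken from any cited paper): micro-F1 pools true/false positives and negatives across all classes, while macro-F1 averages the per-class F1 scores, so rare instruments count as much as common ones.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label ground truth and predictions: 3 clips, 4 instrument
# classes, binary indicator format (illustrative values only).
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

# Micro: aggregate TP/FP/FN over every (clip, class) cell.
# Macro: compute F1 per class, then take the unweighted mean.
# zero_division=0 avoids a warning for class 2, which is never predicted.
print("F1 micro:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("F1 macro:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```

Here micro-F1 (≈0.73) exceeds macro-F1 (≈0.58) because the class that is never predicted drags the macro average down, which is why papers often report both.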