2015 23rd European Signal Processing Conference (EUSIPCO) 2015
DOI: 10.1109/eusipco.2015.7362591

Timbral modeling for music artist recognition using i-vectors

Abstract: Music artist (i.e., singer) recognition is a challenging task in Music Information Retrieval (MIR). The presence of different musical instruments, the diversity of music genres and singing techniques make the retrieval of artist-relevant information from a song difficult. Many authors tried to address this problem by using complex features or hybrid systems. In this paper, we propose new song-level timbre-related features that are built from frame-level MFCCs via so-called i-vectors. We report artist recogniti…

Cited by 14 publications (9 citation statements)
References 14 publications
“…The exploited audio features are inspired by the fields of speech processing and music information retrieval (MIR) and by their successful application in MIR-related tasks, including music retrieval, music classification, and music recommendation (Knees and Schedl 2016). We investigate two kinds of audio features: (i) block-level features (Seyerlehner et al 2011), which consider chunks of the audio signal known as blocks and are therefore capable of exploiting temporal aspects of the signal; and (ii) i-vector features (Eghbal-Zadeh et al 2015), which are extracted at the level of audio segments using audio frames. Both approaches eventually model the feature at the level of the entire audio piece, by aggregating the individual feature vectors across time.…”
Section: Audio Features (mentioning)
confidence: 99%
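
The aggregation step mentioned in the statement above (individual feature vectors combined across time into one vector per audio piece) can be illustrated with a minimal sketch. This is not the block-level or i-vector method of the cited papers; it is a much simpler stand-in that summarizes each feature dimension by its mean and standard deviation. The function name `aggregate_over_time` and the toy frame values are assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def aggregate_over_time(frames):
    """Collapse per-frame feature vectors into one piece-level vector.

    Returns the per-dimension means followed by the per-dimension
    (population) standard deviations. Illustrative stand-in only.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose: one tuple of values per feature dimension.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]
    return means + stds


# Toy input: three frames, two feature dimensions each (hypothetical values).
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
piece_vector = aggregate_over_time(frames)
```

Whatever the number of frames, the result has a fixed length (twice the frame dimensionality), which is what makes piece-level comparison between recordings of different durations possible.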
“…The framework can be decomposed into several stages: (i) Frame-level feature extraction: MFCCs have proven to be useful features for many audio and music processing tasks (Logan et al 2000b; Ellis 2007; Eghbal-Zadeh et al 2015). They provide a compact representation of the spectral envelope, are also a musically meaningful representation (Eghbal-Zadeh et al 2015), and are used to capture acoustic scenes (Eghbal-Zadeh et al 2016). Even though it is possible to use other features (Suh et al 2011), we avoid the challenges involved in feature engineering and instead focus on the timbral modeling technique.…”
Section: Block-level Features (mentioning)
confidence: 99%
“…Studies have mostly focused on feature extraction methods, which extract a singer's voice from an audio signal. The extracted features include Mel-frequency cepstral coefficient (MFCC) features [3]-[5], linear frequency cepstral coefficient features [6], [7], harmonic features [6], cepstrum-based features [8], GMM super-vectors [9], and i-vectors [10], [12]. Segmenting a singer's voice and musical instrument is itself a classification problem and is outside the scope of this study.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Features. I-vectors were first introduced in the field of speaker verification [10], but recently they have also been successfully utilized for music similarity and music artist recognition tasks [12,13]. We build a Gaussian mixture model with 1,024 components on the entire pool of segment-level features of the development song set.…”
Section: I-vectors From Timbral (mentioning)
confidence: 99%
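
The statement above describes fitting a Gaussian mixture model on a pooled set of segment-level features. The following is a rough sketch of that idea, using plain expectation-maximization in pure Python. Several things here are assumptions and not taken from the quoted paper: diagonal covariances, random initialization from data points, a variance floor, 2 components in place of 1,024, and synthetic 2-D points in place of real audio features. The names `fit_diag_gmm` and `log_gauss_diag` are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def fit_diag_gmm(data, n_components, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (unoptimized sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize means from randomly chosen data points (an assumption).
    means = [list(x) for x in rng.sample(data, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            logs = [
                math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-10)
            weights[k] = nk / len(data)
            means[k] = [
                sum(r[k] * x[d] for r, x in zip(resp, data)) / nk
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, data)) / nk,
                    var_floor,
                )
                for d in range(dim)
            ]
    return weights, means, variances


# Synthetic "pooled features": two clouds of 2-D points (toy data only).
toy_rng = random.Random(1)
pooled = [[toy_rng.gauss(0.0, 1.0), toy_rng.gauss(0.0, 1.0)] for _ in range(20)]
pooled += [[toy_rng.gauss(8.0, 1.0), toy_rng.gauss(8.0, 1.0)] for _ in range(20)]
weights, means, variances = fit_diag_gmm(pooled, n_components=2)
```

In practice a model of the size quoted above would be trained with an optimized library on far more data; the sketch only shows the structure of the estimation loop.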