Perceptually Salient Regions of the Modulation Power Spectrum for Musical Instrument Identification
2017 | DOI: 10.3389/fpsyg.2017.00587
Abstract: The ability of a listener to recognize sound sources, and in particular musical instruments from the sounds they produce, raises the question of determining the acoustical information used to achieve such a task. It is now well known that the shapes of the temporal and spectral envelopes are crucial to the recognition of a musical instrument. More recently, Modulation Power Spectra (MPS) have been shown to be a representation that potentially explains the perception of musical instrument sounds. Nevertheless, …
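The MPS mentioned in the abstract is commonly obtained as the two-dimensional Fourier transform of a time-frequency representation of the sound. The sketch below is a minimal illustration of that general idea, assuming a linear-frequency STFT spectrogram; it is not the paper's exact analysis pipeline (the authors' auditory front end, windowing, and modulation-axis units may differ), and the function name and parameters are illustrative only.

```python
import numpy as np
from scipy import signal


def modulation_power_spectrum(x, fs, nperseg=1024, noverlap=768):
    """Sketch of a modulation power spectrum (MPS): the squared magnitude of
    the 2-D Fourier transform of a (log-magnitude) spectrogram.
    Axis 0 of the result indexes spectral modulation (cycles/Hz),
    axis 1 indexes temporal modulation (Hz)."""
    # Time-frequency representation (complex STFT -> magnitude spectrogram).
    f, t, stft = signal.stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    log_spec = np.log1p(np.abs(stft))   # compress dynamics
    log_spec -= log_spec.mean()         # remove the DC component
    # 2-D Fourier transform of the spectrogram -> modulation domain.
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2
    # Modulation axes: spectral (cycles/Hz) and temporal (Hz).
    spec_mod = np.fft.fftshift(np.fft.fftfreq(len(f), d=f[1] - f[0]))
    temp_mod = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0]))
    return spec_mod, temp_mod, mps
```

Plotting `mps` against `temp_mod` (temporal modulation in Hz) and `spec_mod` (spectral modulation in cycles/Hz) gives the modulation-domain picture in which rate and scale regions can be inspected.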

Cited by 17 publications (17 citation statements) | References 34 publications
“…Future work will further verify the approach on other datasets with playing techniques, such as Studio-Online [6] and ConTimbre [18]. We will also compare the jTFST with other equivalent time-frequency representations, such as the two-dimensional Fourier transform and the modulation spectra [19].…”
Section: Results (mentioning)
Confidence: 99%
“…In the visual domain, these techniques have been extended in recent years to address not only low-level sensory processes, but higher-level cognitive mechanisms in humans: facial recognition [12], emotional expressions [2, 13], social traits [14], as well as their associated individual and cultural variations ([15]; for a review, see [5]). In speech, even more recently, reverse correlation and the associated “bubbles” technique were used to study spectro-temporal regions underlying speech intelligibility [16, 17] or phoneme discrimination in noise [18, 19] and, in music, timbre recognition of musical instruments [20, 21].…”
Section: Introduction (mentioning)
Confidence: 99%
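The reverse-correlation and “bubbles” approaches cited in this statement estimate which regions of a representation drive a listener's responses by randomly revealing parts of it across trials and correlating the random masks with performance. The following is a minimal, self-contained sketch of that logic with a simulated observer; every name and parameter here (`bubble_mask`, the grid size, the simulated response rule) is an illustrative assumption, not the procedure used in any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)


def bubble_mask(shape, n_bubbles=10, sigma=3.0):
    """One random 'bubbles' mask: a sum of Gaussian apertures, clipped to [0, 1]."""
    mask = np.zeros(shape)
    rows, cols = np.indices(shape)
    for _ in range(n_bubbles):
        r0, c0 = rng.uniform(0, shape[0]), rng.uniform(0, shape[1])
        mask += np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)


# Hypothetical "salient region" the simulated listener relies on.
shape = (64, 64)
truth = np.zeros(shape)
truth[20:30, 40:50] = 1.0

masks, correct = [], []
for _ in range(2000):  # simulated trials
    m = bubble_mask(shape)
    # Responses are more likely correct when the salient region is revealed.
    p = 0.5 + 0.5 * min(1.0, (m * truth).sum() / truth.sum())
    masks.append(m)
    correct.append(rng.random() < p)

masks, correct = np.array(masks), np.array(correct)
# Classification image: where revealing information helped performance.
ci = masks[correct].mean(axis=0) - masks[~correct].mean(axis=0)
```

The classification image `ci` peaks over the region that actually informed the simulated responses; the same logic underlies locating perceptually salient spectrotemporal or modulation-domain regions in the cited work.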
“…From a more general perspective, the current approach is in line with an upsurge of interest in signal analysis/re-synthesis approaches to the study of auditory perception (McDermott and Simoncelli, 2011; Overath et al., 2015; Ponsot et al., 2018; Thoret et al., 2017).…”
Section: Discussion (mentioning)
Confidence: 86%
“…Most recently, Thoret et al. (2016, 2017) showed that instrument identification is determined by instrument-specific spectrotemporal modulations, although their approach did not allow them to draw specific conclusions about the role of onsets. Ogg et al. (2017) studied the minimal duration required to discriminate between musical instrument sounds, human speech, and human environmental sounds.…”
Section: A. Previous Research (mentioning)
Confidence: 99%