To study the perceptual structure of musical timbre and the effects of musical training, timbral dissimilarities of synthesized instrument sounds were rated by professional musicians, amateur musicians, and nonmusicians. The data were analyzed with an extended version of the multidimensional scaling algorithm CLASCAL (Winsberg & De Soete, 1993), which estimates the number of latent classes of subjects, the coordinates of each timbre on common Euclidean dimensions, a specificity value of unique attributes for each timbre, and a separate weight for each latent class on each of the common dimensions and the set of specificities. Five latent classes were found for a three-dimensional spatial model with specificities. Common dimensions were quantified psychophysically in terms of log-rise time, spectral centroid, and degree of spectral variation. The results further suggest that musical timbres possess specific attributes not accounted for by these shared perceptual dimensions. Weight patterns indicate that the perceptual salience of dimensions and specificities varied across classes. Biographical factors associated with degree of musical training and activity were not clearly related to the class structure, though musicians gave more precise and coherent judgments than did nonmusicians or amateurs. The model with latent classes and specificities gave a better fit to the data and made the acoustic correlates of the common dimensions more interpretable.
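The extended CLASCAL model combines a class-weighted Euclidean distance on the common dimensions with a class-weighted specificity term. The sketch below assumes the form usually given for this model, d_ijt = sqrt( sum_r w_tr (x_ir - x_jr)^2 + v_t (s_i + s_j) ), where w_tr is class t's weight on dimension r and v_t its weight on the specificities. It only evaluates the model distance; it does not estimate classes or parameters. The function name and all numeric values are hypothetical illustrations, not estimates from the study.

```python
import math
from typing import Sequence


def clascal_distance(
    x_i: Sequence[float],
    x_j: Sequence[float],
    s_i: float,
    s_j: float,
    class_weights: Sequence[float],
    specificity_weight: float,
) -> float:
    """Model distance between timbres i and j for one latent class t (assumed form).

    x_i, x_j: coordinates on the R common dimensions.
    s_i, s_j: specificities (unique attributes) of each timbre, >= 0.
    class_weights: w_t1..w_tR, the class's weight on each common dimension.
    specificity_weight: v_t, the class's weight on the set of specificities.
    """
    if not (len(x_i) == len(x_j) == len(class_weights)):
        raise ValueError("coordinates and weights must share R dimensions")
    # Weighted squared Euclidean distance on the shared dimensions.
    common = sum(w * (a - b) ** 2 for w, a, b in zip(class_weights, x_i, x_j))
    # Specificities add to every distance that involves the timbre.
    unique = specificity_weight * (s_i + s_j)
    return math.sqrt(common + unique)


# Two hypothetical timbres on three common dimensions
# (standing in for log-rise time, spectral centroid, spectral variation).
timbre_a = [0.0, 1.0, 2.0]
timbre_b = [3.0, 1.0, 2.0]

# Equal dimension weights, specificities ignored by this class:
print(clascal_distance(timbre_a, timbre_b, 0.5, 0.5, [1.0, 1.0, 1.0], 0.0))  # 3.0
# Same dimension weights, but this class also attends to specificities:
print(clascal_distance(timbre_a, timbre_b, 0.5, 0.5, [1.0, 1.0, 1.0], 7.0))  # 4.0
```

The two calls differ only in v_t, which shows how two latent classes can perceive the same pair of sounds as more or less dissimilar depending on how much weight they give to each timbre's unique attributes.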
Approaches to analyzing musical signals to extract audio descriptors that can potentially characterize their timbre have been disparate and often focused on a small, particular set of sounds. The Timbre Toolbox provides a comprehensive set of descriptors that can be useful in perceptual research, as well as in music information retrieval and machine-learning approaches to content-based retrieval in large sound databases. Sound events are first analyzed in terms of various input representations (short-term Fourier transform, harmonic sinusoidal components, an auditory model based on the equivalent rectangular bandwidth concept, the energy envelope). A large number of audio descriptors are then derived from each of these representations to capture temporal, spectral, spectrotemporal, and energetic properties of the sound events. Some descriptors are global, providing a single value for the whole sound event, whereas others are time-varying. Robust descriptive statistics are used to characterize the time-varying descriptors. To examine the information redundancy across audio descriptors, correlational analysis followed by hierarchical clustering is performed. This analysis suggests ten classes of relatively independent audio descriptors, indicating that the Timbre Toolbox is a multidimensional instrument for measuring the acoustical structure of complex sound signals.
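As an illustration of one time-varying spectral descriptor and its robust summary, the sketch below frames a signal, computes the spectral centroid of each frame from a plain DFT magnitude spectrum, and summarizes the frame values with the median and interquartile range. It is not the Timbre Toolbox's own code (the toolbox is written in MATLAB); the function names, the rectangular window, the linear-magnitude weighting (some variants weight by power), and the choice of median and IQR as the robust statistics are assumptions made for this sketch.

```python
import cmath
import math
import statistics
from typing import List, Sequence, Tuple


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 (direct O(N^2) DFT, rectangular window)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency of one frame, in Hz (hypothetical helper)."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid undefined, report 0 Hz
    return sum(f * m for f, m in zip(freqs, mags)) / total


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames, dropping a trailing partial frame."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def robust_summary(values: Sequence[float]) -> Tuple[float, float]:
    """Median and interquartile range of a time-varying descriptor (needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


# A 1000 Hz cosine sampled at 8000 Hz; each 64-sample frame holds exactly 8 cycles,
# so every frame's energy falls in a single DFT bin.
sr = 8000
tone = [math.cos(2 * math.pi * 1000 * t / sr) for t in range(256)]
centroids = [spectral_centroid(f, sr) for f in frames(tone, 64, 32)]
median_hz, iqr_hz = robust_summary(centroids)  # median close to 1000 Hz, IQR close to 0

# Median and IQR resist a single outlying frame value:
print(robust_summary([1.0, 2.0, 3.0, 4.0, 100.0]))  # (3.0, 2.0)
```

The last call shows why robust statistics suit time-varying descriptors: one aberrant frame (for example, during a noisy attack) would pull the mean of these values to 22.0, while the median and IQR barely move.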