The analysis of musical signals to extract audio descriptors that can potentially characterize their timbre has been disparate and often too focused on a particular small set of sounds. The Timbre Toolbox provides a comprehensive set of descriptors that can be useful in perceptual research, as well as in music information retrieval and machine-learning approaches to content-based retrieval in large sound databases. Sound events are first analyzed in terms of various input representations (short-term Fourier transform, harmonic sinusoidal components, an auditory model based on the equivalent rectangular bandwidth concept, the energy envelope). A large number of audio descriptors are then derived from each of these representations to capture temporal, spectral, spectrotemporal, and energetic properties of the sound events. Some descriptors are global, providing a single value for the whole sound event, whereas others are time-varying. Robust descriptive statistics are used to characterize the time-varying descriptors. To examine the information redundancy across audio descriptors, correlational analysis followed by hierarchical clustering is performed. This analysis suggests ten classes of relatively independent audio descriptors, showing that the Timbre Toolbox is a multidimensional instrument for the measurement of the acoustical structure of complex sound signals.
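The pipeline described above — derive a time-varying descriptor from a short-term Fourier representation, then summarize it with robust statistics — can be illustrated with a minimal sketch. This is not the Timbre Toolbox itself (which is a MATLAB package); it is a hypothetical NumPy example computing one descriptor, the spectral centroid, and summarizing its trajectory with the median and interquartile range:

```python
import numpy as np

def spectral_centroid_series(signal, sr, frame_len=1024, hop=512):
    """Time-varying spectral centroid (Hz) from STFT magnitude frames."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        if total > 0:
            # Centroid = magnitude-weighted mean frequency of the frame.
            centroids.append(np.sum(freqs * mag) / total)
    return np.array(centroids)

def robust_summary(series):
    """Median and interquartile range: robust to outlier frames,
    as advocated for time-varying descriptors."""
    q25, q50, q75 = np.percentile(series, [25, 50, 75])
    return {"median": q50, "iqr": q75 - q25}

# Example: for a steady 440 Hz tone, the centroid should sit near 440 Hz
# with a very small interquartile range.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
stats = robust_summary(spectral_centroid_series(tone, sr))
```

A full descriptor set would add spectral spread, skewness, attack time, and so on over each input representation, but the summarize-with-robust-statistics step is the same shape for each time-varying descriptor.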
The influence of listeners' expertise and sound identification on the categorization of environmental sounds is reported in three studies. In Study 1, the causal uncertainty of 96 sounds was measured by counting the different causes described by 29 participants. In Study 2, 15 experts and 15 nonexperts classified a selection of 60 sounds and indicated the similarities they used. In Study 3, 38 participants indicated their confidence in identifying the sounds. Participants reported using either acoustical similarities or similarities of the causes of the sounds. Experts used acoustical similarity more often than nonexperts, who relied on the similarity of the causes of the sounds. Sounds with low causal uncertainty were more often grouped together because of similarities of cause, whereas sounds with high causal uncertainty were more often grouped together because of acoustical similarities. The same conclusions were reached for identification confidence. This measure allowed the sound classification to be predicted and is a straightforward method for determining the appropriate description of a sound.
In this article we report on listener categorization of meaningful environmental sounds. A starting point for this study was the phenomenological taxonomy proposed by Gaver (1993b). In the first experimental study, 15 participants classified 60 environmental sounds and indicated the properties shared by the sounds in each class. In a second experimental study, 30 participants classified and described 56 sounds exclusively made by solid objects. The participants were required to concentrate on the actions causing the sounds independent of the sound source. The classifications were analyzed with a specific hierarchical cluster technique that accounted for possible cross-classifications, and the verbalizations were submitted to statistical lexical analyses. The results of the first study highlighted 4 main categories of sounds: solids, liquids, gases, and machines. The results of the second study indicated a distinction between discrete interactions (e.g., impacts) and continuous interactions (e.g., tearing) and suggested that actions and objects were not independent organizational principles. We propose a general structure of environmental sound categorization based on the sounds' temporal patterning, which has practical implications for the automatic classification of environmental sounds.
Imitative behaviors are widespread in humans, in particular whenever two persons communicate and interact. Several tokens of spoken languages (onomatopoeias, ideophones, and phonesthemes) also display different degrees of iconicity between the sound of a word and what it refers to. Thus, it probably comes as no surprise that human speakers use many imitative vocalizations and gestures when they communicate about sounds, as sounds are notably difficult to describe. What is more surprising is that vocal imitations of non-vocal everyday sounds (e.g. the sound of a car passing by) are in practice very effective: listeners identify sounds better with vocal imitations than with verbal descriptions, despite the fact that vocal imitations are inaccurate reproductions of a sound created by a particular mechanical system (e.g. a car driving by) through a different system (the voice apparatus). The present study investigated the semantic representations evoked by vocal imitations of sounds by experimentally quantifying how well listeners could match sounds to category labels. The experiment used three different types of sounds: recordings of easily identifiable sounds (sounds of human actions and manufactured products), human vocal imitations, and computational “auditory sketches” (created by algorithmic computations). The results show that performance with the best vocal imitations was similar to that with the best auditory sketches for most categories of sounds, and even to the referent sounds themselves in some cases. More detailed analyses showed that the acoustic distance between a vocal imitation and a referent sound is not sufficient to account for such performance. Analyses suggested that instead of trying to reproduce the referent sound as accurately as vocally possible, vocal imitations focus on a few important features, which depend on each particular sound category.
These results offer perspectives for understanding how human listeners store and access long-term sound representations, and set the stage for the development of human-computer interfaces based on vocalizations.