“…In granular synthesis, microstructure arises from individual grains, and their rate of playback forms texture clouds at the level of mesostructure. Beyond the micro scale and spectrogram analysis are sound structures that emerge from complex spectral and temporal envelopes, such as sound textures and instrumental playing techniques [18].…”
Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in recent applications of deep learning to the analysis and synthesis of musical audio. Currently, autoencoders and neural audio synthesizers are only trained and evaluated at the scale of microstructure, i.e., local amplitude variations up to 100 ms or so. In this paper, the authors formulate and address the problem of mesostructural audio modeling via a composition of a differentiable arpeggiator and time-frequency scattering. The authors empirically demonstrate that time-frequency scattering serves as a differentiable model of similarity between synthesis parameters that govern mesostructure. By exposing the sensitivity of short-time spectral distances to time alignment, the authors motivate the need for a time-invariant and multiscale differentiable time-frequency model of similarity at the level of both local spectra and spectrotemporal modulations.
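As a rough illustration of that last point, the sketch below (not the authors' code) compares a short-time spectral distance with a scattering distance between a signal and a slightly time-shifted copy of itself. It uses kymatio's Scattering1D, i.e., plain time scattering, as a convenient stand-in for the joint time-frequency scattering used in the paper; the test signal, shift size, and all parameters are illustrative assumptions.

```python
# A rough sketch (not the authors' code) of the alignment-sensitivity argument:
# compare a short-time spectral distance with a scattering distance between a
# signal and a time-shifted copy of itself. Uses kymatio's Scattering1D (plain
# time scattering) as a stand-in for the paper's joint time-frequency
# scattering; the signal and all parameters are illustrative assumptions.
import torch
from kymatio.torch import Scattering1D

T = 2 ** 13                          # signal length in samples
t = torch.arange(T) / 16000.0        # time axis at an assumed 16 kHz rate
x = torch.sin(2 * torch.pi * 440.0 * t) * torch.exp(-8.0 * t)  # decaying tone
x_shift = torch.roll(x, shifts=300)  # shift well below the averaging scale

# Short-time spectral distance: frames of the two signals are misaligned,
# so the distance is large even though the sounds are perceptually alike.
window = torch.hann_window(512)
spec = lambda s: torch.stft(s, n_fft=512, hop_length=128, window=window,
                            return_complex=True).abs()
d_spec = torch.linalg.vector_norm(spec(x) - spec(x_shift))

# Scattering distance: temporal averaging over 2**J = 4096 samples absorbs
# shifts much smaller than that scale, so the distance stays small.
scattering = Scattering1D(J=12, shape=(T,), Q=8)
d_scat = torch.linalg.vector_norm(scattering(x) - scattering(x_shift))

print(f"spectrogram distance: {d_spec.item():.4f}")
print(f"scattering distance:  {d_scat.item():.4f}")
```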
“…Amplitude spectrum envelope (ASE) [5]–[7]; constant-Q transform (CQT) [8], [9]; crest factor (CF) [10]; discrete wavelet transform (DWT) [11]; Daubechies wavelet coefficient histogram (DWCH) [12]–[17]; Fourier cepstrum coefficients (FCC) [18], [19]; linear predictive cepstral coefficients (LPCC) [21], [22]; Mel-frequency cepstral coefficients (MFCC) [19], [20], [22], [26]; Morlet wavelet transform (MWT) [32]–[35]; octave-scale cepstral coefficients (OSCC) [6], [14], [24], [29]; root mean square energy (RMS) [30], [31]. In the modern era, the music industry has made extensive use of multiple sound channels during recording and production, and most releases are stereo. Current music classification systems neglect this stereo information: the two channels are summed and only the resulting mixed signal is considered.…”
In recent years, digital music has grown into a billion-dollar market, with the US remaining its most profitable region. Owing to this digital shift, people today can access millions of music clips through online music applications on their smartphones. In this context, several issues arise between listeners and music search engines when querying and retrieving clips from a large music collection. Classification is one of the fundamental problems in music information retrieval (MIR), yet hurdles remain in organizing and categorizing music collections according to listeners' preferences. This paper addresses different music feature extraction methods that can be used in various music classification tasks, such as listener mood detection, instrument recognition, artist identification, genre classification, query-by-humming, and music annotation. The review illustrates the features available for addressing the research challenges posed by music mining.
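As a concrete example of how several of the features tabulated above are extracted in practice, the sketch below uses librosa as the toolkit (an assumption; the review does not prescribe one) to compute MFCC, constant-Q, and RMS features, and makes the channel-summing step visible: loading with mono=True averages the stereo channels into a single mixed signal. The file name "clip.wav" is a placeholder.

```python
# A minimal sketch (assumes librosa is installed and a local file "clip.wav")
# of extracting some of the tabulated features for a classification pipeline:
# MFCC, constant-Q transform, and RMS energy.
import librosa
import numpy as np

# mono=True averages the two stereo channels into one mixed signal, the very
# step the excerpt above notes discards stereo information.
y, sr = librosa.load("clip.wav", sr=22050, mono=True)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # MFCC, shape (13, frames)
cqt = np.abs(librosa.cqt(y, sr=sr))                 # constant-Q magnitudes
rms = librosa.feature.rms(y=y)                      # RMS energy per frame

# Frame-level features are commonly summarized by simple statistics before
# being fed to a classifier.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           cqt.mean(axis=1), rms.mean(axis=1)])
print(features.shape)
```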
“…Knowledge of sound analysis and synthesis is important for understanding and manipulating the guitar's sound [7,8]. The guitar produces periodic sounds when its strings are plucked, and these sounds can be described using a Fourier series [9,10]. Fourier analysis can be used to decompose the guitar sound into its component frequencies [8].…”
This research aims to analyze and synthesize the periodic signals produced by plucking a guitar string with the hammer-on technique. The research comprises three stages: data collection, analysis, and synthesis. The guitar string was plucked with a tension of 2.5 N and recorded using a sound sensor connected to PASCO Capstone software. Two variations of the data were used: the sound of a hammer-on pluck that raises the pitch by a half tone, and one that raises it by a whole tone. The data were analyzed in MATLAB to obtain graphs of deviation as a function of frequency, damping coefficient values, and frequency spectra. The results showed that after the hammer-on, the amplitude of the tone decreased drastically as the mass per unit length of the string decreased. The initial tone before the hammer-on appears within the tone after the hammer-on, with an amplitude that is lower the greater the mass per unit length of the string. Guitar sounds with this technique were synthesized by combining the individual tones and adjusting their time intervals and amplitudes according to the literature data.
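A minimal sketch of the synthesis step described above, not the study's MATLAB code: each tone is modeled as a Fourier series of exponentially damped harmonics, and the hammer-on is simulated by adding a second, quieter tone after a chosen time interval. The fundamental frequencies, damping coefficients, amplitudes, and onset time below are illustrative placeholders, not the study's measured values.

```python
# A minimal sketch (not the study's MATLAB code) of additive synthesis of a
# plucked string with a hammer-on: each tone is a sum of exponentially damped
# harmonics; the hammer-on tone enters later with a lower amplitude. All
# numeric values are illustrative placeholders, not measured data.
import numpy as np

SR = 44100  # sample rate in Hz (assumed)

def damped_tone(f0, duration, damping, amp, n_harmonics=6):
    """Fourier-series model of a plucked string: damped harmonic partials."""
    t = np.arange(int(duration * SR)) / SR
    tone = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        # Partial amplitudes fall off as 1/k; all partials share one damping.
        tone += (amp / k) * np.exp(-damping * t) * np.sin(2 * np.pi * k * f0 * t)
    return tone

# First tone (open pluck), then a hammer-on one whole tone higher,
# entering 0.5 s later with a lower amplitude, as described above.
x = damped_tone(f0=196.0, duration=2.0, damping=3.0, amp=1.0)  # G3
y = damped_tone(f0=220.0, duration=1.5, damping=3.0, amp=0.4)  # A3
onset = int(0.5 * SR)
out = np.copy(x)
out[onset:onset + len(y)] += y
out /= np.max(np.abs(out))  # normalize to avoid clipping
```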