Time-Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. The dataset comprises two parts: the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales. The source material contains speech, solo harmonic and percussive instruments, sound effects, and a range of music genres. A total of 42,529 ratings were collected from 633 sessions using laboratory and remote collection methods. Analysis of the results shows no correlation between age and quality of rating, equivalence between expert and non-expert listeners, minor differences between participants with and without hearing issues, and minimal differences between testing modalities. A comparison of published objective measures and subjective scores shows the objective measures to be poor indicators of subjective quality. Initial results for a retrained objective measure of quality are presented, with results approaching the average root mean squared error loss and Pearson correlation values of the subjective sessions. The labeled dataset is available at http://ieee-dataport.org/1987.
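The two agreement statistics named in this abstract, root mean squared error and Pearson correlation, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `rmse` and `pearson` are hypothetical and are not taken from the dataset's tooling, and the example scores are made up purely to show the call.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equal-length score lists."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Illustrative values only: objective predictions vs. subjective scores.
    objective = [3.1, 2.4, 4.0, 1.8]
    subjective = [3.0, 2.0, 4.2, 2.1]
    print(rmse(objective, subjective), pearson(objective, subjective))
```

A lower RMSE and a Pearson value closer to 1 both indicate closer agreement between an objective measure and the subjective scores.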
A modification to the Epoch-Synchronous Overlap-Add (ESOLA) Time-Scale Modification (TSM) algorithm is proposed in this paper. The proposed method, Fuzzy Epoch-Synchronous Overlap-Add, improves on the previous ESOLA method through the use of cross-correlation to align time-smeared epochs before overlap-adding. This reduces distortion and artefacts while the speaker's fundamental frequency is stable, and also reduces artefacts during pitch modulation. The proposed method is tested against well-known TSM algorithms. It is preferred over ESOLA and gives similar performance to other TSM algorithms for voice signals. It is also shown that this algorithm can work effectively with solo instrument signals containing strong fundamental frequencies. A full implementation of the proposed method and the zero frequency resonator can be found at github.com/zygurt/TSM.
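The general idea of using cross-correlation to align two segments before overlap-adding can be sketched as below. This is not the authors' Fuzzy ESOLA implementation (see the repository above for that): epoch detection and time smearing are omitted, and the function name `best_lag` and the `max_lag` search range are assumptions introduced for illustration.

```python
import math


def best_lag(reference, candidate, max_lag):
    """Return the shift (in samples) of `candidate` that maximises its
    cross-correlation with `reference`, searched over [-max_lag, max_lag].

    A positive result means the matching content sits later in `candidate`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):  # ignore samples shifted out of range
                score += r * candidate[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    # A sinusoid and a copy delayed by three samples.
    ref = [math.sin(2 * math.pi * i / 16) for i in range(64)]
    delayed = [math.sin(2 * math.pi * (i - 3) / 16) for i in range(64)]
    print(best_lag(ref, delayed, max_lag=5))
```

In an overlap-add scheme, the returned lag would be used to offset the incoming frame so that its waveform lines up with the tail of the output before the two are cross-faded.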
The phase relationship between channels should be maintained when processing multiple channel signals with Time-Scale Modification (TSM). This paper proposes a method and an additional variant for maintaining the phase relationship between channels, thereby retaining the presence in the centre of the stereo signal. The method pre- and post-processes the whole file, while the variant processes each frame for real-time suitability. Time-scale modification is applied to sum and difference transforms of the stereo signal, which results in a large improvement in stereo phase coherence and maintains the stereo field. The proposed method produces a high-quality stereo output and greatly improves quality over the independent channel processing method. It is also simple to implement and can be built around existing TSM frameworks. The proposed method and variant are suitable for both frequency and time domain TSM methods. Availability: All source code, figures, and source audio files can be found at github.com/zygurt/TSM/.
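A sum and difference transform wrapped around a mono TSM routine can be sketched as follows. The scaling by one half, the function names, and the `tsm` callable (standing in for any single-channel time-scale modification routine) are assumptions for illustration; the paper's own pre- and post-processing may differ in detail.

```python
def to_sum_difference(left, right):
    """Convert left/right channels to sum (mid) and difference (side) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_sum_difference(mid, side):
    """Invert to_sum_difference, recovering the left/right channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def process_stereo(left, right, tsm):
    """Apply a mono TSM callable to the sum and difference signals,
    then convert the results back to left/right.

    `tsm` is a hypothetical placeholder: any function mapping a list of
    samples to a time-scaled list of samples.
    """
    mid, side = to_sum_difference(left, right)
    return from_sum_difference(tsm(mid), tsm(side))


if __name__ == "__main__":
    # With an identity "TSM", the original channels come back unchanged.
    print(process_stereo([0.5, -0.25], [0.25, 0.75], lambda x: list(x)))
```

Because content common to both channels is carried entirely by the sum signal, it is time-scaled once rather than twice, which is the intuition behind the centre image being retained.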
Objective evaluation of audio processed with time-scale modification (TSM) remains an open problem. Recently, a dataset of time-scaled audio with subjective quality labels was published and used to create an initial objective measure of quality (OMOQ). In this paper, an improved OMOQ for time-scaled audio is proposed. The measure uses handcrafted features and a fully connected network to predict subjective mean opinion scores (SMOS). Basic and advanced perceptual evaluation of audio quality features are used in addition to nine features specific to TSM artefacts. Six methods of alignment are explored, with interpolation of the reference magnitude spectrum to the length of the test magnitude spectrum giving the best performance. The proposed measure achieves a mean root mean square error of 0.490 and a mean Pearson correlation of 0.864 to SMOS, equivalent to the 97th and 82nd percentiles of the subjective sessions, respectively. The proposed measure is used to evaluate TSM algorithms, finding that Elastique gives the highest objective quality for solo instrument and voice signals, whereas the identity phase-locking phase vocoder gives the highest objective quality for music signals and the best overall quality. The objective measure is available online at https://www.github.com/zygurt/TSM.
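The best-performing alignment described above, stretching the reference magnitude spectrum to the length of the test magnitude spectrum, might look like the following sketch. The abstract does not state the interpolation type, so linear interpolation along the time axis is an assumption here, as are the function name `interpolate_frames` and the representation of a spectrogram as a list of frames.

```python
def interpolate_frames(reference, n_out):
    """Linearly interpolate a magnitude spectrogram along time.

    `reference` is a list of frames, each a list of bin magnitudes.
    Returns `n_out` frames spanning the same time range, so the result
    can be compared frame by frame with a test spectrogram of that length.
    """
    n_in = len(reference)
    if n_in == 0 or n_out <= 0:
        raise ValueError("need at least one input frame and one output frame")
    if n_in == 1 or n_out == 1:
        return [list(reference[0]) for _ in range(n_out)]
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into reference
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(reference[lo], reference[hi])])
    return out


if __name__ == "__main__":
    # Two reference frames stretched to three frames.
    print(interpolate_frames([[0.0, 2.0], [2.0, 4.0]], 3))
```

After this step the reference and test spectrograms have equal frame counts, so per-frame features can be computed without further time alignment.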