Christian Uhle scite author profile

We report on the tempo induction contest organized during the International Conference on Music Information Retrieval (ISMIR 2004) held at the University Pompeu Fabra in Barcelona, Spain, in October 2004. The goal of this contest was to evaluate some state-of-the-art algorithms in the task of inducing the basic tempo (as a scalar, in beats per minute) from musical audio signals. To our knowledge, this is the first published large scale cross-validation of audio tempo induction algorithms. Participants were invited to submit algorithms to the contest organizer, in one of several allowed formats. No training data was provided. A total of 12 entries (representing the work of seven research teams) were evaluated, 11 of which are reported in this document. Results on the test set of 3199 instances were returned to the participants before they were made public. Anssi Klapuri's algorithm won the contest. This evaluation shows that tempo induction algorithms can reach over 80% accuracy for music with a constant tempo, if we do not insist on finding a specific metrical level. After the competition, the algorithms and results were analyzed in order to discover general lessons for the future development of tempo induction systems. One conclusion is that robust tempo induction entails the processing of frame features rather than that of onset lists. Further, we propose a new "redundant" approach to tempo induction, inspired by knowledge of human perceptual mechanisms, which combines multiple simpler methods using a voting mechanism. Machine emulation of human tempo induction is still an open issue. Many avenues for future work in audio tempo tracking are highlighted, as for instance the definition of the best rhythmic features and the most appropriate periodicity detection method. In order to stimulate further research, the contest results, annotations, evaluation software and part of the data are available at http://ismir2004.ismir.net/ISMIR-Contest.html

show abstract

Source Separation for Enabling Dialogue Enhancement in Object-based Broadcast with MPEG-H

Paulus¹,

Torcoli²,

Uhle³

et al. 2019

J. Audio Eng. Soc.

View full text Add to dashboard Cite

Dialogue Enhancement (DE) is one of the most promising applications of user interactivity enabled by object-based audio broadcasting. DE allows personalization of the relative level of dialogue for intelligibility or aesthetic reasons. This paper discusses the implementation of DE in object-based audio transport with MPEG-H, with a special focus on source separation methods enabling DE also for legacy content without original objects available. The userbenefit of DE is assessed using the Adjustment/Satisfaction Test methodology. The test results demonstrate the need for an individually adjustable dialogue level because of highly-varying personal preferences. The test also investigates the subjective quality penalty from using source separation for obtaining the objects. The results show that even an imperfect separation result can successfully enable DE leading to increased end-user satisfaction.

show abstract

The Adjustment/Satisfaction Test (A/ST) for the Evaluation of Personalization in Broadcast Services and Its Application to Dialogue Enhancement

Torcoli

Herre

Fuchs

et al. 2018

IEEE Trans. on Broadcast.

View full text Add to dashboard Cite

Predicting the perceived level of late reverberation using computational models of loudness

Uhle

Paulus

Herre

2011

View full text Add to dashboard Cite

It is well known that the perceived level of reverberation depends on both the input audio signal and the impulse response. This work aims at quantifying this observation and predicting the perceived level of late reverberation based on separate signal paths of direct and reverberant signals, as they appear in digital audio effects. A basic approach to the problem is developed and subsequently extended by considering the impact of the reverberation time on the prediction result. This leads to a linear regression model with two input variables which is able to predict the perceived level with high accuracy, as shown on experimental data derived from listening tests. Variations of this model with different degrees of sophistication and computational complexity are compared regarding their accuracy. Applications include the control of digital audio effects for automatic mixing of audio signals

show abstract

Lebensqualität von älteren Menschen mit leichten kognitiven Störungen

et al. 2014

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Christian Uhle

An experimental comparison of audio tempo induction algorithms

Source Separation for Enabling Dialogue Enhancement in Object-based Broadcast with MPEG-H

The Adjustment/Satisfaction Test (A/ST) for the Evaluation of Personalization in Broadcast Services and Its Application to Dialogue Enhancement

Predicting the perceived level of late reverberation using computational models of loudness

Lebensqualität von älteren Menschen mit leichten kognitiven Störungen

Contact Info

Product

Resources

About