In this paper we propose the integration of an online audio beat tracking system into the general framework of robot audition, to enable its application in musically interactive robotic scenarios. To this purpose, we introduce a state-recovery mechanism into our beat tracking algorithm for handling continuous musical stimuli, and apply different multi-channel preprocessing algorithms (e.g., beamforming, ego-noise suppression) to enhance noisy auditory signals captured live in a real environment. We assessed and compared the robustness of our audio beat tracker through a set of experimental setups, under different live acoustic conditions of incremental complexity. These included the presence of continuous musical stimuli, composed of concatenated musical pieces; the presence of noises of different natures (e.g., robot motion, speech); and the simultaneous on-the-fly processing of different audio sources, for music and speech. We successfully tackled all these challenging acoustic conditions, improving beat tracking accuracy and reaction time to music transitions while simultaneously achieving robust automatic speech recognition.
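The abstract only names beamforming as one of the multi-channel preprocessing steps; as a rough illustration of that kind of enhancement, here is a minimal delay-and-sum beamformer sketch in Python. All function names, array shapes, and parameters are our own assumptions, not taken from the paper:

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Minimal delay-and-sum beamformer sketch (far-field assumption).

    signals:       (n_mics, n_samples) time-domain channels
    mic_positions: (n_mics, 3) microphone coordinates in metres
    direction:     unit vector pointing toward the target source
    fs:            sampling rate in Hz; c is the speed of sound (m/s)
    """
    n_mics, n_samples = signals.shape
    # A mic farther along `direction` hears the wavefront earlier, so it
    # needs a larger compensating delay to line up with the latest arrival.
    delays = mic_positions @ direction / c
    delays -= delays.min()                       # seconds of delay per channel
    shifts = np.round(delays * fs).astype(int)   # integer-sample approximation
    out = np.zeros(n_samples)
    for ch, s in zip(signals, shifts):
        out[s:] += ch[: n_samples - s]           # delay each channel, then sum
    return out / n_mics                          # average coherently toward the source
```

Coherent summation reinforces the signal arriving from `direction` while averaging down uncorrelated noise from other directions, which is the basic motivation for beamforming as a front end to beat tracking.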
This paper presents the design and implementation of a real-time, real-world beat tracking system that runs on a dancing robot. The main problem for such a robot is that its motors generate ego noise while it moves, which directly degrades the quality of the audio signal features used for beat tracking. We therefore propose to incorporate ego-noise reduction as a preprocessing stage prior to our tempo induction and beat tracking system. The beat tracking algorithm is based on an online strategy of competing agents sequentially processing a continuous musical input while considering parallel hypotheses regarding tempo and beats. This system is applied to a humanoid robot that processes the audio from its embedded microphones on-the-fly while performing simple dancing motions. A detailed, multi-criteria evaluation of the system across different music genres and varying stationary/non-stationary noise conditions is presented. It shows improved performance and noise robustness, outperforming our conventional beat tracker (i.e., without ego-noise suppression) by 15.2 points in tempo estimation and 15.0 points in beat-time prediction.
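The abstract does not spell out the ego-noise reduction method. One common approach in the robot audition literature is template-based spectral subtraction, where an expected motor-noise spectrum (e.g., looked up from the robot's current joint states) is subtracted from each noisy frame. The following is a minimal sketch under that assumption; the parameter names are illustrative:

```python
import numpy as np

def suppress_ego_noise(noisy_mag, ego_noise_template, alpha=1.0, floor=0.05):
    """Template-based spectral subtraction for one STFT magnitude frame.

    noisy_mag:          magnitude spectrum of the current frame
    ego_noise_template: expected motor-noise magnitude for the current
                        motion (e.g., learned per joint configuration)
    alpha:              over-subtraction factor
    """
    clean = noisy_mag - alpha * ego_noise_template   # subtract the noise estimate
    # A spectral floor avoids negative magnitudes and deep "musical noise" holes.
    return np.maximum(clean, floor * noisy_mag)
```

Running this per frame before computing onset-detection features removes much of the stationary part of the motor noise, which is exactly the degradation named above.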
In this paper we propose an audio beat tracking system, IBT, for multiple applications. The proposed system integrates an automatic monitoring and state-recovery mechanism, which applies (re-)inductions of tempo and beats, into a multi-agent-based beat tracking architecture. This system sequentially processes a continuous onset detection function while propagating parallel hypotheses of tempo and beats. Beats can be predicted in either a causal or a non-causal mode, which makes the system suitable for diverse applications. We evaluate the performance of the system in both modes in two application scenarios: standard (using a relatively large database of audio clips) and streaming (using long audio streams made up of concatenated clips). We show experimental evidence of the usefulness of the automatic monitoring and state-recovery mechanism in the streaming scenario (i.e., improvements in beat tracking accuracy and reaction time). We also show that the system performs efficiently, and at a level comparable to state-of-the-art algorithms, in the standard scenario. IBT is multi-platform, open-source, and freely available, and it includes plugins for popular audio analysis, synthesis, and visualization platforms.
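To make the competing-agents idea concrete, here is a toy causal sketch in Python: each agent holds one (period, next-beat) hypothesis, scores itself on the onset detection function (ODF) salience at its predicted beats, and the best-scoring agent's predictions are emitted as beats. The scoring rule, decay factor, and class layout are invented for illustration and are not IBT's actual implementation:

```python
class Agent:
    """One beat hypothesis: a period (s) and the next predicted beat time (s)."""
    def __init__(self, period, next_beat):
        self.period, self.next_beat, self.score = period, next_beat, 0.0

def track(odf, fs_odf, agents):
    """Causal sketch: scan an onset detection function frame by frame,
    letting agents compete; output the best agent's beat predictions.

    Example induction: agents = [Agent(60.0 / bpm, 0.0) for bpm in (90, 120, 140)]
    """
    beats = []
    for n, salience in enumerate(odf):
        t = n / fs_odf
        fired = []
        for a in agents:
            if t >= a.next_beat:                    # this hypothesis' beat is due
                a.score = 0.9 * a.score + salience  # reward onset energy at the beat
                a.next_beat += a.period             # schedule the following beat
                fired.append(a)
        best = max(agents, key=lambda a: a.score)
        if best in fired:
            beats.append(t)                         # causal output from the best agent
    return beats
```

A state-recovery step of the kind described above would monitor `best.score` and trigger a fresh tempo/beat induction (replacing the agent set) when it stays low, e.g., across a music transition in a stream.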
Expressiveness and naturalness in robotic motions and behaviors can be replicated by using captured human movements. Considering dance as a complex and expressive type of motion, in this paper we propose a method for generating humanoid dance motions transferred from human motion capture (MoCap) data. Motion data of samba dance was synchronized to samba music that had been manually annotated by experts, in order to build a spatiotemporal representation of the dance movement, with variability, in relation to the respective musical temporal structure (musical meter). This enabled the determination and generation of variable dance key-poses according to the captured human body model. In order to retarget these key-poses from the original human model onto the considered humanoid morphology, we propose methods for resizing and adapting the original trajectories to the robot joints, overcoming its various kinematic constraints. Finally, a method for generating the angles for each robot joint is presented, enabling the reproduction of the desired poses on a simulated NAO humanoid robot. The results validate our approach, suggesting that our method can generate poses from motion capture and reproduce them on a humanoid robot with a good degree of similarity.
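One small but essential part of any such retargeting is respecting the robot's mechanical joint limits, since captured human angles routinely exceed them. The sketch below shows only that clamping step; the joint names and limit values are illustrative approximations (real NAO limits come from its specification), and the paper's full method additionally resizes and adapts whole trajectories:

```python
import numpy as np

# Illustrative joint limits in radians; placeholders, not an official NAO spec.
ROBOT_LIMITS = {
    "LShoulderPitch": (-2.08, 2.08),
    "LElbowRoll":     (-1.54, -0.03),
}

def retarget_pose(human_angles, limits=ROBOT_LIMITS):
    """Map captured human joint angles onto a robot pose by clamping each
    angle into the corresponding joint's mechanical range; joints the robot
    does not have are simply dropped."""
    return {joint: float(np.clip(q, *limits[joint]))
            for joint, q in human_angles.items() if joint in limits}
```

Clamping alone distorts poses near the limits, which is why trajectory resizing and adaptation, as proposed above, matter for preserving the character of the original dance.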
In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on-the-fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure a reliable interaction, a behavior decision mechanism based on active audition, built on top of the framework, polices the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's application to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego-noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.
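The abstract does not enumerate the actual decision policies, but the core idea of gating the robot's actions on signal reliability can be illustrated with a toy policy: if detected speech is too degraded (for instance by the robot's own motion noise), stop dancing so recognition can succeed. The inputs, threshold, and action labels below are hypothetical:

```python
def decide_behavior(speech_active, est_snr_db, snr_thresh_db=0.0):
    """Toy active-audition policy: trade motion against listening quality.

    speech_active: True if human speech was detected in a separated source
    est_snr_db:    estimated signal-to-noise ratio of that source (dB)
    """
    if speech_active and est_snr_db < snr_thresh_db:
        return "stop_dancing_and_listen"   # ego noise would mask the speech
    if speech_active:
        return "dialogue"                  # signal is reliable enough to recognize
    return "dance_to_beats"                # default music-driven behavior
```

Calling such a policy at every perception cycle is one simple way a behavior decision mechanism can keep auditory processing reliable while still favoring expressive, music-driven motion.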
In this paper, an approach is presented that identifies music samples which are difficult for current state-of-the-art beat trackers. In order to estimate this difficulty even for examples without ground truth, a method motivated by selective sampling is applied. This method assigns a degree of difficulty to a sample based on the mutual disagreement between the outputs of various beat tracking systems. On a large beat-annotated dataset we show that this mutual agreement is correlated with the mean performance of the beat trackers evaluated against the ground truth, and hence can be used to identify difficult examples by predicting poor beat tracking performance. Toward the aim of advancing future beat tracking systems, we demonstrate how our method can be used to form new datasets containing a high proportion of challenging music examples.
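As a simplified sketch of the mutual-agreement idea, one can score every pair of beat trackers' outputs with some agreement metric and average over all pairs; a low mean flags a likely difficult sample, with no ground truth needed. Here we use a basic F-measure with a tolerance window as the pairwise metric purely for illustration (the paper's actual agreement measure may differ):

```python
import itertools
import numpy as np

def f_measure(beats_a, beats_b, tol=0.07):
    """F-measure between two beat sequences with a +/- tol (s) window.
    Simplification: several a-beats may match the same b-beat."""
    if not len(beats_a) or not len(beats_b):
        return 0.0
    b = np.asarray(beats_b)
    hits = sum(np.min(np.abs(b - t)) <= tol for t in beats_a)
    p, r = hits / len(beats_a), hits / len(beats_b)
    return 2 * p * r / (p + r) if p + r else 0.0

def mean_mutual_agreement(outputs):
    """Average pairwise agreement across several trackers' beat outputs;
    low values predict poor performance and hence difficult samples."""
    pairs = itertools.combinations(outputs, 2)
    return float(np.mean([f_measure(a, b) for a, b in pairs]))
```

Ranking an unannotated collection by `mean_mutual_agreement` and keeping the lowest-scoring samples is the kind of selection that yields datasets rich in challenging examples.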