Ecoacoustics needs sophisticated acoustic monitoring tools that can extract a wide range of features from an observed mixture of sounds. We have developed a portable acoustic monitoring system called 'HARKBird', which consists of a laptop PC and an inexpensive commercial microphone array running the robot audition software HARK. HARKBird can extract acoustic events from a recording, yielding the start and end timings, the spatial information (e.g., position or direction from the microphone array), and the spectrogram of each sound separated from the original recording. In this study, we report how robot audition techniques contribute to monitoring the spatio-spectro-temporal dynamics of bird behaviors, using an extended and minimal system based on multiple microphone arrays. Dimension reduction of the separated sounds is important for integrating the information from multiple microphone arrays. As the dimension reduction algorithm, we use t-SNE both to aid manual annotation of each sound and to generate the vocalization distribution automatically. We conducted playback experiments on the Spotted Towhee (Pipilo maculatus) to simulate different cases of territorial intrusion (song, call, or no playback). Our hypothesis was that playback of conspecific vocalizations would provoke aggressive responses in males, and that the effect of song playbacks would be more prominent than that of call playbacks. Our primary aim was to test whether our system can extract the information on the aggressiveness of target individuals needed to examine this hypothesis. We show that the system, with manual annotation of vocalizations, can extract their different spatio-spectro-temporal dynamics under different conditions, supporting our hypothesis. We also consider spectral-affinity-based automatic matching of localized sounds from different microphone arrays.
The relative number of localized songs under each playback condition showed a trend similar to that obtained with the manual approach, implying that we can grasp the long-term dynamics of vocalizations without costly annotation.
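The t-SNE step described above can be sketched as follows. This is a minimal illustration, assuming each separated sound has already been converted into a fixed-length spectrogram feature vector; the feature dimensionality and the random placeholder data are hypothetical, not the authors' actual pipeline:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical input: each separated sound represented by a fixed-length
# spectrogram feature vector (e.g., a mel-spectrogram flattened to 1-D).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 128))  # 200 separated sounds, 128-D each

# Project the high-dimensional features onto 2-D, so that similar sounds
# cluster together for manual annotation or for plotting the
# distribution of vocalizations across microphone arrays.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(features)

print(embedding.shape)  # (200, 2)
```

In practice, points that fall in the same 2-D cluster can be annotated together, which is what makes the manual labeling step tractable.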
We report on a simple and practical application of HARKBird, an easily available and portable system for bird song localization built on the open-source robot audition software HARK, to a deeper understanding of the ecoacoustic dynamics of bird songs, focusing on a fine-scaled temporal analysis of song movement, that is, song type dynamics in playback experiments. We extended HARKBird and constructed a system that enables us to conduct automatic playback and interactive experiments under different conditions, with real-time recording and localization of sound sources. We investigate how playback of conspecific songs and playback patterns affect the vocalization of two song types and the spatial movement of an individual Japanese bush warbler, showing quantitatively that strong relationships exist between song type and spatial movement. We also simulated the ecoacoustic dynamics of the singing behavior of the focal individual using software termed Bird song explorer, which provides users with a virtual experience of the acoustic dynamics of bird songs using the 3D game platform Unity. Based on the experimental results, we discuss how our approach can contribute to ecoacoustics in terms of two different roles of sounds: sounds as tools and sounds as subjects.
To understand the social interactions among songbirds, it is essential to extract the timing, position, and acoustic properties of their vocalizations. We propose a framework for automatic and fine-scale extraction of spatial-spectral-temporal patterns of bird vocalizations in a densely populated environment. For this purpose, we used robot audition techniques to integrate information (i.e., the timing, direction of arrival, and separated sound of localized sources) from multiple microphone arrays (an array of arrays) deployed in the environment, a non-invasive approach. As a proof of concept, we examined the ability of the method to extract active vocalizations of multiple Zebra Finches in an outdoor mesh tent, a realistic situation in which they could fly and vocalize freely. We found that the localization results reflected the arrangement of landmark spots in the environment, such as nests or perches, and that some vocalizations were localized at non-landmark positions. We also classified the vocalizations as either songs or calls using a simple method based on the tempo and length of the separated sounds, as an example of the use of the information obtained from the framework. Our proposed approach has great potential for understanding social interactions and the semantics or functions of vocalizations in light of spatial relationships, although a detailed understanding of the interactions would require analysis of longer-term recordings.
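A tempo- and length-based song/call classification like the one mentioned above can be sketched as a simple thresholding rule. This is a minimal illustration on synthetic signals; the thresholds, frame size, and burst-counting heuristic are hypothetical stand-ins, not the authors' actual parameters:

```python
import numpy as np

def classify_vocalization(sound, sr, frame_sec=0.02,
                          length_thresh=0.5, tempo_thresh=4.0):
    """Label a separated sound as 'song' or 'call'.

    Hypothetical rule: sounds longer than length_thresh seconds with more
    than tempo_thresh energy bursts per second are songs; otherwise calls.
    """
    length = len(sound) / sr
    n = max(int(frame_sec * sr), 1)
    n_frames = len(sound) // n
    # Short-time energy per frame, then a binary "active" mask.
    energy = np.array([np.mean(sound[i * n:(i + 1) * n] ** 2)
                       for i in range(n_frames)])
    active = energy > 0.5 * energy.max()
    # Count rising edges of the active mask as burst onsets (crude tempo).
    onsets = np.count_nonzero(active[1:] & ~active[:-1]) + int(active[0])
    tempo = onsets / max(length, 1e-9)
    return "song" if length > length_thresh and tempo > tempo_thresh else "call"

# Synthetic check: a 1 s tone pulsed at 10 Hz vs. a 0.1 s single tone.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
song = np.sin(2 * np.pi * 3000 * t) * (np.sin(2 * np.pi * 10 * t) > 0)
call = np.sin(2 * np.pi * 3000 * t[: sr // 10])

print(classify_vocalization(song, sr))  # song
print(classify_vocalization(call, sr))  # call
```

The point of the sketch is that, once the framework supplies separated sounds per localized source, even a two-feature rule can attach a coarse semantic label to each localization result.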