Conservationists are increasingly using autonomous acoustic recorders to determine the presence/absence and the abundance of bird species. Unlike humans, these recorders can be left in the field for extended periods of time in any habitat. Although data acquisition is automated, manual processing of recordings is labour intensive, tedious, and prone to bias due to observer variation. Automated birdsong recognition is therefore an efficient alternative. However, only a few ecologists and conservationists use the existing birdsong recognisers to process unattended field recordings, because calibrating the software takes an exceptionally long time and requires considerable knowledge of signal processing and of the underlying systems, making the tools less user-friendly. Even allowing for these difficulties, obtaining accurate results is exceedingly hard. In this review we examine the state of the art, summarising and discussing the methods currently available for each of the essential parts of a birdsong recogniser, as well as the available software. The key reasons behind poor automated recognition are that field recordings are very noisy, calls from birds far from the recorder can be faint or corrupted, and calls from many different birds overlap. In addition, a single recording can contain large numbers of species, so a method has to scale to many species, or at least avoid misclassifying another species as one of particular interest. We found that these areas of importance, particularly noise reduction, are amongst the least researched. In cases where accurate recognition of individual species is essential, such as in conservation work, we suggest that specialised (species-specific) methods of passive acoustic monitoring are required. We also believe that it is important that comparable measures, and datasets, are used to enable methods to be compared.
The routine collection of long-term acoustic recordings of animals in the field presents new challenges in data analysis. While many terabytes of data are collected annually, effective use of this noisy, highly variable data requires skilled humans to manually identify calls. Computer programs to automatically analyse these recordings are becoming available, but it is important that they are user-friendly and easy to use, so that everybody – citizen scientists, wildlife managers, researchers – can take advantage of them, and that they keep the human in the loop, so that analyses carried out this year are comparable both to manual call counts from the past and to more accurate automated analyses performed in the future. We present the AviaNZ program, which is designed to achieve these goals: the software includes methods for simple, rapid manual annotation of recordings, denoising and segmentation methods, and a training procedure by which users can prepare their own filters to automatically recognize individual species. The software can run in batch mode, automatically processing folders of field recordings, and then presents the outputs to enable quick and easy review of the results. Finally, the outputs are presented in a variety of spreadsheets to enable different statistical analyses to be performed. We describe the various workflows of manually and semi-automatically processing sound files, annotating them to train automatic filters, and using those filters in batch mode, and show how the software facilitates rapid evaluation of the automated analysis. A demonstration of the software, comparing manual and automatic detection of calls of the little spotted kiwi Apteryx owenii, is given. It shows that while automatic detection does produce false positives, human correction of these is far faster than manual review of the whole sound file. AviaNZ is a freely available open-source standalone program. Our experience shows that it can be used by anybody quickly and easily.
However, for experienced users it is easily customizable and extendable. By enabling everybody involved with acoustic bird recording to quickly and easily analyse their own data, while future-proofing it by keeping the human in the loop, we are enabling acoustic field recordings to realize their potential.
Automatic recording of birdsong is becoming the preferred way to monitor and quantify bird populations worldwide. Programmable recorders allow recordings to be obtained at all times of day and year for extended periods of time. Consequently, there is a critical need for robust automated birdsong recognition. One prominent obstacle to achieving this is the low signal-to-noise ratio in unattended recordings. Field recordings are often very noisy: birdsong is only one component of a recording, which also includes noise from the environment (such as wind and rain), other animals (including insects), and human-related activities, as well as noise from the recorder itself. We describe a method of denoising that combines the wavelet packet decomposition with band-pass or low-pass filtering, and present experiments demonstrating an order-of-magnitude improvement in noise reduction on natural noisy bird recordings.
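To illustrate the general idea of wavelet-packet denoising (this is not the paper's exact algorithm: the Haar basis, the universal soft threshold, and the omission of the filtering stage are simplifications chosen for brevity), a minimal sketch in NumPy might look like this:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]                      # truncate to even length
    return (x[0::2] + x[1::2]) / SQRT2, (x[0::2] - x[1::2]) / SQRT2

def haar_inv(a, d):
    """Inverse of haar_step: interleave reconstructed samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / SQRT2
    x[1::2] = (a - d) / SQRT2
    return x

def wp_decompose(x, level):
    """Full wavelet-packet decomposition: a list of 2**level leaf nodes."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for n in nodes:
            a, d = haar_step(n)
            nxt.extend([a, d])
        nodes = nxt
    return nodes

def wp_reconstruct(nodes):
    """Invert wp_decompose by pairwise inverse transforms up the tree."""
    while len(nodes) > 1:
        nodes = [haar_inv(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def denoise(x, level=4):
    """Soft-threshold each packet leaf using a universal threshold."""
    out = []
    for n in wp_decompose(x, level):
        sigma = np.median(np.abs(n)) / 0.6745     # robust noise estimate
        thr = sigma * np.sqrt(2 * np.log(max(len(n), 2)))
        out.append(np.sign(n) * np.maximum(np.abs(n) - thr, 0.0))
    return wp_reconstruct(out)
```

Because the Haar transform is orthonormal and soft thresholding only shrinks coefficients, the denoised signal never has more energy than the input; in practice the threshold would be tuned per frequency band, and a band-pass stage would follow, as the abstract describes.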
Ecoacoustics has the potential to provide a large amount of information about the abundance of many animal species at a relatively low cost. Acoustic recording units are widely used in field data collection, but the facilities to reliably process the recorded data – recognizing calls that are relatively infrequent, and often significantly degraded by noise and distance from the microphone – are not yet well developed. We propose a call detection method for continuous field recordings that can be trained quickly and easily on new species, and that degrades gracefully with increased noise or distance from the microphone. The method is based on the reconstruction of the sound from a subset of the wavelet nodes (elements in the wavelet packet decomposition tree). It is intended as a preprocessing filter, so we aim to minimize false negatives: false positives can be removed in subsequent processing, but missed calls will not be looked at again. We compare our method to standard call detection methods, and also to machine learning methods (using as input features either wavelet energies or Mel-Frequency Cepstral Coefficients) on real-world noisy field recordings of six bird species. The results show that our method has higher recall (proportion detected) than the alternative methods: 87% with 85% specificity on >53 hr of test data, resulting in an 80% reduction in the amount of data that needed further verification. It detected >60% of calls that were extremely faint (far away), even with high background noise. This preprocessing method is available in our AviaNZ bioacoustic analysis program and enables the user to significantly reduce the amount of subsequent processing required (whether manual or automatic) to analyse continuous field recordings collected by spatially and temporally large-scale monitoring of animal species. It can be trained to recognize new species without difficulty, and if several species are sought simultaneously, filters can be run in parallel.
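The core detection idea – look only at energy in the frequency band a target species calls in, and flag segments that stand out from the background – can be sketched as follows. Note this is a simplified stand-in using spectrogram band energy rather than the paper's wavelet-node reconstruction, and the band limits, threshold rule, and minimum duration are illustrative assumptions, not the published parameters:

```python
import numpy as np
from scipy.signal import spectrogram

def detect_calls(x, fs, fmin, fmax, nperseg=256, k=10.0, min_dur=0.1):
    """Return (start, end) times of segments whose energy in the band
    [fmin, fmax] Hz exceeds k times the median band energy."""
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg)
    band = Sxx[(f >= fmin) & (f <= fmax)].sum(axis=0)   # per-frame band energy
    active = band > k * np.median(band)
    # merge consecutive active frames into segments, dropping short blips
    segs, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = t[i]
        elif not a and start is not None:
            if t[i] - start >= min_dur:
                segs.append((start, t[i]))
            start = None
    if start is not None and t[-1] - start >= min_dur:
        segs.append((start, t[-1]))
    return segs
```

A detector like this errs on the side of false positives (any loud event in the band is flagged), which matches the abstract's design goal: false positives can be discarded during review, whereas a missed call is never seen again.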
Autonomous recording units are now routinely used to monitor birdsong, and are starting to supplement and potentially replace human listening methods. However, to date there has been very little systematic comparison of human and machine detection ability. We present an experiment based on broadcast calls of nocturnal New Zealand birds in an area of natural forest. The soundscape was monitored by both novice and experienced humans performing a call count, and by autonomous recording units. We match records of when calls were broadcast with detections by both humans and machines, and construct a hierarchical generalized linear model of the binary variable of correct detection or not, with a set of covariates about the call (distance, sound direction, relative altitude, and line of sight) and about the listener (age, experience, and gender). The results show that machines and humans have similar listening ability. Humans are more homogeneous in their recording of sounds, and this was not affected by their individual experience or characteristics. Humans were affected by trial and location, in particular one of the stations located in a small but deep valley. Despite recorders being affected significantly more than people by distance, altitude, and line of sight, their overall detection probability was higher. The specific location of recorders seems to be the most important factor determining what they record, and we suggest that for best results more than one recorder (or at least, microphone) is needed at each station to ensure all bird sounds of interest are captured.
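The heart of such an analysis – modelling the binary outcome (detected or not) as a function of covariates such as distance – can be sketched with plain logistic regression. The study fits a hierarchical GLM with listener- and trial-level effects; here the data are synthetic, the single covariate (scaled distance) and its coefficients are invented for illustration, and the fitting routine is a minimal gradient-descent sketch rather than a proper GLM solver:

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, iters=5000):
    """Fit logistic regression (with intercept) by batch gradient descent
    on the mean negative log-likelihood."""
    Xb = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))         # predicted P(detected)
        w -= lr * Xb.T @ (p - y) / len(y)         # gradient step
    return w

# synthetic example: detection probability falls with distance to the recorder
rng = np.random.default_rng(42)
dist = rng.uniform(0, 1, 500)                     # scaled distance, 0..1
true_logit = 2.0 - 4.0 * dist                     # assumed true model
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-true_logit))).astype(float)
w = fit_logistic(dist.reshape(-1, 1), y)
```

A fitted negative distance coefficient (here `w[1]`) corresponds to the finding that detection probability declines with distance; the hierarchical version additionally lets intercepts vary by listener, trial, and station, which is how the study separates individual and location effects.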