The goal of this project is to use acoustic signatures to detect, classify, and count the calls of four acoustic populations of blue whales so that, ultimately, the conservation status of each population can be better assessed. We used manual annotations from 350 h of audio recordings from underwater hydrophones in the Indian Ocean to build a deep learning model that detects, classifies, and counts the calls of four acoustic song types. The method we used was Siamese neural networks (SNN), a class of neural network architectures that measure the similarity of their inputs by comparing the inputs' feature vectors. We found that SNN outperformed the more widely used convolutional neural networks (CNN). Specifically, the SNN improved on a CNN by 2% in accuracy for population classification and by 1.7%-6.4% in accuracy for call count estimation for each blue whale population. In addition, even though we treated call count estimation as a classification task and encoded the number of calls in each spectrogram as a categorical variable, the SNN surprisingly learned the ordinal relationship among the counts. SNN are robust and are shown here to be an effective way to automatically mine large acoustic datasets for blue whale calls.
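The idea of comparing inputs through their feature vectors can be illustrated with a minimal sketch. It is not the architecture used in the study: the encoder below is a single untrained random projection, and the names `make_shared_encoder`, `siamese_distance`, and `classify_by_nearest_reference` are hypothetical. The only property it is meant to show is that both inputs pass through the same encoder (shared weights) and are then compared by a distance in feature space.

```python
import numpy as np


def make_shared_encoder(input_dim, embed_dim, seed=0):
    """Return one encoder function; using it for both inputs is what makes the setup 'Siamese'.

    Assumption: a single random linear layer with tanh stands in for a trained network.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=1.0 / np.sqrt(input_dim), size=(embed_dim, input_dim))

    def encode(spectrogram):
        # Flatten the 2-D spectrogram and project it to a feature vector.
        flat = np.asarray(spectrogram, dtype=float).ravel()
        return np.tanh(weights @ flat)

    return encode


def siamese_distance(encode, first, second):
    """Euclidean distance between the feature vectors of two inputs."""
    return float(np.linalg.norm(encode(first) - encode(second)))


def classify_by_nearest_reference(encode, query, references):
    """Assign the label of the reference whose feature vector is closest to the query.

    `references` maps a (hypothetical) label to one example spectrogram.
    """
    return min(
        references,
        key=lambda label: siamese_distance(encode, query, references[label]),
    )
```

In a trained SNN the encoder weights would be learned so that spectrograms from the same class end up close together and those from different classes end up far apart; the random weights here only make the data flow visible.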
The most common approach to monitoring mysticete acoustic presence is to detect and count their calls in audio records. To implement this method on large datasets, versatile and robust automated call detectors are required. Evaluating their performance is essential for designing a detection strategy adapted to the available datasets. This assessment then enables accurate post-analyses and comparisons of multiple independent surveys. In this paper, we present the performance of a detector based on dictionaries and sparse representation of the signal for detecting blue whale stereotyped calls and non-stereotyped vocalizations (D-calls) in a large acoustic database with multiple sites and years of recordings in the southern Indian Ocean. Results show that recall increases with the signal-to-noise ratio (SNR) and reaches 90% for positive-SNR stereotyped calls and between 80% and 90% for high-SNR D-calls. A detailed analysis is presented of how dictionary composition, call SNR, manual ground truth, and interference types and abundance influence performance variability. Finally, a detection strategy for long-term acoustic monitoring is defined.
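The relationship between recall and SNR reported above can be made concrete with a small sketch. Recall here is the fraction of manually annotated calls that the detector found, computed separately within each SNR bin. The function name `recall_by_snr`, the bin edges, and the input layout (one SNR value and one detected/missed flag per annotated call) are assumptions for illustration, not the evaluation code of the study.

```python
import numpy as np


def recall_by_snr(snr_db, detected, bin_edges):
    """Recall (detected calls / annotated calls) within each SNR bin.

    snr_db    : SNR in dB of each annotated (ground-truth) call
    detected  : True where the detector found that call, False where it missed it
    bin_edges : increasing edges; bin i covers bin_edges[i] <= SNR < bin_edges[i + 1]
    Returns one recall value per bin, NaN for bins that contain no calls.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    bin_edges = np.asarray(bin_edges, dtype=float)

    # np.digitize returns 1-based bin positions; shift to 0-based bin indices.
    # Calls outside the outer edges get an index that no bin below will match.
    bin_index = np.digitize(snr_db, bin_edges) - 1

    recalls = []
    for i in range(len(bin_edges) - 1):
        in_bin = bin_index == i
        recalls.append(detected[in_bin].mean() if in_bin.any() else float("nan"))
    return np.array(recalls)
```

For example, calling `recall_by_snr(snr, found, [-10, 0, 10])` would give one recall value for negative-SNR calls and one for positive-SNR calls, which is the kind of split the reported figures refer to.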