Deep Learning and Domain Transfer for Orca Vocalization Detection

Best, Paul; Ferrari, Maxence; Poupard, Marion; Paris, Sébastien; Marxer, Ricard; Symonds, Helena; Spong, Paul; Glotin, Hervé

doi:10.1109/ijcnn48605.2020.9207567

Cited by 7 publications

(5 citation statements)

References 13 publications

(12 reference statements)


“…To accomplish this, we analyzed three different natural scenes or observations. Previous shore-based fixed hydrophone systems enabled the detection and tracking of orcas from the same population (Grebner, 2009; Bergler et al, 2019; Poupard et al, 2019a; Best et al, 2020) but did not succeed in associating individuals with calls or determine precise individual pattern variations of communication between groups. Tracking of a few individual orcas in another population, without visual checking, has been realized in a complex pilot study using 14 hydrophones deployed in three compact arrays (Gassmann et al, 2013).…”

Section: Discussion (mentioning)

confidence: 99%

Intra-Group Orca Call Rate Modulation Estimation Using Compact Four Hydrophones Array

Poupard, Symonds, Spong et al. 2021

Front. Mar. Sci.

Self Cite


Acoustic emissions are vital for orca (Orcinus orca) socializing, hunting, and maintaining spatial awareness. Studying the acoustic emissions of orcas on an individual basis often results in interference with their natural behaviors through mounting tags or following by boat. In order to analyze their inter- and intra-group communication, we propose a study allowing us to associate vocalizations with their emitter (matriline and when possible individual). Such a non-interfering device for allocating calls to individual orcas could substantially boost our understanding of their complex acoustic world. Our experimental protocol was based on a compact array of four hydrophones fixed near the shore, operable up to 1 km away from the path of orcas. It was used during summer 2019 at the research station OrcaLab, northern Vancouver Island, Canada. A total of 722 calls were extracted, jointly with visual identification and azimuth of surfacing orcas, allowing validation of the acoustic diarization and azimuth estimations of the orca calls. We then calculated the Call Rate (CR) for each matriline or when possible individual in order to describe their acoustic activity. Preliminary results show that CR could be modulated according to the distance of the signaler from a group, the presence of another group, or anthropic pressure.
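The abstract above does not say how the Call Rate is computed. As a minimal sketch, assuming CR simply means the number of calls attributed to an emitter divided by the observation time in minutes, it could be tallied as follows. The function name `call_rates`, the `(timestamp, emitter)` record layout, and the labels `m1` and `m2` are hypothetical, and the sample calls are made up for illustration rather than taken from the study.

```python
from collections import Counter


def call_rates(calls, observation_minutes):
    """Return calls per minute for each emitter label.

    calls: iterable of (timestamp_seconds, emitter_label) pairs.
    observation_minutes: length of the observation window (assumed unit: minutes).
    """
    if observation_minutes <= 0:
        raise ValueError("observation_minutes must be positive")
    # Count how many calls were attributed to each emitter.
    counts = Counter(label for _, label in calls)
    return {label: n / observation_minutes for label, n in counts.items()}


# Made-up example: four calls attributed to two hypothetical matrilines
# during a two-minute window.
example_calls = [(12.0, "m1"), (30.5, "m1"), (95.0, "m2"), (110.0, "m1")]
rates = call_rates(example_calls, observation_minutes=2.0)
print(rates)  # {'m1': 1.5, 'm2': 0.5}
```

Under this assumed definition, comparing CR across windows (for example with and without a nearby vessel) only makes sense when each window has its own duration passed in explicitly.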


“…Our shore based study looked at orca communication in the wild at a unique level of precision. Other shore based fixed hydrophone systems enabled the detection and tracking of orcas from the same population [36][37][38] but did not succeed in associating individuals with calls, or determine precise individual pattern variations of communication between groups. This work does not pretend to provide all explanations for the type of calls, but rather the CR.…”

Section: Discussion (mentioning)

confidence: 99%

Evidences of Intra-Group Orca Call Rate Modulation Using A Small-Aperture Four Hydrophone Array

Poupard, Symonds, Spong et al. 2020

Preprint

Self Cite


Acoustic emissions are vital to orcas (Orcinus orca) to socialize, hunt, orient, and maintain spatial awareness. In order to better analyze their inter- and intra-group communication, we propose a novel protocol that allows us to associate vocalizations with their emitter (individual/matriline). Our approach is based on a low-cost small-aperture four-hydrophone array fixed near the shore up to a few km away from the orcas’ path, operated in conjunction with visual identification. It was conducted in the summer of 2019 off northern Vancouver Island, Canada, at the research station OrcaLab. A total of 722 calls were extracted and localized in azimuth via the hydrophone array from 3 case studies in which different events took place. We then calculated the Call Rate (CR) for each individual/matriline in order to describe their acoustic activity. Results show that CR is modulated according to the distance of the signaler from the joint group, the presence of another group, and the anthropic pressure (nearby cruise ship). This shows evidence of intertwined calls. This protocol does not interfere with the animals and opens new perspectives towards inter- and intra-group communication analysis.

“…Reshape embedding (9,8192) functions, i.e., multi-head attention, in the context of audio MIL classification. Assuming K attention heads, the aggregated bag-level embedding per head is calculated as follows:…”

Section: Sequence Pooling (mentioning)

confidence: 99%

“…Automated methods for recording and analysing bioacoustic data hold the promise for unprecedented scalability in wildlife monitoring, with the purpose of preservation through a global biodiversity crisis [1]. This has enabled biologists and engineers to perform machine learning studies on bioacoustics across a large taxonomic range, such as primates [2,3] or other terrestrial [4,5] or marine mammals [6,7,8,9,10], birds [11,12,13,14,15], as well as amphibians [14], in applications like call detection for verifying presence or estimating density [6,2,4], discerning between calls of different species [14,15], as well as different call types of a particular animal [5,8].…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Attentive Detection of the Spider Monkey Whinny in the (Actual) Wild

Rizos, Lawson, Han et al. 2021

Interspeech 2021


We study deep bioacoustic event detection through multi-head attention based pooling, exemplified by wildlife monitoring. In the multiple instance learning framework, a core deep neural network learns a projection of the input acoustic signal into a sequence of embeddings, each representing a segment of the input. Sequence pooling is then required to aggregate the information present in the sequence such that we have a single clip-wise representation. We propose an improvement based on Squeeze-and-Excitation mechanisms upon a recently proposed audio tagging ResNet, and show that it performs significantly better than the baseline, as well as a collection of other recent audio models. We then further enhance our model by performing an extensive comparative study of recent sequence pooling mechanisms, and achieve our best result using multi-head self-attention followed by concatenation of the head-specific pooled embeddings, which performs better than prediction pooling methods as well as other recent sequence pooling tricks. We perform these experiments on a novel dataset of spider monkey whinny calls we introduce here, recorded in a rainforest on the South Pacific coast of Costa Rica, with a promising outlook pertaining to minimally invasive wildlife monitoring.
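To make the pooling step concrete, here is a minimal pure-Python sketch of multi-head attention pooling followed by concatenation of the per-head results. It is not the authors' model. It assumes each segment embedding is split into K equal slices, one per head, and that each head scores its slice with a single query vector. This is a simpler attention-pooling variant, not a full self-attention layer. The function names and the `head_queries` parameter (standing in for learned weights) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(embeddings, head_queries):
    """Pool T segment embeddings (length D each) into one clip-level vector.

    head_queries: K query vectors of length D // K (hypothetical learned weights).
    Returns the concatenation of the K head-specific pooled slices (length D).
    """
    num_heads = len(head_queries)
    d_head = len(head_queries[0])
    if any(len(e) != num_heads * d_head for e in embeddings):
        raise ValueError("embedding size must equal num_heads * head size")
    pooled = []
    for k, query in enumerate(head_queries):
        # The part of every segment embedding that belongs to head k.
        slices = [e[k * d_head:(k + 1) * d_head] for e in embeddings]
        # Scaled dot-product score of each segment for this head.
        scores = [sum(q * x for q, x in zip(query, s)) / math.sqrt(d_head)
                  for s in slices]
        weights = softmax(scores)  # attention over the T segments
        # Attention-weighted sum of the slices, appended to the output.
        pooled.extend(sum(w * s[i] for w, s in zip(weights, slices))
                      for i in range(d_head))
    return pooled


# Toy input: three segments, embedding size 4, two heads of size 2.
segments = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 0.0], [5.0, 4.0, 4.0, 2.0]]
clip_vector = multi_head_attention_pool(segments, [[0.0, 0.0], [0.0, 0.0]])
print(clip_vector)
```

With all-zero queries every segment receives the same weight, so the result reduces to plain mean pooling. Non-zero queries let each head emphasize different segments, which is the motivation for keeping the heads separate and concatenating them.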
