Deep convolutional network for animal sound classification and source attribution using dual audio recordings

Oikarinen, Tuomas; Srinivasan, Karthik; Meisner, Olivia; Hyman, Julia; Parmar, Shivangi; Fanucci-Kiss, Adrian; Desimone, Robert; Landman, Rogier; Feng, Guoping

doi:10.1121/1.5087827

Cited by 47 publications

(44 citation statements)

References 23 publications

(41 reference statements)

“…Wave files were manually aligned type in Audacity ® (v. 2.1.0) software. The data was partially annotated by hand, and partially annotated using an auto-detection algorithm (20). Annotations included call start time, call end time, call type, and caller ID (animal A, animal B, or other).…”

Section: Methods (mentioning)

confidence: 99%

Close range vocal interaction in the common marmoset (Callithrix Jacchus)

Landman, Sharma, Hyman, et al. 2019

Preprint

Abstract: Vocal communication in animals often involves taking turns vocalizing. In humans, turn taking is a fundamental rule in conversation. Among non-human primates, the common marmoset is known to engage in antiphonal calling using phee calls and trill calls. Calls of the trill type are the most common, yet difficult to study, because they are not very loud and uttered in conditions when animals are in close proximity to one another. Here we recorded trill calls in captive pair-housed marmosets using wearable microphones, while the animals were together with their partner or separated, but within trill call range. Trills were exchanged mainly with the partner and not with other animals in the room. Animals placed outside the home cage increased their trill call rate and uttered more trills in response to their partner. The fundamental frequency, F0, of trills increased when animals were placed outside the cage. Our results indicate that trill calls can be monitored using wearable audio equipment. Relatively minor changes in social context affect trill call interactions and spectral properties of trill calls, indicating that marmosets can communicate subtle information to their partner vocally.

“…In terms of animal voice classification, Zhang et al [39], Oikarinen et al [40] study animal voice classification using deep learning techniques. Our method is different from these two works.…”

Section: Related Work (mentioning)

confidence: 99%

“…Our studies focus on voice classification in noisy environment while the voice data in [39] are collected from controlled room without environmental noise. Instead of classifying different animals, [40] analyses different call types of marmoset monkeys such as Trill, Twitter, Phee and Chatter. Moreover, we implement the proposed system on a testbed and evaluate its performance in real world environment.…”

Section: Related Work (mentioning)

confidence: 99%

A multi-view CNN-based acoustic classification system for automatic animal species identification

Zhang, Yao, et al. 2020

Ad Hoc Networks

Automatic identification of animal species by their vocalization is an important and challenging task. Although many kinds of audio monitoring system have been proposed in the literature, they suffer from several disadvantages such as non-trivial feature selection, accuracy degradation because of environmental noise or intensive local computation. In this paper, we propose a deep learning based acoustic classification framework for Wireless Acoustic Sensor Network (WASN). The proposed framework is based on cloud architecture which relaxes the computational burden on the wireless sensor node. To improve the recognition accuracy, we design a multi-view Convolution Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel. The evaluation on two real datasets shows that the proposed architecture can achieve high accuracy and outperforms traditional classification systems significantly when the environmental noise dominate the audio signal (low SNR). Moreover, we implement and deploy the proposed system on a testbed and analyse the system performance in real-world environments. Both simulation and real-world evaluation demonstrate the accuracy and robustness of the proposed acoustic classification system in distinguishing species of animals.

“…A convolutional neural network (CNN) is a deep learning technology in which a data array of two or more dimensions, such as an image, is stacked through a plurality of two-dimensional filters. CNNs show high accuracies in image classification and have been recently applied in speech classification [25, 26, 27]. For animal sound classification using CNNs, Xie and Zhu [28] applied deep learning in classifying Australian bird sounds and reported a classification accuracy of more than 88%.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning-Based Cattle Vocal Classification Model and Real-Time Livestock Monitoring System with Noise Filtering

Park, Kim, Moon, et al. 2021

Animals

The priority placed on animal welfare in the meat industry is increasing the importance of understanding livestock behavior. In this study, we developed a web-based monitoring and recording system based on artificial intelligence analysis for the classification of cattle sounds. The deep learning classification model of the system is a convolutional neural network (CNN) model that takes voice information converted to Mel-frequency cepstral coefficients (MFCCs) as input. The CNN model first achieved an accuracy of 91.38% in recognizing cattle sounds. Further, short-time Fourier transform-based noise filtering was applied to remove background noise, improving the classification model accuracy to 94.18%. Categorized cattle voices were then classified into four classes, and a total of 897 classification records were acquired for the classification model development. A final accuracy of 81.96% was obtained for the model. Our proposed web-based platform that provides information obtained from a total of 12 sound sensors provides cattle vocalization monitoring in real time, enabling farm owners to determine the status of their cattle.
