Data augmentation approaches for improving animal audio classification

Nanni, Loris; Maguolo, Gianluca; Paci, Michelangelo

doi:10.1016/j.ecoinf.2020.101084

Cited by 120 publications

(82 citation statements)

References 41 publications

“…This idea was extended by introducing more policies such as vertical direction distortion (frequency warping), time length control, and loudness control (Hwang et al., 2020). Other techniques used in image augmentation, e.g., rotation and mixture, are attempted as well (Nanni et al., 2020).…”

Section: Methods To Integrate Human Knowledge (mentioning)

confidence: 99%

Integrating Machine Learning with Human Knowledge

et al. 2020

Summary: Machine learning has been heavily researched and widely used in many disciplines. However, achieving high accuracy requires a large amount of data that is sometimes difficult, expensive, or impractical to obtain. Integrating human knowledge into machine learning can significantly reduce data requirements, increase the reliability and robustness of machine learning, and build explainable machine learning systems. This allows leveraging the vast amount of human knowledge and the capability of machine learning to achieve functions and performance not available before, and will facilitate the interaction between human beings and machine learning systems, making machine learning decisions understandable to humans. This paper gives an overview of the knowledge and its representations that can be integrated into machine learning, and of the methodology. We cover the fundamentals, current status, and recent progress of the methods, with a focus on popular and new topics. Perspectives on future directions are also discussed.

“…Recently, some methods have been proposed for audio data augmentation in emotional classification [44] and speech recognition [37], [45]. These data augmentation types are based on the spectrogram of the audio signals [46].…”

Section: B Data Augmentation (mentioning)

confidence: 99%

“…1-b) [43,48,49]. Time masking: $t$ consecutive time steps are masked, replaced by a minimum intensity, at a random point along the spectrogram's time axis in the range $[t_0, t_0 + t)$, where $t_0$ is chosen randomly from $[0, N - t)$ [45,46]. The vertical black strip with a bandwidth of $t$ in Fig.…”

Section: Time Warping (mentioning)

confidence: 99%
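
The time-masking step described in the statement above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the cited authors' implementation: the function name `time_mask`, the (frequency, time) array layout, the use of the spectrogram's own minimum as the "minimum intensity", and the inclusive choice of start positions are all assumptions made for the example.

```python
import numpy as np


def time_mask(spec, t, rng=None):
    """Mask t consecutive time steps of a (frequency, time) spectrogram.

    Hypothetical helper for illustration; returns the masked copy and
    the start index t0 of the masked strip.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = spec.shape[1]  # N: number of time steps
    if not 0 <= t <= n_steps:
        raise ValueError("t must lie between 0 and the number of time steps")

    # Assumption: t0 is drawn uniformly from 0..N-t so the strip always fits.
    t0 = int(rng.integers(0, n_steps - t + 1))

    masked = spec.copy()
    # Assumption: "minimum intensity" is taken as the spectrogram's minimum.
    masked[:, t0:t0 + t] = spec.min()
    return masked, t0
```

Called as `masked, t0 = time_mask(spec, t=10)`, the function leaves the input array untouched and returns a copy in which one vertical strip of width `t` has been flattened to the minimum value.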

Automatic Personality Traits Perception Using Asymmetric Auto-Encoder

Zaferani, Teshnehlab, Vali

2021

IEEE Access

On account of an increase in human-computer interface applications, the study of automatic personality perception has become increasingly prevalent in speech signal processing in recent years. These studies have shown that personality traits derived from psychology theories mainly affect acoustic features. However, some obstacles remain in automatic personality perception classification, and the most important one is extracting the features related to each personality trait. Previous studies have shown that the personality effect differs from one acoustic feature to another. Additionally, there are many features one can extract from speech signals, and the curse of dimensionality also makes the classification difficult. This paper aimed to introduce and examine a novel and efficient automatic feature extraction method to classify the well-known big five personality traits. In this regard, three data augmentation methods for increasing data samples were examined. Afterwards, 6,373 statistical features were extracted from the nonverbal features of the SSPNet Speaker Personality Corpus. Finally, an innovative stacked asymmetric auto-encoder was utilized to extract useful features automatically to improve classification results. Compared with the conventional stacked auto-encoder and convolutional neural network, the proposed method exhibited an average improvement of 12.40% (10.14%) and 14.36% (1.42%) in terms of the unweighted average recall (accuracy), respectively. In comparison with other published works, classification results also revealed a notable average enhancement (11.78%) in unweighted average recall for all five traits and an average improvement of 5.1% in accuracy for two out of five personality traits.

“…A convolutional neural network (CNN) is a deep learning technology in which a data array of two or more dimensions, such as an image, is stacked through a plurality of two-dimensional filters. CNNs show high accuracies in image classification and have been recently applied in speech classification [25, 26, 27]. For animal sound classification using CNNs, Xie and Zhu [28] applied deep learning in classifying Australian bird sounds and reported a classification accuracy of more than 88%.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning-Based Cattle Vocal Classification Model and Real-Time Livestock Monitoring System with Noise Filtering

Park, Kim, Moon, et al. 2021

Animals

The priority placed on animal welfare in the meat industry is increasing the importance of understanding livestock behavior. In this study, we developed a web-based monitoring and recording system based on artificial intelligence analysis for the classification of cattle sounds. The deep learning classification model of the system is a convolutional neural network (CNN) model that takes voice information converted to Mel-frequency cepstral coefficients (MFCCs) as input. The CNN model first achieved an accuracy of 91.38% in recognizing cattle sounds. Further, short-time Fourier transform-based noise filtering was applied to remove background noise, improving the classification model accuracy to 94.18%. Categorized cattle voices were then classified into four classes, and a total of 897 classification records were acquired for the classification model development. A final accuracy of 81.96% was obtained for the model. Our proposed web-based platform that provides information obtained from a total of 12 sound sensors provides cattle vocalization monitoring in real time, enabling farm owners to determine the status of their cattle.
