Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review

Abayomi‐Alli, Olusola; Damaševičius, Robertas; Qazi, Atika; Adedoyin-Olowe, Mariam; Misra, Sanjay

doi:10.3390/electronics11223795

Cited by 40 publications

(18 citation statements)

References 125 publications

“…In machine learning-based processing, this is carried out by incrementing training data. A standard solution is to artificially increase the quantity of training data patterns by transforming the available speech patterns by adding noise, time warping and shifting, pitch shifting, time or frequency masking, or filtering [ 58 , 59 , 60 ].…”

Section: Experiments and Results Analysis (mentioning)

confidence: 99%

Detecting Lombard Speech Using Deep Learning Approach

Kąkol, Korvel, Tamulevičius et al. 2022, Sensors

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks (CNNs) and various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments are performed to determine the most effective network structure and the 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Augmentation means using the alpha channel to store additional data: gender of the speaker, F0 frequency, and first two MFCCs. The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed speech detection process is capable of working in near real time. These are the key contributions of this work.
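
The citation statement for this entry lists waveform-level transforms (adding noise, time shifting, time masking) as a standard way to enlarge a training set. The following is a minimal sketch of three of them on a plain Python list of samples, kept dependency-free for illustration. It is not the method of the citing paper or of the review; the function names, the circular shift, and the zero-fill mask are assumptions chosen for brevity.

```python
import random


def add_noise(samples, noise_std, rng):
    """Return a copy with zero-mean Gaussian noise added to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def time_shift(samples, shift):
    """Circularly shift the signal by `shift` samples (positive = delay).

    Wrapping the tail around to the front is an assumption of this sketch;
    zero-padding is an equally common choice.
    """
    if not samples:
        return []
    k = shift % len(samples)
    if k == 0:
        return list(samples)
    return samples[-k:] + samples[:-k]


def time_mask(samples, start, width):
    """Return a copy with `width` samples starting at `start` set to zero."""
    out = list(samples)
    for i in range(max(start, 0), min(start + width, len(out))):
        out[i] = 0.0
    return out


# Hypothetical usage: derive three augmented variants from one short signal.
rng = random.Random(0)  # fixed seed so the noise is reproducible
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
variants = [
    add_noise(signal, noise_std=0.01, rng=rng),
    time_shift(signal, shift=2),
    time_mask(signal, start=3, width=2),
]
```

Each transform preserves the signal length, so the augmented variants can be fed to the same feature extractor as the original recording.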

“…Once the clip length was fixed, we set the frame duration to 1 s, considering the standard frame size in YAMNet input, and ensured that adjacent frames had a 50% overlap. Through experiments, we concluded that 3 s is an appropriate duration [ 22 ].…”

Section: Discussion (mentioning)

confidence: 99%

Sound-Event Detection of Water-Usage Activities Using Transfer Learning

Hyun 2023, Sensors

In this paper, a sound event detection method is proposed for estimating three types of bathroom activities—showering, flushing, and faucet usage—based on the sounds of water usage in the bathroom. The proposed approach has a two-stage structure. First, the general sound classification network YAMNet is utilized to determine the existence of a general water sound; if the input data contains water sounds, W-YAMNet, a modified network of YAMNet, is then triggered to identify the specific activity. W-YAMNet is designed to accommodate the acoustic characteristics of each bathroom. In training W-YAMNet, the transfer learning method is applied to utilize the advantages of YAMNet and to address its limitations. Various parameters, including the length of the audio clip, were experimentally analyzed to identify trends and suitable values. The proposed method is implemented in a Raspberry-Pi-based edge computer to ensure privacy protection. Applying this methodology to 10-min segments of continuous audio data yielded promising results. However, the accuracy could still be further enhanced, and the potential for utilizing the data obtained through this approach in assessing the health and safety of elderly individuals living alone remains a topic for future investigation.
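
The citation statement for this entry describes splitting a fixed-length clip into 1 s frames with 50% overlap. The sketch below shows that arithmetic on a plain list of samples. The function name `split_frames` and the 16 kHz sample rate in the usage lines are assumptions made for illustration, not details taken from the citing paper.

```python
def split_frames(samples, sample_rate, frame_seconds=1.0, overlap=0.5):
    """Split a clip into fixed-length frames with fractional overlap.

    Only full frames are returned; a trailing partial frame is dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    frame_len = int(round(frame_seconds * sample_rate))
    hop_len = int(round(frame_len * (1.0 - overlap)))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop_len)]


# Hypothetical usage: a 3 s clip of silence at an assumed 16 kHz sample rate.
sample_rate = 16000
clip = [0.0] * (3 * sample_rate)
frames = split_frames(clip, sample_rate)
print(len(frames), len(frames[0]))
```

With a 0.5 s hop, frames start at 0, 0.5, 1.0, 1.5, and 2.0 s, so a 3 s clip yields five full 1 s frames.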

“…It outperforms traditional classification methods in handling real-world industrial mechanical sound data, thereby contributing to reduced maintenance costs, enhanced safety in processing, improved equipment availability, and reduced production downtime costs while maintaining acceptable performance levels. However, this deep learning method requires extensive data when dealing with complex audio signals and industrial noise, or its performance may be compromised [44].…”

Section: Sound Sensors (mentioning)

confidence: 99%

Robotics Perception and Control: Key Technologies and Applications

Luo, Zhou, Zeng et al. 2024, Micromachines

The integration of advanced sensor technologies has significantly propelled the dynamic development of robotics, thus inaugurating a new era in automation and artificial intelligence. Given the rapid advancements in robotics technology, its core area—robot control technology—has attracted increasing attention. Notably, sensors and sensor fusion technologies, which are considered essential for enhancing robot control technologies, have been widely and successfully applied in the field of robotics. Therefore, the integration of sensors and sensor fusion techniques with robot control technologies, which enables adaptation to various tasks in new situations, is emerging as a promising approach. This review seeks to delineate how sensors and sensor fusion technologies are combined with robot control technologies. It presents nine types of sensors used in robot control, discusses representative control methods, and summarizes their applications across various domains. Finally, this survey discusses existing challenges and potential future directions.
