2022
DOI: 10.3390/electronics11223795

Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review

Abstract: The aim of this systematic literature review (SLR) is to identify and critically evaluate current research advancements with respect to small data and the use of data augmentation methods to increase the amount of data available for deep learning classifiers for sound classification (including voice, speech, and related audio signals). Methodology: This SLR followed the standard PRISMA-based SLR guidelines, and three bibliographic databases were examined, namely, Web of Science, SCOPUS, and …

Citations: cited by 40 publications (18 citation statements)
References: 125 publications
“…In machine learning-based processing, this is carried out by incrementing training data. A standard solution is to artificially increase the quantity of training data patterns by transforming the available speech patterns by adding noise, time warping and shifting, pitch shifting, time or frequency masking, or filtering [58, 59, 60].…”
Section: Experiments and Results Analysis
Citation type: mentioning
confidence: 99%
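
The statement above lists several common augmentation transforms. A minimal sketch of three of them (additive noise, time shifting, and time/frequency masking) is given below, assuming NumPy and a synthetic 440 Hz tone in place of real speech. The function names, the 16 kHz sample rate, the 10 dB SNR, and the mask positions are illustrative assumptions and are not taken from the cited papers; pitch shifting, time warping, and filtering are not covered.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples (wraps around)."""
    return np.roll(signal, shift)


def mask_spectrogram(spec, freq_start, freq_width, time_start, time_width):
    """Return a copy with one band of frequency bins and one band of frames zeroed.

    `spec` is assumed to have shape (frequency_bins, time_frames).
    """
    masked = spec.copy()
    masked[freq_start:freq_start + freq_width, :] = 0.0
    masked[:, time_start:time_start + time_width] = 0.0
    return masked


# Illustrative inputs (assumptions): one second of a 440 Hz tone at 16 kHz,
# and a random array standing in for a magnitude spectrogram.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440 * t)
spec = rng.random((64, 100))

noisy = add_noise(wave, snr_db=10.0, rng=rng)
shifted = time_shift(wave, shift=1600)  # 0.1 s at 16 kHz
masked = mask_spectrogram(spec, freq_start=10, freq_width=8,
                          time_start=20, time_width=15)
```

Each function returns a new array, so one original clip can yield several augmented training patterns. In practice the shift and mask positions would usually be drawn at random per example.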
“…Once the clip length was fixed, we set the frame duration to 1 s, considering the standard frame size in YAMNet input, and ensured that adjacent frames had a 50% overlap. Through experiments, we concluded that 3 s is an appropriate duration [22].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
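
The framing described above (3 s clips, 1 s frames, 50% overlap) can be sketched as follows. This is an illustration, not the citing authors' code: the helper name `frame_clip` and the 16 kHz sample rate are assumptions, and a ramp of sample indices stands in for real audio so that frame boundaries are easy to inspect.

```python
import numpy as np


def frame_clip(clip, sample_rate, frame_seconds=1.0, overlap=0.5):
    """Split a 1-D clip into fixed-length frames with fractional overlap.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(frame_seconds * sample_rate))
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop <= 0:
        raise ValueError("overlap must be smaller than 1.0")
    if len(clip) < frame_len:
        return np.empty((0, frame_len), dtype=clip.dtype)
    n_frames = 1 + (len(clip) - frame_len) // hop
    return np.stack([clip[i * hop:i * hop + frame_len] for i in range(n_frames)])


# Assumed values: 16 kHz sample rate, 3 s clip filled with its own sample indices.
sample_rate = 16000
clip = np.arange(3 * sample_rate, dtype=np.float32)
frames = frame_clip(clip, sample_rate)
print(frames.shape)  # (5, 16000): frames start at 0, 0.5, 1.0, 1.5 and 2.0 s
```

With a hop of half a frame, a 3 s clip yields 1 + (3 - 1) / 0.5 = 5 frames, each sharing half of its samples with its neighbour.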
“…It outperforms traditional classification methods in handling real-world industrial mechanical sound data, thereby contributing to reduced maintenance costs, enhanced safety in processing, improved equipment availability, and reduced production downtime costs while maintaining acceptable performance levels. However, this deep learning method requires extensive data when dealing with complex audio signals and industrial noise, or its performance may be compromised [44].…”
Section: Sound Sensors
Citation type: mentioning
confidence: 99%