Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD

Wang, Kun-Ching

doi:10.3390/e22020183

Cited by 7 publications

(4 citation statements)

References 68 publications

“…Nanni suggested a general architecture for context-aware recommendation system [ 17 ]. Wang suggested introducing wavelet transform into feature engineering and using neural network model in classification method [ 18 ]. López proposed a new language model and suggested the usefulness of clustering using tags and audio content.…”

Section: Related Work (mentioning)

confidence: 99%

[Retracted] Music Classification Method Using Big Data Feature Extraction and Neural Networks

2022

Journal of Environmental and Public Health

From the cassette era to the CD era to the digital music era, the quantity of music has grown rapidly. People cannot easily search for the desired music without classifying enormous music resources and developing a successful music retrieval system. By examining users’ historical listening patterns for personalised recommendations, the music recommendation algorithm can lessen message fatigue for users and enhance user experience. Relying on manual labelling is how traditional music is classified. It would be inefficient and unrealistic to attempt to classify music using manual labelling in the age of big data. Feature extraction and neural networks are the tools employed in this paper. The model’s parameters can be trained using conventional gradient descent techniques, and the model’s trained convolution neural network can learn the image’s features and finish the extraction and classification of the features. This algorithm is 12 percent superior to the conventional algorithm, according to the research in this paper. It has strong ability and is appropriate for widespread implementation with the same number of iterations.

“…Thus, segmentation improves classification. A general audio classification scheme to segment an arbitrary audio clip is presented in [1]. It achieves good accuracy rate of 96%.…”

Section: Literature Survey (mentioning)

confidence: 99%

“…Hence, are considered for experimentation. Research by Wang [1], music is separated in three categories. First is popular music domain.…”

Section: Literature Survey (mentioning)

confidence: 99%

An algorithm for enhancement of audio content classification

Bang, Purandare, Ratnaparkhi

2023

Bulletin EEI

Presently, fast proliferation of information enforces novel challenges on content management. Further, computerized audio classification along-with content description is considered as valuable method to manage audio contents. In general, classification involves two steps. First, is the processing of accessible data in economical ways to deliver explanatory features. Second is how accurate features of undetermined tests is evaluated to choose classifier. In this paper, k-neighbor algorithm with machine learning is proposed for feature extraction as well as content classification/description. This algorithm enhances Quality of Service parameters of classifiers. Here, development of training as well as testing data set is developed to increase the classifier accuracy. A test engine set-up bed using simulation tool MATLAB is designed to estimate the implementation performance of the algorithm. A range of features are studied to evaluate effectiveness in terms of accuracy, zero crossing rate (ZCR) and spectral roll frequency. From the experimentation results, it is observed that the proposed algorithm can achieve accuracy of 95.8% for 2 sec window length as compare with k-neighbor algorithm. A total enhancement of 11% is achieved with cross validation error of 29.6. A superior assortment of training fabric to extract few additional useful features can enhance accuracy further.

“…Voice activity detection (VAD) is a technique for detecting the presence of speech signal in speech data [22]. It has been widely used to enhance the speech contents such as speech classification [23], speaker recognition [24], and speech enhancement [25,26]. Figure 4 shows three processing steps for VAD: (1) noise reduction, (2) segmentation, and (3) elimination [27].…”

Section: Voice Activity Detection (mentioning)

confidence: 99%

A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Lee, Kwon

2020

Applied Sciences

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore, it becomes more important as the deep learning-based methods have been developed because they require significant costs while showing high performance in general. The basic idea of the proposed method is using the speech segment detection so as to exclude non-speech segments before denoising. The speech segmentation detection can exclude non-speech segments with a negligible cost, which will be removed in denoising process with a much higher cost, while maintaining the accuracy of denoising. First, we devise a framework to choose the best preprocessing method for denoising based on the speech segment detection for a target environment. For this, we speculate the environments for denoising using different levels of signal-to-noise ratio (SNR) and multiple evaluation metrics. The framework finds the best speech segment detection method tailored to a target environment according to the performance evaluation of speech segment detection methods. Next, we investigate the accuracy of the speech segment detection methods extensively. We conduct the performance evaluation of five speech segment detection methods with different levels of SNRs and evaluation metrics. Especially, we show that we can adjust the accuracy between the precision and recall of each method by controlling a parameter. Finally, we incorporate the best speech segment detection method for a target environment into a denoising process. 
Through extensive experiments, we show that the accuracy of the proposed scheme is comparable to or even better than that of Wavenet-based denoising, which is one of recent advanced denoising methods based on deep neural networks, in terms of multiple evaluation metrics of denoising, i.e., SNR, STOI, and PESQ, while it can reduce the denoising time of the Wavenet-based denoising by approximately 40–50% according to the used speech segment detection method.
