Identification of Fake Stereo Audio Using SVM and CNN

Liu, Tianyun; Yan, Diqun; Wang, Rangding; Yan, Nan; Chen, Gang

doi:10.3390/info12070263

Cited by 25 publications

(19 citation statements)

References 29 publications

“…Sanchez et al in [99] proposed a model based on the statistical classifier for synthetic speech and the MFCC was used as an authorized baseline. Lui et al in [100] developed a model for fake Stereo Audio detection where the classifier was the SVM, and the feature was MFCC. Based on their result, the MFCC can detect stereo-faking audio very effectively.…”

Section: Fake Speech Detection (mentioning)

confidence: 99%

Mel Frequency Cepstral Coefficient and its Applications: A Review

Abdul, Al-Talabani

2022

IEEE Access

Feature extraction and representation have a significant impact on the performance of any machine learning method. The Mel Frequency Cepstral Coefficient (MFCC) is designed to model features of audio signals and is widely used in various fields. This paper reviews the applications of the MFCC, along with some issues facing MFCC computation and their impact on model performance. These issues include the use of MFCC for non-acoustic signals, adopting the MFCC alone or combining it with other features, the use of time series versus global representation of the MFCC, following the standard form of the MFCC computation versus modifying its parameters, and supplying traditional machine learning methods versus deep learning methods.

“…The results show that RF performs best compared to SVM with a 71% accuracy result. In a similar way, [110] also used the H-Voice dataset and compared the effectiveness of SVM with the DL technique CNN to distinguish fake audio from actual stereo audio. The study discovered that the CNN is more resilient than the SVM, even though both obtained a high classification accuracy of 99%.…”

Section: Deepfake Audio Detection Techniques (mentioning)

confidence: 99%

A Review of Image Processing Techniques for Deepfakes

Shahzad, Rustam, Flores, et al.

2022

Sensors

Deep learning is used to address a wide range of challenging issues including large data analysis, image processing, object detection, and autonomous control. In the same way, deep learning techniques are also used to develop software and techniques that pose a danger to privacy, democracy, and national security. Fake content in the form of images and videos using digital manipulation with artificial intelligence (AI) approaches has become widespread during the past few years. Deepfakes, in the form of audio, images, and videos, have become a major concern during the past few years. Complemented by artificial intelligence, deepfakes swap the face of one person with the other and generate hyper-realistic videos. Accompanying the speed of social media, deepfakes can immediately reach millions of people and can be very dangerous to make fake news, hoaxes, and fraud. Besides the well-known movie stars, politicians have been victims of deepfakes in the past, especially US presidents Barack Obama and Donald Trump; however, the public at large can be the target of deepfakes. To overcome the challenge of deepfake identification and mitigate its impact, large efforts have been carried out to devise novel methods to detect face manipulation. This study also discusses how to counter the threats from deepfake technology and alleviate its impact. The outcomes recommend that despite a serious threat to society, business, and political institutions, they can be combated through appropriate policies, regulation, individual actions, training, and education. In addition, the evolution of technology is desired for deepfake identification, content authentication, and deepfake prevention. Different studies have performed deepfake detection using machine learning and deep learning techniques such as support vector machine, random forest, multilayer perceptron, k-nearest neighbors, convolutional neural networks with and without long short-term memory, and other similar models. This study aims to highlight the recent research in deepfake images and video detection, such as deepfake creation, various detection algorithms on self-made datasets, and existing benchmark datasets.

“…It is one of the most sophisticated technologies and is based on the fact that the crucial bandwidths of the human ear vary in frequency. The Mel-frequency scale, which is a linear frequency space below 1000 Hz and a logarithmic space above 1000 Hz, is used to show this information [24]. Spectral Centroid (SC): SC (also known as brightness) represents the focal point in the spectral power distribution of a signal in a sample frame [25].…”

Section: Mel-frequency Cepstral Coefficients (MFCC) (mentioning)

confidence: 99%

Audio-Visual Quality of Experience Prediction Based on ELM Model

2022

IJIES

Measuring end-user satisfaction, or quality of experience (QoE), has become necessary to improve video streaming applications. This measure represents the end-user's degree of satisfaction with the quality of their video conference. This study measures both audio and video QoE using two databases, the UnB-AV database and the INRS database, with the UnB-AV database as the target dataset. Several features were extracted from the audio and video files. The extreme learning machine (ELM) algorithm was used to predict the audio-visual QoE, and the performance of the proposed model was validated with unseen data. Experiments on the two datasets showed that the ELM model achieves better prediction accuracy on the UnB-AV database than on the INRS database. The reported prediction accuracy values were 0.13 on the UnB-AV dataset and 0.16 on the INRS dataset.
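The citation statement for this entry describes the Mel-frequency scale (roughly linear below 1000 Hz, logarithmic above) and the spectral centroid. As a minimal sketch of those two ideas, and not the method of any paper listed here, the Python below assumes the widely used formula m = 2595 * log10(1 + f / 700) for the mel conversion and a magnitude-weighted mean frequency for the centroid (a power-weighted variant is also common). The function names and the naive DFT helper are illustrative only.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Hz to mel, assuming the variant m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def magnitude_spectrum(frame, sample_rate):
    """Naive DFT of one real-valued frame, bins 0..N/2.

    Returns (frequencies in Hz, magnitudes). This is O(N^2) and only
    meant for short illustrative frames; real code would use an FFT.
    """
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency; 0.0 for an all-zero spectrum."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total
```

Under the assumed formula, 1000 Hz maps to approximately 1000 mel, and a 100 Hz step covers far more mels at low frequencies than at high ones, which matches the linear-then-logarithmic description. For a frame containing a single sinusoid whose frequency falls exactly on a DFT bin, the centroid sits at that frequency.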
