A robust voice spoofing detection system using novel CLS-LBP features and LSTM

Dawood, Hussain; Saleem, Sajid; Hassan, Farman; Javed, Ali

doi:10.1016/j.jksuci.2022.02.024

Cited by 15 publications

(5 citation statements)

References 30 publications


“…Furthermore, 9.33%, 7.69%, 8.09%, 9.57%, and 2.502% EER are achieved by MFCC-ResNet [13], CQCC-ResNet [13], LFCC-GMM [58], and CQCC-GMM [58] respectively. Similarly, another model [59] attained 0.58% EER and 0.0160 t-DCF for the PA eval set. Other algorithms, such as [13] and [58], performed various experiments and attained 4.43%, 13.54%, 1.04%, and 0.459% EER.…”

Section: H. Comparative Analysis With Existing Features Extraction-base... (mentioning)

confidence: 86%

“…For the LA set, the best EER is 0.045%, and our proposed spoofing detector attains a t-DCF of 0.002. The second-best EER is 0.06, and t-DCF is 0.0017 attained by [59]. Furthermore, 9.33%, 7.69%, 8.09%, 9.57%, and 2.502% EER are achieved by MFCC-ResNet [13], CQCC-ResNet [13], LFCC-GMM [58], and CQCC-GMM [58] respectively.…”

Section: H. Comparative Analysis With Existing Features Extraction-base... (mentioning)

confidence: 94%
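The EER figures quoted above mark the operating point where the false-acceptance rate (spoofed audio accepted as bona fide) equals the false-rejection rate (bona fide audio rejected). The following is a minimal NumPy sketch of that computation. It assumes detector scores where higher means "more likely bona fide" and sweeps every observed score as a candidate threshold; it is an illustration, not the official ASVspoof evaluation script.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Return (EER, threshold) for a score-based spoofing detector.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores with overlapping classes (hypothetical values).
eer, thr = compute_eer([0.6, 0.4], [0.5, 0.3])
print(f"EER = {eer:.2%}")  # EER = 50.00%
```

With real evaluation sets the score lists contain thousands of trials, so a vectorized or interpolated DET-curve computation would be preferable to this per-threshold loop.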


EDL-Det: A Robust TTS Synthesis Detector Using VGG19-Based YAMNet and Ensemble Learning Block

Mahum, Irtaza, Javed

2023

IEEE Access

Self Cite


Various algorithms exist for audio deepfake synthesis, such as Deep Voice, Tacotron, FastSpeech, and imitation techniques. Although many spoofed-speech detectors exist, they cannot yet distinguish unseen audio samples with high precision. In this study, we propose a robust model, the Ensemble Deep Learning based Detector (EDL-Det), to detect text-to-speech (TTS) audio and classify it as spoofed or bonafide. Our model improves on YAMNet by using VGG19 instead of MobileNet as the base network and combining it with two other deep learning (DL) models. The system analyzes mel-spectrograms generated from the input audio to extract the artifacts underlying the audio signals. We add an ensemble learning block consisting of ResNet50 and InceptionNetv2. First, we convert speech into mel-spectrograms, which are time-frequency representations. Second, we train our model on the ASVspoof 2019 dataset. Finally, we classify audio by converting it into mel-spectrograms and applying the trained binary classifier with majority voting across the three networks. Owing to its deep convolutional architecture, the model extracts the most representative features from the mel-spectrograms. We performed extensive experiments on the ASVspoof 2019 corpus to assess the model's performance. Additionally, the model is robust enough to identify unseen spoofed audio and accurately classify attacks based on cloning algorithms.
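EDL-Det combines its three networks (the VGG19-based YAMNet, ResNet50, and InceptionNetv2) through majority voting. The sketch below shows only that voting step. It assumes each network has already produced a hard "spoof" or "bonafide" label; the dictionary keys and label strings are hypothetical placeholders, not names taken from the paper.

```python
from collections import Counter
from typing import Sequence

def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by most voters.

    With an odd number of voters and two classes, a tie cannot occur.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    label, _count = Counter(predictions).most_common(1)[0]
    return label

# Hypothetical per-network decisions for one utterance.
votes = {"vgg19_yamnet": "spoof", "resnet50": "bonafide", "inception_v2": "spoof"}
print(majority_vote(list(votes.values())))  # spoof
```

Hard voting discards each network's confidence; averaging per-class probabilities (soft voting) is a common alternative, but the abstract describes a majority-voting scheme.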



“…To improve the ASV consistency in reverberant conditions [49,50], ASVspoof 2019 [2] comprises simulated replay recordings [51][52][53] in deep acoustic environments as opposed to the ASVspoof 2017 dataset [17], which contained the replay attacks. For the collection of PA samples, physical characteristics were considered, for example, room sizes in which the audios were synthesized, which were divided into three categories: small, medium, and large.…”

Section: Dataset (mentioning)

confidence: 99%

Fake speech detection using VGGish with attention block

Kanwal, Mahum, AlSalman et al.

2024

J AUDIO SPEECH MUSIC PROC.


While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. As a result, the ubiquitous use of deepfakes to spread false information poses significant risks to the security and privacy of individuals. The primary objective of audio spoofing detection is to identify audio generated by various AI-based techniques. Several machine learning techniques for fake audio detection already exist. However, they lack generalization and may not identify all types of AI-synthesized audio, such as replay attacks, voice conversion, and text-to-speech (TTS). In this paper, we introduce a deep layered model, VGGish, combined with an attention block, the Convolutional Block Attention Module (CBAM), for spoofing detection. Our model classifies input audio into two classes, Fake and Real, by converting it into mel-spectrograms and extracting the most representative features with the attention block. Its simple layered architecture makes it a practical technique for audio spoofing detection, and the spatial and channel features in the attention module let it capture complex relationships in audio signals. To evaluate the model's effectiveness, we conducted in-depth testing on the ASVspoof 2019 dataset. The proposed technique achieved an EER of 0.52% for Physical Access (PA) attacks and 0.07% for Logical Access (LA) attacks.
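Both VGGish-based and YAMNet-based detectors take mel-spectrograms as input. Here is a minimal NumPy sketch of that conversion: frame the waveform, apply a Hann window, take the power spectrum, and project it onto triangular filters spaced on the mel scale. The HTK-style mel formula, 16 kHz sample rate, 512-point FFT, 10 ms hop, and 64 mel bands are illustrative assumptions and may differ from the front ends actually used by VGGish or EDL-Det.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Each filter peaks at 1.0 at its center and is zero outside [left, right].
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Return a (n_mels, n_frames) log-mel spectrogram; signal must have >= n_fft samples."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T          # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                                # small offset avoids log(0)

# Usage: one second of a 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
S = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(S.shape)  # (64, 97)
```

The resulting 2-D array is what a CNN such as VGGish treats as a single-channel image; in practice a library implementation of the STFT and mel filterbank would normally replace these hand-written helpers.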


“…They evaluated their approach on ASVspoof 2019 and VSDC datasets. Dawood et al, (2022) suggested a new feature descriptor Center Lop-Sided Local binary patterns (CS-LBP) to represent audio files in the best manner. These features were also fed into the long short-term memory network for the detection of audio forgery.…”

Section: Related Work (mentioning)

confidence: 99%
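The exact definition of the CLS-LBP descriptor is not given on this page. The sketch below therefore shows the generic one-dimensional local binary pattern (1D-LBP) on which LBP-style audio descriptors are based, and it is not the authors' CLS-LBP. Each sample is compared with `radius` neighbors on each side, the comparisons are packed into an integer code, and a normalized histogram of codes serves as a fixed-length feature vector. The function names and the `radius` parameter are hypothetical.

```python
import numpy as np

def lbp_1d(signal, radius=4):
    """Generic 1-D LBP: one integer code per sample with a full neighborhood.

    Bit k is 1 when the k-th neighbor is >= the center sample.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    offsets = [o for o in range(-radius, radius + 1) if o != 0]  # 2*radius neighbors
    centers = x[radius : n - radius]
    codes = np.zeros(centers.shape, dtype=np.int64)
    for bit, off in enumerate(offsets):
        neighbors = x[radius + off : n - radius + off]
        codes |= (neighbors >= centers).astype(np.int64) << bit
    return codes

def lbp_histogram(signal, radius=4):
    """Normalized histogram over all 2**(2*radius) possible codes."""
    codes = lbp_1d(signal, radius)
    hist = np.bincount(codes, minlength=2 ** (2 * radius)).astype(float)
    return hist / hist.sum()

# A strictly increasing signal: left neighbor smaller (bit 0 = 0), right larger (bit 1 = 1).
print(lbp_1d(np.arange(10), radius=1))  # [2 2 2 2 2 2 2 2]
```

In an LSTM-based detector, one plausible arrangement (an assumption here) is to compute such a histogram per short frame and feed the resulting sequence of histograms to the recurrent network.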

Multi pattern features based spoofing detection mechanism using one class learning

Ustubioglu, Ulutas, Kilic et al.

2023

Preprint


Automatic Speaker Verification systems are prone to various voice spoofing attacks such as replay, voice conversion (VC), and speech synthesis. Using advanced audio manipulation techniques, malicious users can perform tasks such as controlling someone's bank account or taking control of a smart home. This study presents a Multi-Pattern Features Based Spoofing detection mechanism that uses a modified ResNet architecture and an OC-Softmax layer to detect various LA and PA spoofing attacks. We propose a novel pattern-features-based audio spoof detection scheme. The scheme contains three branches that evaluate different patterns on the Mel spectrogram of the audio file. This is the first work on audio spoofing detection to use three different pattern representations of the Mel spectrogram with a modified ResNet architecture and an OC-Softmax layer. The proposed network extracts pattern images from the Mel spectrogram and feeds each of them into a modified ResNet. At the last step of each network, an OC-Softmax layer produces a score for the current pattern image, and the method then fuses the three scores to label the input audio. Experimental results on the ASVspoof 2019 corpus show that the proposed method outperforms state-of-the-art methods on the ASVspoof 2019 challenges. For example, in the logical access scenario, our model improves the tandem decision cost function and equal error rate scores by 0.06% and 2.14%, respectively, compared with state-of-the-art methods. Additionally, experiments show that the proposed fused decision improves system performance.
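The abstract scores each pattern branch with OC-Softmax and then fuses the three scores, but it does not state the fusion rule. The sketch below assumes (weighted) averaging. An OC-Softmax score is the cosine similarity between an embedding and a learned bona fide direction, and the accompanying loss pushes bona fide scores above one margin and spoof scores below another. The margin and scale defaults, the 0.5 decision threshold, and the random embeddings are illustrative assumptions, not settings reported for this method.

```python
import numpy as np

def oc_softmax_score(embedding, weight):
    """Cosine similarity between the embedding and the bona fide direction `weight`."""
    e = np.asarray(embedding, dtype=float)
    w = np.asarray(weight, dtype=float)
    return float(e @ w / (np.linalg.norm(e) * np.linalg.norm(w)))

def oc_softmax_loss(score, is_bonafide, m_real=0.9, m_fake=0.2, alpha=20.0):
    """One-class softmax loss for a single trial (margins/scale are assumed values).

    Bona fide scores are penalized below m_real; spoof scores are penalized above m_fake.
    """
    if is_bonafide:
        return float(np.log1p(np.exp(alpha * (m_real - score))))
    return float(np.log1p(np.exp(alpha * (score - m_fake))))

def fuse_scores(scores, weights=None):
    """Score-level fusion by (weighted) averaging -- an assumed fusion rule."""
    return float(np.average(np.asarray(scores, dtype=float), weights=weights))

# Hypothetical 4-dim embeddings from three pattern branches, each with its own weight vector.
rng = np.random.default_rng(0)
branch_scores = []
for _ in range(3):
    w = rng.normal(size=4)
    emb = w + 0.1 * rng.normal(size=4)  # embedding lies close to the bona fide direction
    branch_scores.append(oc_softmax_score(emb, w))
fused = fuse_scores(branch_scores)
label = "bonafide" if fused >= 0.5 else "spoof"  # illustrative threshold
print(label, round(fused, 3))
```

Because each branch produces a bounded cosine score in [-1, 1], simple averaging keeps the fused score on the same scale, which is one reason averaging is a natural baseline for score-level fusion.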

