“…Acoustic Models Some traditional ML methods are generative, while others are discriminative. These methods are suited for imposter detection in applicable datasets from ASV systems' initial research [43]. The proposed work applies multiple classification algorithms such as SVM, NB, DT and KNN.…”
Analyzing the intricate nature of an audio signal often requires the extraction of relevant features, which serve as informative descriptors of the signal. It entails studying the signal and determining how signals are related to one another. As a result, the performance of audio spoofing detection in Automatic Speaker Verification (ASV) systems is strongly reliant on front-end feature extraction. In this paper, three types of successively integrated features have been proposed. First, Acoustic Ternary Pattern (ATP) image features are sequentially fused with different audio features such as MFCC, CQCC, GTCC, BFCC and PLP, individually. Second, LBP image features are combined with all these audio features similarly. Then, the sequential integration of ATP-LBP features is combined individually with MFCC, CQCC, GTCC, BFCC and PLP features. Finally, these front-end hybrid feature sets are classified using different ML and deep learning algorithms based acoustic models at the back-end. The state-of-the-art ASVspoof 2019 dataset has been used to implement various front-end and back-end combinations. The research outcomes reveal that the proposed approach achieved the best results with ATP-LBP-GTCC at the front end with LSTM-based acoustic model at the back-end.
“…Acoustic Models Some traditional ML methods are generative, while others are discriminative. These methods are suited for imposter detection in applicable datasets from ASV systems' initial research [43]. The proposed work applies multiple classification algorithms such as SVM, NB, DT and KNN.…”
Analyzing the intricate nature of an audio signal often requires the extraction of relevant features, which serve as informative descriptors of the signal. It entails studying the signal and determining how signals are related to one another. As a result, the performance of audio spoofing detection in Automatic Speaker Verification (ASV) systems is strongly reliant on front-end feature extraction. In this paper, three types of successively integrated features have been proposed. First, Acoustic Ternary Pattern (ATP) image features are sequentially fused with different audio features such as MFCC, CQCC, GTCC, BFCC and PLP, individually. Second, LBP image features are combined with all these audio features similarly. Then, the sequential integration of ATP-LBP features is combined individually with MFCC, CQCC, GTCC, BFCC and PLP features. Finally, these front-end hybrid feature sets are classified using different ML and deep learning algorithms based acoustic models at the back-end. The state-of-the-art ASVspoof 2019 dataset has been used to implement various front-end and back-end combinations. The research outcomes reveal that the proposed approach achieved the best results with ATP-LBP-GTCC at the front end with LSTM-based acoustic model at the back-end.
“…Four different microphones were used in the data collection. As the ReMASC corpus was made up of recordings via a variety of microphones instead of a single microphone, it is well-suited for multi-channel voice PAD research such as [34]. Another major effort from the community of spoofing and anti-spoofing for ASV was the ASVspoof Challenge series.…”
The emergence of biometric technology provides enhanced security compared to the traditional identification and authentication techniques that were less efficient and secure. Despite the advantages brought by biometric technology, the existing biometric systems such as Automatic Speaker Verification (ASV) systems are weak against presentation attacks. A presentation attack is a spoofing attack launched to subvert an ASV system to gain access to the system. Though numerous Presentation Attack Detection (PAD) systems were reported in the literature, a systematic survey that describes the current state of research and application is unavailable. This paper presents a systematic analysis of the state-of-the-art voice PAD systems to promote further advancement in this area. The objectives of this paper are two folds: (i) to understand the nature of recent work on PAD systems, and (ii) to identify areas that require additional research. From the survey, a taxonomy of voice PAD and the trend analysis of recent work on PAD systems were built and presented, whereby the recent and relevant articles including articles from Interspeech and ICASSP Conferences, mostly indexed by Scopus, published between 2015 and 2021 were considered. A total of 172 articles were surveyed in this work. The findings of this survey present the limitation of recent works, which include spoof-type dependent PAD. Consequently, the future direction of work on voice PAD for interested researchers is established. The findings of this survey present the limitation of recent works, which include spoof-type dependent PAD. Consequently, the future direction of work on voice PAD for interested researchers is established.
“…After establishing the vulnerability of VCDs, ( Gong, Yang & Poellabauer, 2020 ) presents another concern regarding the number of channels employed in attacks on these devices. They present a neural network-based model designed for the specific purpose of detecting multichannel audio.…”
Nowadays, biometric authentication has gained relevance due to the technological advances that have allowed its inclusion in many daily-use devices. However, this same advantage has also brought dangers, as spoofing attacks are now more common. This work addresses the vulnerabilities of automatic speaker verification authentication systems, which are prone to attacks arising from new techniques for the generation of spoofed audio. In this article, we present a countermeasure for these attacks using an approach that includes easy to implement feature extractors such as spectrograms and mel frequency cepstral coefficients, as well as a modular architecture based on deep neural networks. Finally, we evaluate our proposal using the well-know ASVspoof 2017 V2 database, the experiments show that using the final architecture the best performance is obtained, achieving an equal error rate of 6.66% on the evaluation set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.