Voice conversion spoofing detection by exploring artifacts estimates

Hemavathi, R.; Kumaraswamy, R.

doi:10.1007/s11042-020-10212-0

Cited by 7 publications (4 citation statements)

References 35 publications


“…Tak et al utilized a Graph Attention Network (GAT) to model the feature quantities of multi-band and multi-temporal data [40]. Hemavathi et al adopted a blind source separation (BSS) based on non-negative matrix factorization to separate artificially synthesized sounds into genuine and fake components and then used a CNN network to achieve the target audio's authenticity identification [41].…”

Section: Related Work (mentioning)

confidence: 99%

DAMRN: Deep attention mechanism residual network for deepfake audio detection using cepstral coefficient features

Zhou, Yang, Yan et al. 2024. Preprint

In order to further curb the misuse of Deepfake audio technology, we proposed deep attention mechanism residual network (DAMRN) that can effectively detect forged audio. The network exhibits stable operation, low risk of gradient disappearance and gradient explosion, and high detection accuracy. The structure of proposed network mainly involves the following contents: Firstly, a data balancing strategy is adopted in the front end of the network so that the ratio of positive and negative samples in the data maintained proportional balance, which improves the network performance and reduces the overfitting phenomenon. This strategy has been effectively proven by the experiments in this article. Secondly, we compare the accuracy rates of different depths among the network models for Deepfake audio detection (DFAD), and select the network that best suits the depth of this article. Finally, we introduce an effective attention mechanism in the network structure appropriately to increase the network's sensitivity to forged speech artifact information. By obtaining the artifact information of the Deepfake audio, the network model can learn more falsification frequency features that can effectively distinguish between spoofed and bonafide audio, and the accuracy has been improved to 99.81%, with the EER reduced to 0.69%, compared to the baseline system. Experiments are conducted using three acoustic features (MFCC, LFCC, GFCC) extracted from two mainstream datasets (ASVspoof2019LA, ASVspoof2021DF) respectively, and the results show that the best EER value of the method proposed in this paper is 0.32%, which is a better performance compared with other mainstream models.



“…MFCC [24], LFCC, and CQCC [25] are the main features used for synthetic spoofing detection. Various deep learning architectures like DNN [26] - [27], LCNN [28] and LSTM [29] have been employed for spoofing detection. For the replay attack detection, seven augmentation techniques were tested; out of these, dynamic value change and pitch change showed an 8% improvement in base model accuracy [30].…”

Section: Literature Review (mentioning)

confidence: 99%

The Effect of Synthetic Voice Data Augmentation on Spoken Language Identification on Indian Languages

Ambili, Roy 2023. IEEE Access

Multilingual based voice activated human computer interaction systems are currently in high demand. The Spoken Language Identification Unit (SPLID) is an inevitable front end unit of such a multilingual system. These systems will be a great boon to a country like India where around 24 official languages are spoken. Deep learning architectures for spoken language identification have progressed to the point that they can now perform well, even in the presence of various background noises. However, a strong phonetic relationship across various Indian languages leads to increased confusion in the SPLID unit. Therefore, the goal of this study is to propose a synthetic voice data augmentation method based on speech synthesis to improve the spoken Indian language identification system. Here the research attempts to determine how well pre-trained computer vision models recognize spoken languages in synthetic and classical audio augmentation environments. The accuracy of the models was compared using bottleneck features extracted from three different pre-trained models VGG16, RESNET50, and Inception-v3 while using an Artificial Neural Network (ANN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT) and KNN (K-Nearest Neighbors) as classifiers. The proposed system was tested on three Indian language datasets - two comprising seven Indian languages (Hindi, Malayalam, Tamil, Telugu, Marathi, Kannada and Bengali), one containing five Indian languages (Tamil, Hindi, Malayalam, Oria and Assamese), and on a foreign language dataset. It was found that the addition of synthetic audio samples improved the accuracy by 17%. Among the pre-trained models, VGG16 and Inception-v3 combined with PCA and ANN were found to have the maximum accuracy of 97%.


“…An alternative approach involves the exploration of artifact estimations, a result that arises when an impostor tries to transform their speech into a genuine version [33]. This study was based on the assumption that all manipulated speech samples would exhibit artifacts.…”

Section: Voice Spoofing (mentioning)

confidence: 99%

Enhancing Security and Accountability in Autonomous Vehicles through Robust Speaker Identification and Blockchain-Based Event Recording

Njoku, Nwakanma, Lee et al. 2023. Electronics

As the deployment of Autonomous Vehicles (AVs) gains momentum, ensuring both security and accountability becomes paramount. This paper proposes a comprehensive approach to address these concerns. With the increasing importance of speaker identification, our first contribution lies in implementing a robust mechanism for identifying authorized users within AVs, enhancing security. To counter the threat of voice spoofing, an ensemble-based approach leveraging speaker verification techniques is presented, ensuring the authenticity of user commands. Furthermore, in scenarios of accidents involving AVs, the need for accurate accountability and liability allocation arises. To address this, we introduce a novel application of blockchain technology, enabling an event recording system that ensures transparent and tamper-proof records. The proposed system enhances AV security and establishes a framework for reliable accident investigation using speakers’ records. In addition, this paper presents an innovative concept where vehicles act as impartial judges during accidents, utilizing location-based identification. Results show the viability of the proposed solution for accident investigation and analysis.
