2021
DOI: 10.1186/s13635-021-00116-3

Synthetic speech detection through short-term and long-term prediction traces

Abstract: Several methods for synthetic audio speech generation have been developed in the literature through the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredible realistic results have been recently proposed. As these methods generate convincing fake human voices, they can be used in a malicious way to negatively impact on today’s society (e.g., people impersonation, fake news spreading, opinion formation). For this reason, the ability of…

Citations: cited by 46 publications (17 citation statements)
References: 31 publications
“…However, we also found that Discriminator is prone to being deceived by the generated samples from other speech generative models as Discriminator was not jointly trained with those generative models. Therefore, we expect more robust synthesized speech detection algorithms to be developed in the future such as [48,40,9,6].…”
Section: Conclusion and Discussion (mentioning)
confidence: 99%
“…It is observed that QSVM beats other traditional approaches by 97.56% accuracy and has only a 2.43% misclassification rate. Similarly, Borrelli et al [109] created an SVM model using RF to classify artificial voices using a novel audio component known as short-term long-term (STLT). The Automatic Speaker Verification (ASV) spoof 2019 challenge dataset was used to train the models.…”
Section: Deepfake Audio Detection Techniques (mentioning)
confidence: 99%
“…In the audio field, [9] feeds linear filter banks into a Resnet to generate embeddings used as input of a neural network classifier, and in [10] long-term features are used to discriminate fake and real audio tracks. Recently [11] detected for audio deepfakes based on long-term and short-term predictor features, while [12] exploits the traces left by time scaling to discriminate fake audio signals.…”
Section: Introduction (mentioning)
confidence: 99%