Data Augmentation with Signal Companding for Detection of Logical Access Attacks

Das, Rohan Kumar; Yang, Jichen; Li, Haizhou

doi:10.1109/icassp39728.2021.9413501

Cited by 19 publications

(7 citation statements)

References 28 publications


“…The logarithm of the CQT power spectrum is then saved as extracted feature for each example. The parameters related to CQT extraction follows our previous work given in [31]. On the other hand, the ERB spectrum (erbSpec) is extracted with 43 number of gammatone filters using audioFeatureExtractor of MATLAB audio toolbox.…”

Section: Methods (mentioning)

confidence: 99%

Diagnosis of COVID-19 Using Auditory Acoustic Cues

Das, Madhavi

2021

Interspeech 2021

Self Cite


COVID-19 can be pre-screened based on symptoms and confirmed using other laboratory tests. The cough or speech from patients are also studied in the recent time for detection of COVID-19 as they are indicators of change in anatomy and physiology of the respiratory system. Along this direction, the diagnosis of COVID-19 using acoustics (DiCOVA) challenge aims to promote such research by releasing publicly available cough/speech corpus. We participated in the Track-1 of the challenge, which deals with COVID-19 detection using cough sounds from individuals. In this challenge, we use a few novel auditory acoustic cues based on long-term transform, equivalent rectangular bandwidth spectrum and gammatone filterbank. We evaluate these representations using logistic regression, random forest and multilayer perceptron classifiers for detection of COVID-19. On the blind test set, we obtain an area under the ROC curve (AUC) of 83.49% for the best system submitted to the challenge. It is worth noting that the submitted system ranked among the top few systems on the leaderboard and outperformed the challenge baseline by a large margin.


“…For the replay attack detection, seven augmentation techniques were tested; out of these, dynamic value change and pitch change showed an 8% improvement in base model accuracy [30]. A data augmentation technique using a-law and mu-law based signal companding was explored in [31] for the detection of logical access attacks. For data augmentation, an Auxiliary Classifier Generative Adversarial Network (AC-GAN) was also proposed to generate more speech samples with diverse variants [32] combined with a post-selection quality frame selection based on CNN, giving more accuracy.…”

Section: Literature Review (mentioning)

confidence: 99%

The Effect of Synthetic Voice Data Augmentation on Spoken Language Identification on Indian Languages

Ambili, Roy

2023

IEEE Access


Multilingual based voice activated human computer interaction systems are currently in high demand. The Spoken Language Identification Unit (SPLID) is an inevitable front end unit of such a multilingual system. These systems will be a great boon to a country like India where around 24 official languages are spoken. Deep learning architectures for spoken language identification have progressed to the point that they can now perform well, even in the presence of various background noises. However, a strong phonetic relationship across various Indian languages leads to increased confusion in the SPLID unit. Therefore, the goal of this study is to propose a synthetic voice data augmentation method based on speech synthesis to improve the spoken Indian language identification system. Here the research attempts to determine how well pre-trained computer vision models recognize spoken languages in synthetic and classical audio augmentation environments. The accuracy of the models was compared using bottleneck features extracted from three different pre-trained models VGG16, RESNET50, and Inception-v3 while using an Artificial Neural Network (ANN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT) and KNN (K-Nearest Neighbors) as classifiers. The proposed system was tested on three Indian language datasets - two comprising seven Indian languages (Hindi, Malayalam, Tamil, Telugu, Marathi, Kannada and Bengali), one containing five Indian languages (Tamil, Hindi, Malayalam, Oria and Assamese), and on a foreign language dataset. It was found that the addition of synthetic audio samples improved the accuracy by 17%. Among the pre-trained models, VGG16 and Inception-v3 combined with PCA and ANN were found to have the maximum accuracy of 97%.
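For readers unfamiliar with the a-law and mu-law companding named in the citation statement above, the following is a minimal Python sketch of the standard continuous companding curves. The function names, the default constants (mu = 255, A = 87.6), and the 8-bit quantization step in `compand_augment` are illustrative assumptions; the listing here does not say how the cited paper applies companding, so this should not be read as its method.

```python
import numpy as np


def mu_law_compress(x, mu=255.0):
    """Standard mu-law compression of samples in [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


def a_law_compress(x, a=87.6):
    """Standard A-law compression of samples in [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    ax = np.abs(x)
    denom = 1.0 + np.log(a)
    small = a * ax / denom  # linear segment, |x| < 1/A
    # np.maximum avoids log(0); the result is only used where |x| >= 1/A.
    large = (1.0 + np.log(np.maximum(a * ax, 1.0))) / denom
    return np.sign(x) * np.where(ax < 1.0 / a, small, large)


def a_law_expand(y, a=87.6):
    """Inverse of a_law_compress."""
    y = np.asarray(y, dtype=np.float64)
    ay = np.abs(y)
    denom = 1.0 + np.log(a)
    small = ay * denom / a
    large = np.exp(ay * denom - 1.0) / a
    return np.sign(y) * np.where(ay < 1.0 / denom, small, large)


def compand_augment(x, law="mu", bits=8):
    """Hypothetical augmentation: compress, quantize uniformly, expand.

    The quantization step is an assumption made for illustration only.
    """
    if law == "mu":
        compress, expand = mu_law_compress, mu_law_expand
    elif law == "a":
        compress, expand = a_law_compress, a_law_expand
    else:
        raise ValueError("law must be 'mu' or 'a'")
    levels = 2 ** bits
    y = compress(x)
    codes = np.round((y + 1.0) / 2.0 * (levels - 1))  # integer codes 0..levels-1
    y_hat = codes / (levels - 1) * 2.0 - 1.0
    return expand(y_hat)
```

Under these assumptions, calling `compand_augment(waveform, law="a")` on a waveform scaled to [-1, 1] would give a slightly distorted copy that could be added to a training set alongside the original.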


“…When the evaluation trials from another data set are mounted, any mismatch could make the model fail to respond. This motivates a few studies using data augmentation, for example, augmentation based on waveform companding [21] and frequency mask [13], to alleviate potential mismatches between training and unseen test data. Another potential direction is to use self-supervised front end trained on various speech data.…”

Section: Remaining Issues and Challenges (mentioning)

confidence: 99%

A Practical Guide to Logical Access Voice Presentation Attack Detection

Wang, Yamagishi

2022

Preprint


Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are commonly used in the market. However, the threat from presentation attacks is also growing since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including the advancement in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent methods on a benchmark dataset. Code for the experiments is open-sourced.
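The citation statement above also mentions frequency-mask augmentation. As a rough illustration of the general idea only, the sketch below zeroes a random band of frequency bins in a spectrogram. The function name, the `max_width` parameter, and the choice of zero as the fill value are assumptions; the cited work [13] may define its mask differently.

```python
import numpy as np


def frequency_mask(spec, max_width=8, rng=None):
    """Zero out one random band of frequency bins.

    spec is assumed to be a 2-D array shaped (frequency_bins, frames).
    Returns a masked copy; the input array is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    spec = np.array(spec, dtype=np.float64, copy=True)
    n_bins = spec.shape[0]
    # Band width is drawn from 0..min(max_width, n_bins), inclusive.
    width = int(rng.integers(0, min(max_width, n_bins) + 1))
    # Start bin is chosen so the band fits inside the spectrogram.
    start = int(rng.integers(0, n_bins - width + 1))
    spec[start:start + width, :] = 0.0
    return spec
```

A new mask would typically be drawn for each training example, so that the model sees a different missing band each time.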
