Sivanand Achanta scite author profile

The ASVspoof 2017 challenge addresses the detection of replayed speech, as distinct from genuine human speech. The proposed system exploits the fact that replayed speech signals, unlike original recordings, pass through multiple channels. This channel information is typically embedded in low signal-to-noise ratio regions. A speech signal processing method with high spectro-temporal resolution is required to extract robust features from such regions. Single frequency filtering (SFF) is one such technique, which we propose to use for replay attack detection. While an SFF-based feature representation is used at the front end, Gaussian mixture model and bidirectional long short-term memory models are investigated as back-end classifiers. The experimental results on the ASVspoof 2017 dataset reveal that the SFF-based representation is very effective in detecting replay attacks. Score-level fusion of the back-end classifiers further improves the performance of the system, which indicates that the two classifiers capture complementary information.

Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping

Mantena

Achanta

Prahallad

2014

IEEE/ACM Trans. Audio Speech Lang. Process.

Deep Elman recurrent neural networks for statistical parametric speech synthesis

Achanta

Gangashetty

2017

Speech Communication

Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients

Alluri

Achanta

Kadiri

et al. 2017

Automatic speaker verification systems are vulnerable to spoofing attacks. Recently, various countermeasures have been developed for detecting high-technology attacks such as speech synthesis and voice conversion. However, there is a wide gap in dealing with replay attacks. In this paper, we propose a new feature for replay attack detection based on single frequency filtering (SFF), which provides high temporal and spectral resolution at each instant. Single frequency filtering cepstral coefficients (SFFCC) with a Gaussian mixture model classifier are used for experiments on the standard BTAS-2016 corpus. The previously reported best result, which is based on constant Q cepstral coefficients (CQCC), achieved a half total error rate of 0.67% on this dataset. Our proposed method outperforms the state of the art (CQCC) with a half total error rate of 0.0002%.

On-Device Neural Speech Synthesis

Achanta

Antony

Golipour

et al. 2021

A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models

Alluri

Achanta

Prasath

et al. 2017

An investigation of recurrent neural network architectures for statistical parametric speech synthesis

Achanta

Godambe

Gangashetty

2015

Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis

Achanta

Banoth

Pandey

et al. 2016

In this paper, we propose to use the hidden state vector obtained from a recurrent neural network (RNN) as a context vector representation for deep neural network (DNN) based statistical parametric speech synthesis. In a typical DNN-based system, there is a hierarchy of text features from the phone level to the utterance level, but they are usually in a one-hot (1-of-k) encoded representation. Our hypothesis is that supplementing the conventional text features with a continuous, frame-level, acoustically guided representation would improve the acoustic modeling. The hidden state from an RNN trained to predict acoustic features is used as the additional contextual information. A dataset consisting of two Indian languages (Telugu and Hindi) from the Blizzard Challenge 2015 was used in our experiments. Both the subjective listening tests and the objective scores indicate that the proposed approach performs significantly better than the baseline DNN system.
