Vijendra Raj Apsingekar scite author profile

In this paper, we investigate imposture using synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both speaker verification (SV) and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer that creates synthetic speech for a targeted speaker through adaptation of a background model. We use two SV systems: a standard GMM-UBM-based system and a newer SVM-based system. Our results show that when the systems are tested with human speech, there are zero false acceptances and zero false rejections. However, when the systems are tested with synthesized speech, all claims for the targeted speaker are accepted while all other claims are rejected. We propose a two-step process for detecting synthesized speech in order to prevent this imposture. Overall, while SV systems have impressive accuracy, high-quality synthetic speech will lead to an unacceptably high false acceptance rate, even with the proposed detector.
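The false acceptance and false rejection figures quoted in this abstract can be illustrated with a minimal sketch. The function name `error_rates`, the threshold, and every score below are hypothetical values chosen for illustration; none of them come from the paper.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_acceptance_rate, false_rejection_rate) at a threshold.

    A claim is accepted when its verification score is >= threshold.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up scores for illustration only; not data from the paper.
human_genuine = [2.1, 1.8, 2.5]      # true speaker, human speech
human_impostor = [-1.0, 0.2, -0.4]   # other speakers, human speech
synthetic_target = [1.9, 2.2, 2.4]   # synthetic speech aimed at the target

far_human, frr_human = error_rates(human_genuine, human_impostor, threshold=1.0)
far_synth, _ = error_rates(human_genuine, synthetic_target, threshold=1.0)
```

With these toy scores, the human-speech trials give zero for both rates, while every synthetic claim against the target clears the threshold, which mirrors the qualitative pattern the abstract describes.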

Speaker verification score normalization using speaker model clusters

Apsingekar

León

2011

Speech Communication

Reducing Speaker Model Search Space in Speaker Identification

León

Apsingekar

2007

For large-population speaker identification (SID) systems, likelihood computations between an unknown speaker's test feature set and the speaker models can be very time-consuming, which is detrimental to applications that require fast SID. In this paper, we propose a method whereby speaker models are clustered during the training stage. Then, during the testing stage, only those clusters that are likely to contain high-likelihood speaker models are searched. The proposed method reduces the speaker model space, which directly results in faster SID. Although there may be a slight loss in identification accuracy depending on the number of clusters searched, this loss can be controlled by trading off speed and accuracy.
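The cluster-then-search idea in this abstract might be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each speaker model is summarized by a single vector, uses plain k-means with squared Euclidean distance in place of likelihood scoring, and seeds the centroids from evenly spaced models so the run is reproducible. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (a stand-in for a likelihood score)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced points (assumption)."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vec(members)
    # Final assignment against the final centroids.
    assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
              for p in points]
    return centroids, assign


def identify(test_vec, models, centroids, assign, n_search):
    """Score only the models in the n_search nearest non-empty clusters.

    Returns (index of best model, number of models actually scored).
    """
    non_empty = [c for c in range(len(centroids)) if c in set(assign)]
    ranked = sorted(non_empty, key=lambda c: sq_dist(test_vec, centroids[c]))
    chosen = set(ranked[:n_search])
    candidates = [i for i, a in enumerate(assign) if a in chosen]
    best = min(candidates, key=lambda i: sq_dist(test_vec, models[i]))
    return best, len(candidates)


# Training stage: 12 toy "speaker models" in three well-separated groups.
models = [[cx + dx, cy + dy]
          for cx, cy in [(0, 0), (10, 10), (-10, 10)]
          for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
centroids, assign = kmeans(models, k=3)

# Testing stage: search one cluster instead of all 12 models.
best, n_scored = identify([10.9, 10.1], models, centroids, assign, n_search=1)
```

Raising `n_search` toward the number of clusters recovers the exhaustive search, which is the speed and accuracy trade-off the abstract mentions.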

Support vector machine based speaker identification systems using GMM parameters

Apsingekar

León

2009

Speaker Identification in Room Reverberation Using GMM-UBM

Akula

Apsingekar

León

2009

Speaker recognition systems tend to degrade if the training and testing conditions differ significantly. Such situations may arise from the use of different microphones, telephone and mobile handsets, or different acoustic conditions. Recently, the effect of room acoustics on speaker identification (SI) has been investigated, and it has been shown that accuracy drops when clean training signals are paired with reverberated testing signals. Various techniques, such as dereverberation, the use of multiple microphones, and compensation, have been proposed to reduce the mismatch and thereby increase SI accuracy. In this paper, we propose to use a Gaussian mixture model-universal background model (GMM-UBM), together with the previously proposed multiple speaker model approach, to compensate for the acoustical mismatch. With this approach, SI accuracy improves over conventional GMM-based SI systems in the presence of room reverberation.

Efficient speaker identification using distributional speaker model clustering

Apsingekar

León

2008

Efficient speaker verification system using speaker model clustering for T and Z normalizations

Ravulakollu

Apsingekar

León

2008

In speaker verification (SV) systems based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM), normalization is an important component of the decision stage. Many normalization methods, including the T- and Z-norms, have been proposed and investigated, and these have contributed to state-of-the-art SV systems with extremely low equal-error rates (EERs). In this paper, we consider applying both T- and Z-norms to a carefully selected subset of speakers, chosen with a data-driven approach, which can significantly reduce computation, resulting in faster SV decisions and lower EER. Selection of the subset is critical, however: it must be representative of the entire speaker model space; otherwise error rates will increase. In order to properly select the subset of speakers for the normalizations, we propose a novel method that first clusters the speaker models using the K-means algorithm and the Kullback-Leibler (KL) divergence and then selects a set of speakers within the cluster. We evaluate the approach using the TIMIT, NTIMIT, and NIST-2002 corpora and compare it against standard T- and Z-normalizations.
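As a rough illustration of the two normalizations named in this abstract, the sketch below applies the usual form, (score - mean) / standard deviation, to a raw verification score. Z-norm takes its statistics from the target model scored against impostor utterances, while T-norm takes them from the test utterance scored against a cohort of impostor models. The function names, the optional `subset` argument (standing in for a cohort subset such as one chosen by clustering), and the use of the population standard deviation are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def normalize(raw_score, cohort_scores):
    """Shift and scale a raw score by the cohort mean and standard deviation."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population std dev (an assumption)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def z_norm(raw_score, target_vs_impostor_utts):
    """Z-norm: statistics come from the target model scored against
    impostor utterances, so they can be computed offline."""
    return normalize(raw_score, target_vs_impostor_utts)


def t_norm(raw_score, test_vs_cohort_models, subset=None):
    """T-norm: statistics come from the test utterance scored against
    cohort models at test time. `subset` (hypothetical) restricts the
    cohort to selected model indices to reduce computation."""
    scores = test_vs_cohort_models
    if subset is not None:
        scores = [test_vs_cohort_models[i] for i in subset]
    return normalize(raw_score, scores)
```

For example, `t_norm(3.0, [0.0, 2.0, 100.0], subset=[0, 1])` uses only the first two cohort scores; in a real system only those two models would need to be scored, which is where the computational saving would come from.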
