Vatsal Aggarwal scite author profile

Vatsal Aggarwal

5 Publications

94 Citation Statements Received

73 Citation Statements Given

Affiliations

Amazon (United Kingdom), Amazon (United States), Manipal Academy of Higher Education

Publications

Towards Achieving Robust Universal Neural Vocoding

Lorenzo-Trueba, Drugman, Latorre et al. 2019

This paper explores the potential universality of neural vocoders. We train a WaveRNN-based vocoder on 74 speakers coming from 17 languages. This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario when the recording conditions are studio-quality. When the recordings show significant changes in quality, or when moving towards non-speech vocalizations or singing, the vocoder still significantly outperforms speaker-dependent vocoders, but operates at a lower average relative MUSHRA of 75%. These results are shown to be consistent across languages, regardless of whether they were seen during training (e.g. English or Japanese) or unseen (e.g. Wolof, Swahili, Amharic).

Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech

Aggarwal, Cotescu, Prateek et al. 2020

We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second. Specifically, we enhance the disentanglement capabilities of a state-of-the-art sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a Householder Flow. The proposed system provides a 22% KL divergence reduction while jointly improving perceptual metrics over the state of the art. At synthesis time we use one example of expressive style as a reference input to the encoder for generating any text in the desired style. Perceptual MUSHRA evaluations show that we can create a voice with a 9% relative naturalness improvement over standard Neural Text-to-Speech, while also improving the perceived emotional intensity (59 compared to the 55 of neutral speech).

Towards achieving robust universal neural vocoding

Lorenzo-Trueba, Drugman, Latorre et al. 2018

Preprint

BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization

Moss, Aggarwal, Prateek et al. 2020

We present BOFFIN TTS (Bayesian Optimization For FInetuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration of the hyperparameters that control fine-tuning. By using Bayesian optimization to efficiently optimize these hyperparameter values for a target speaker, we are able to perform adaptation with an average 30% improvement in speaker similarity over standard techniques. Results indicate, across multiple corpora, that BOFFIN TTS can learn to synthesize new speakers using less than ten minutes of audio, achieving the same naturalness as produced for the speakers used to train the base model.

Melanoma Detection in Dermoscopic Images using Color Features

Pathan, Aggarwal, Prabhu et al. 2019

Biomed. Pharmacol. J.

Color is considered a major characteristic feature for distinguishing benign and malignant melanocytic lesions. Most malignant melanomas are characterized by the presence of six suspicious colors inspired by the ABCD dermoscopic rule. The presence of these suspicious colors histopathologically indicates the presence of melanin in the deeper layers of the epidermis and dermis. The objective of the proposed work is to evaluate the role of color features in malignancy detection: a set of fifteen color features is extracted from the region of interest. Further, a set of ensemble classifiers with dynamic selection techniques is used to classify the extracted features, yielding an average accuracy of 87.5% for classifying benign and malignant lesions.
