Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encodings using a contrastive loss, in a way similar to wav2vec 2.0. However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations, in a way similar to a VQ-VAE model. The proposed self-supervised model is trained on 10k hours of unlabeled data and subsequently used as the speech encoder in an RNN-T ASR model, then fine-tuned with 1k hours of labeled data. This work is one of only a few studies of self-supervised learning on speech tasks with a large volume of real far-field labeled data. The Wav2vec-C encoded representations achieve, on average, twice the error reduction over the baseline and a higher codebook utilization in comparison to wav2vec 2.0.
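The abstract above describes a training objective that combines a wav2vec 2.0-style contrastive term with a VQ-VAE-style reconstruction (consistency) term. The following is a minimal NumPy sketch of such a combined objective; the function names, the cosine-similarity formulation, and the weighting hyperparameter `gamma` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def contrastive_loss(context, quantized, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the context vector toward the
    true quantized target, push it away from negative (distractor) codes."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.exp(cos(context, quantized) / temperature)
    neg = sum(np.exp(cos(context, n) / temperature) for n in negatives)
    return -np.log(pos / (pos + neg))

def consistency_loss(input_features, reconstructed):
    """VQ-VAE-style term: mean squared error between the encoder's input
    features and their reconstruction from the quantized representations."""
    return np.mean((input_features - reconstructed) ** 2)

def combined_loss(context, quantized, negatives, x, x_hat, gamma=1.0):
    """Hypothetical total objective: contrastive term plus a weighted
    consistency term that regularizes the quantizer (gamma is assumed)."""
    return (contrastive_loss(context, quantized, negatives)
            + gamma * consistency_loss(x, x_hat))
```

The intuition is that the consistency term keeps the quantized codes informative enough to reconstruct the original features, which is consistent with the abstract's report of higher codebook utilization than wav2vec 2.0.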
Point Process Models (PPMs) have been widely used for keyword spotting applications. Training these models typically requires a considerable number of keyword examples. In this work, we consider a scenario where very few keyword examples are available for training. The availability of only a limited number of training examples results in a PPM with poorly learnt parameters. We propose an unsupervised online learning algorithm that starts from a poor PPM, updates the PPM parameters using newly detected samples of the keyword in a corpus under consideration, and uses the updated model for further keyword detection. We test our algorithm on eight keywords taken from the TIMIT database, whose training set contains, on average, 469 samples of each keyword. With an initial set of only five samples of a keyword (corresponding to ∼1% of the total number of samples), followed by the proposed online parameter updating throughout the entire TIMIT train set, the performance on the TIMIT test set using the final model is found to be comparable to that of a PPM trained with all the samples of the respective keyword available from the entire TIMIT train set.
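The detect-then-update loop described above (detect the keyword with the current model, then fold high-confidence detections back into the parameters) can be sketched generically. This is a minimal self-training skeleton, not the paper's PPM update rule: the `detect`, `update`, and `threshold` interfaces are assumptions made for illustration.

```python
def online_update(model, audio_stream, detect, update, threshold=0.8):
    """Generic unsupervised online learning loop (a sketch under assumed
    interfaces, not the exact algorithm from the abstract).

    model:     current keyword-model parameters
    detect:    (model, segment) -> confidence score in [0, 1]
    update:    (model, segment) -> new model parameters
    threshold: minimum confidence to accept a detection as a new example
    """
    for segment in audio_stream:
        score = detect(model, segment)
        if score >= threshold:
            # Treat a high-confidence detection as a new, unlabeled
            # training example and refine the model immediately, so
            # later segments are scored by the improved model.
            model = update(model, segment)
    return model
```

A design point worth noting: because each accepted detection changes the model before the next segment is scored, a model initialized from only a few examples can gradually approach one trained on the full sample set, which mirrors the abstract's TIMIT result.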