2021
DOI: 10.48550/arxiv.2110.04908
Preprint

Targeted Subset Selection for Limited-data ASR Accent Adaptation

Abstract: We study the task of personalizing ASR models to a target non-native speaker/accent while being constrained by a transcription budget on the duration of utterances selected from a large unlabelled corpus. We propose a subset selection approach using the recently proposed submodular mutual information functions, in which we identify a diverse set of utterances that match the target speaker/accent. This is specified through a few target utterances and achieved by modelling the relationship between the target sub…
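To make the selection step concrete, below is a minimal sketch (not the authors' implementation) of budget-constrained greedy maximization of a facility-location style submodular mutual information (FLMI) objective between a candidate subset A and a few target-accent utterances T. The similarity matrix, the duration-based budget, and the cost-scaled greedy rule are assumptions made for illustration; the paper's exact objective and optimizer may differ.

import numpy as np

def flmi(sim_pool_target, selected):
    # FLMI-style score I(A; T) = sum_t max_{a in A} s(a, t) + sum_{a in A} max_t s(a, t).
    # sim_pool_target: (n_pool, n_target) similarity matrix, e.g. cosine similarity
    # between accent/speaker embeddings of pool and target utterances (assumed given).
    if not selected:
        return 0.0
    sub = sim_pool_target[selected]                     # (|A|, n_target)
    return float(sub.max(axis=0).sum() + sub.max(axis=1).sum())

def greedy_select(sim_pool_target, durations, budget_seconds):
    # Cost-aware greedy: repeatedly add the utterance with the best marginal
    # FLMI gain per second of audio until the transcription budget is exhausted.
    n_pool = sim_pool_target.shape[0]
    selected, spent, current = [], 0.0, 0.0
    remaining = set(range(n_pool))
    while remaining:
        best, best_ratio, best_gain = None, float("-inf"), 0.0
        for j in remaining:
            if spent + durations[j] > budget_seconds:
                continue                                # would exceed the budget
            gain = flmi(sim_pool_target, selected + [j]) - current
            ratio = gain / max(durations[j], 1e-9)
            if ratio > best_ratio:
                best, best_ratio, best_gain = j, ratio, gain
        if best is None:                                # nothing else fits
            break
        selected.append(best)
        remaining.remove(best)
        spent += durations[best]
        current += best_gain
    return selected

A call such as greedy_select(S, durations, budget_seconds=3600) would return indices of pool utterances to send for transcription; recomputing the FLMI score from scratch at each step keeps the sketch short, whereas an efficient implementation would cache marginal gains.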

Cited by 2 publications (2 citation statements)
References 15 publications (19 reference statements)
“…Devising strategies for data pruning and constructing optimal subsets is a recent topic of interest in the area of optimization and active learning (Dong et al., 2019; Kaushal et al., 2019; Saadatfar et al., 2020; Durga et al., 2021; Kothawade et al., 2021; Killamsetty et al., 2021; Paul et al., 2021; Kothyari et al., 2021; Ahia et al., 2021). A few studies have examined the training landscape for clues about optimal subset creation (Toneva et al., 2018; Agarwal et al., 2020; Baldock et al., 2021; Paul et al., 2021; Schirrmeister et al., 2022).…”
Section: Related Work
confidence: 99%
“…[20] use the submodular information measures for active learning in the image classification setting to address realistic scenarios like imbalance, redundancy, and out-of-distribution data. Finally, [21] use the submodular information measures for personalized speech recognition. To our knowledge, this is the first work which proposes an active learning framework for object detection capable of handling rare slices of data.…”
Section: 6
confidence: 99%