2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404788
Improving data selection for low-resource STT and KWS

Abstract: This paper extends recent research on training data selection for speech transcription and keyword spotting system development. Selection techniques were explored in the context of the IARPA-Babel Active Learning (AL) task for 6 languages. Different selection criteria were considered with the goal of improving over a system built using a pre-defined 3-hour training data set. Four variants of the entropy-based criterion were explored: words, triphones, phones as well as the use of HMM-states previously introduc…
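The entropy-based criteria named in the abstract rank candidate utterances by how diverse their hypothesized units (words, phones, triphones, or HMM-states) are. This is a minimal sketch of that idea, not the paper's implementation; the `units` field and `budget` parameter are hypothetical illustrations.

```python
import math
from collections import Counter

def unit_entropy(units):
    """Shannon entropy (bits) of the unit distribution within one utterance."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_by_entropy(utterances, budget):
    """Pick the `budget` utterances whose unit sequences have highest entropy."""
    ranked = sorted(utterances, key=lambda u: unit_entropy(u["units"]), reverse=True)
    return ranked[:budget]

# Toy pool: "units" could be word, phone, or triphone hypotheses.
pool = [
    {"id": "utt1", "units": ["a", "a", "a", "a"]},  # entropy 0.0 bits
    {"id": "utt2", "units": ["a", "b", "c", "d"]},  # entropy 2.0 bits
    {"id": "utt3", "units": ["a", "b", "a", "b"]},  # entropy 1.0 bits
]
print(select_by_entropy(pool, budget=1)[0]["id"])  # utt2: most diverse units
```

In practice the units would come from a first-pass decode of the untranscribed pool rather than being given directly.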

Cited by 9 publications (14 citation statements). References 22 publications.
“…This method was used in [6] together with an N-best entropy based data selection. Finally, the study in [7] found that HMM-state entropy and letter density are good indicators of the utterance informativeness. Encouraging results were reported from the early attempts [2,3] with a 60% reduction of the transcription cost over Random Selection (RS).…”
Section: Introduction (mentioning; confidence: 95%)
“…In this paper, we focus on conventional confidence-based AL as suggested in [2], although other studies [3,6,7] have shown some improvement over it. It is however worth highlighting that the details of the baseline confidence-based approach were not always clearly described, and that subsequent results were not in line with those reported in [2].…”
Section: Introduction (mentioning; confidence: 99%)
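The confidence-based AL baseline referenced above selects for manual transcription the utterances the recognizer is least sure about. A minimal sketch of that selection step, assuming per-utterance confidence scores are already available (the `confidence` field and `budget` parameter are hypothetical):

```python
def select_low_confidence(utterances, budget):
    """Confidence-based active learning: request manual transcripts for the
    `budget` utterances with the lowest recognizer confidence."""
    ranked = sorted(utterances, key=lambda u: u["confidence"])
    return ranked[:budget]

pool = [
    {"id": "utt1", "confidence": 0.93},
    {"id": "utt2", "confidence": 0.41},
    {"id": "utt3", "confidence": 0.78},
]
print([u["id"] for u in select_low_confidence(pool, budget=2)])  # ['utt2', 'utt3']
```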
“…In many of the previous works [21,22], the use of untranscribed data from the same language to improve the performance of the acoustic model in a low-resource language was studied. However, the use of transcribed data from closely related languages was not studied in detail.…”
Section: Borrowing Data or Pooling Data (mentioning; confidence: 99%)
“…In this case, the automatic transcripts are directly used for acoustic model training. 2) Data selection is also used to get relevant training data [4,5]. In contrast to SST, here the goal is to select data for which accurate manual transcripts will be created.…”
Section: Introduction (mentioning; confidence: 99%)