This paper extends recent research on training data selection for developing speech transcription and keyword spotting systems. Selection techniques were explored in the context of the IARPA-Babel Active Learning (AL) task for 6 languages. Different selection criteria were considered with the goal of improving over a system built using a pre-defined 3-hour training data set. Four variants of the entropy-based criterion were explored, based on words, triphones, phones, and the HMM-states previously introduced in [4]. The influence of the number of HMM-states was assessed, as was the use of automatic versus manual reference transcripts. The combination of selection criteria was investigated, and a novel multi-stage selection method is proposed. This method was also assessed using larger data sets than were permitted in the Babel AL task. Results are reported for all 6 languages, and the multi-stage selection was also applied to the surprise language (Swahili) in the NIST OpenKWS 2015 evaluation.
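The entropy-based criterion mentioned above ranks candidate utterances by how much they diversify the unit distribution (words, triphones, phones, or HMM-states) of the already-selected pool. As a rough illustration of this general idea, not the authors' exact algorithm, the sketch below greedily picks utterances that maximize the Shannon entropy of the pooled unit counts; the function names and the `(utt_id, duration, units)` tuple format are hypothetical:

```python
from collections import Counter
from math import log

def entropy(counts):
    """Shannon entropy (in nats) of a distribution given as unit counts."""
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values() if c)

def select_utterances(utts, budget_hours):
    """Greedy entropy-based data selection (illustrative sketch).

    utts: list of (utt_id, duration_hours, unit_sequence) tuples,
          where unit_sequence is e.g. a list of phone or state labels.
    Returns the ids of utterances chosen until the duration budget is spent.
    """
    selected, pool, used = [], Counter(), 0.0
    remaining = list(utts)
    while remaining and used < budget_hours:
        # Pick the utterance whose units yield the highest pooled entropy.
        best = max(remaining, key=lambda u: entropy(pool + Counter(u[2])))
        remaining.remove(best)
        selected.append(best[0])
        pool += Counter(best[2])
        used += best[1]
    return selected
```

In practice the paper's criterion would be computed over decoded (automatic) or manual transcripts, and a real system would score candidates in batches rather than one utterance at a time.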
In this paper we investigate various techniques for building effective speech-to-text (STT) and keyword search (KWS) systems for low-resource conversational speech. Subword decoding and graphemic mappings were assessed for detecting out-of-vocabulary keywords. To deal with the limited amount of transcribed data, semi-supervised training and data selection methods were investigated. Robust acoustic features produced via data augmentation were evaluated for acoustic modeling. For language modeling, automatically retrieved conversational-style Web data was used, as were neural network based models. We report STT improvements with all of the techniques, but interestingly only some also improve KWS performance. Results are reported for Swahili in the context of the 2015 OpenKWS evaluation.
The realization of language through vocal sounds involves a complex interplay between the lungs, the vocal cords, and a series of resonant chambers (e.g. the mouth and nasal cavities). Because these body parts are connected to the outside world, they are common entry points for viruses and bacteria into the human organism. Affected people may suffer from an upper respiratory tract infection (URTIC), and consequently their voice often sounds breathy, raspy or sniffly. In this paper, we investigate the audible effects of a cold at the phonetic level. Results on a German corpus show that the articulation of consonants is more impaired than that of vowels. Surprisingly, nasal sounds do not follow this trend in our experiments. Finally, we try to predict a speaker's health condition by fusing decisions derived from single phonemes. The presented work is part of the INTER-
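The speaker-level prediction described above combines per-phoneme evidence into one decision. As a minimal sketch of such decision fusion (assumed here to be a weighted score average; the abstract does not specify the fusion rule, and all names and thresholds below are hypothetical):

```python
def fuse_phoneme_decisions(phoneme_scores, weights=None, threshold=0.5):
    """Fuse per-phoneme cold scores into one speaker-level decision.

    phoneme_scores: mapping phoneme -> score in [0, 1], higher meaning
                    the phoneme's realization sounds more 'cold-affected'.
    weights: optional mapping phoneme -> reliability weight, e.g. giving
             consonants more weight than vowels.
    Returns (fused_score, label).
    """
    if weights is None:
        weights = {p: 1.0 for p in phoneme_scores}
    total_w = sum(weights[p] for p in phoneme_scores)
    fused = sum(weights[p] * s for p, s in phoneme_scores.items()) / total_w
    return fused, "cold" if fused >= threshold else "healthy"
```

A weighting scheme like this would let the consonant/vowel asymmetry reported in the abstract inform the fusion, by trusting the more impaired phoneme classes more.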