Recovery of acronyms, out-of-lattice words and pronunciations from parallel multilingual speech

Miranda, J. M.; Neto, João Paulo; Black, Alan W.

doi:10.1109/slt.2012.6424248

Cited by 3 publications

(2 citation statements)

References 9 publications

“…Additionally, word lattice based approaches have also been pursued [2,3]. The transcription of multiple streams of interpreted speech has also been addressed with the aid of machine translation [16,17]. However, in all of these works the translation models are trained on substantial external written corpora such as European parliament proceedings or the Canadian Hansards.…”

Section: Related Work (mentioning)

confidence: 99%

Learning a Translation Model from Word Lattices

et al. 2016

Translation models have been used to improve automatic speech recognition when speech input is paired with a written translation, primarily for the task of computer-aided translation. Existing approaches require large amounts of parallel text for training the translation models, but for many language pairs this data is not available. We propose a model for learning lexical translation parameters directly from the word lattices for which a transcription is sought. The model is expressed through composition of each lattice with a weighted finite-state transducer representing the translation model, where inference is performed by sampling paths through the composed finite-state transducer. We show consistent word error rate reductions in two datasets, using between just 20 minutes and 4 hours of speech input, additionally outperforming a translation model trained on the 1-best path.

“…This process is described in Section 3. In [6], we also recover words that are not in the lattices produced by the recognizer, acronyms and pronunciations, using the redundancy provided by multiple streams. The current paper extends these works by integrating a new type of stream, which consists of slides, rather than speech, in the existing framework.…”

Section: Relation To Previous Work (mentioning)

confidence: 99%

Improving ASR by integrating lecture audio and slides

Miranda, Neto, Black 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite

We propose a method to combine audio of a lecture with its supporting slides in order to improve automatic speech recognition performance. We view both the lecture speech and the slides as parallel streams which contain redundant information. We integrate both streams in order to bias the recognizer's language model towards the words in the slides, by first aligning the speech with the slide words, thus correcting errors on the ASR transcripts. We obtain a 5.9% relative WER improvement on a lecture test set, when compared to a speech recognition only system.