2008
DOI: 10.1109/icassp.2008.4518805

Extracting clues from human interpreter speech for spoken language translation

Abstract: In previous work, we reported dramatic improvements in automatic speech recognition (ASR) and spoken language translation (SLT) gained by applying information extracted from spoken human interpretations. These interpretations were artificially created by collecting read sentences from a clean parallel text corpus. Real human interpretations are significantly different: they suffer from frequent synopses, omissions, and self-corrections. Expressing these differences in BLEU score by evaluating human interpretati…
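
The truncated abstract refers to quantifying, in BLEU, how far real human interpretations drift from a clean parallel text. As a rough, self-contained illustration of that style of evaluation (not the authors' actual pipeline; the sentences and the choice of the sacrebleu library are assumptions), corpus-level BLEU can be computed like this:

```python
# Rough sketch: corpus-level BLEU of interpreter transcripts against clean
# reference translations. Sentences are invented placeholders, not data
# from the paper.
import sacrebleu

# Hypotheses: transcribed human interpreter output, which tends to compress,
# omit, and self-correct relative to a faithful translation.
hypotheses = [
    "the committee uh approved the budget",
    "we will discuss this later",
]

# One reference stream, aligned index-by-index with the hypotheses.
references = [[
    "the committee has approved the proposed budget",
    "we will discuss this point at a later time",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # synopses and omissions drive this down
```

Frequent synopses and omissions in real interpreter speech surface in such an evaluation as low n-gram precision against the clean references.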

Cited by 9 publications (4 citation statements)
References 7 publications

“…One can combine speech with a text stream, usually for an application such as machine-aided human translation [1,2], in which a human translator dictates the translation, rather than typing it. Also, a few works have looked into combining several speech streams [3,4], to improve ASR and MT systems in a simultaneous or consecutive interpretation scenario.…”
Section: Relation to Previous Work
confidence: 99%
“…We used ASR hypotheses as well as reference transcripts for the experiments, whereas the Spanish hypotheses were generated with a system trained within TC-STAR on Parliament plenary sessions (Stüker et al. 2007; Paulik and Waibel 2008). The case-insensitive WER was 8.4%.…”
Section: Translatable Speech Segments
confidence: 99%
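
For context on the 8.4% figure quoted above: a case-insensitive word error rate simply lowercases both reference and hypothesis before alignment, so casing differences are free. A minimal sketch with the jiwer library (the strings are illustrative, not TC-STAR data):

```python
# Minimal sketch of a case-insensitive WER computation. Strings are
# illustrative placeholders, not TC-STAR data.
import jiwer

reference = "El Parlamento aprueba el presupuesto"
hypothesis = "el parlamento aprueba un presupuesto"

# Lowercasing both sides makes the comparison case-insensitive; only the
# word substitution ("el" -> "un") counts as an error here.
wer = jiwer.wer(reference.lower(), hypothesis.lower())
print(f"case-insensitive WER: {wer:.1%}")  # 1 error / 5 words = 20.0%
```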
“…Further, the hypotheses of both ASR systems can be tied together in a parallel training corpus suitable for TM training, as shown in [1]. Similar to previous works [2,3,4], we exploit the parallel information given in the respective other language audio stream to bias the ASR systems for an improved transcription performance. In the proposed context, such an improved ASR performance directly affects the quality of the extracted training data.…”
Section: System Architecture
confidence: 99%
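
The statement above describes biasing each ASR system with parallel information from the other language's audio stream. One hedged way to sketch the idea is N-best rescoring: interpolate the ASR score with a lexical-overlap score against a machine translation of the other stream. The function names, weights, and overlap measure below are illustrative assumptions, not the actual method of the cited systems:

```python
# Hedged sketch of the biasing idea described above: re-rank ASR N-best
# hypotheses toward agreement with a machine translation of the parallel
# (other-language) audio stream. Weights, scores, and names are illustrative
# assumptions.

def overlap_f1(hypothesis: str, mt_of_other_stream: str) -> float:
    """Bag-of-words F1 between an ASR hypothesis and the MT output of the
    other language's stream; a crude stand-in for a real biasing model."""
    hyp = set(hypothesis.lower().split())
    ref = set(mt_of_other_stream.lower().split())
    if not hyp or not ref:
        return 0.0
    overlap = len(hyp & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def rescore_nbest(nbest, mt_of_other_stream, weight=0.5):
    """Pick the (asr_score, hypothesis) pair maximizing an interpolation of
    the ASR score and the parallel-stream overlap score."""
    return max(
        nbest,
        key=lambda pair: (1 - weight) * pair[0]
                         + weight * overlap_f1(pair[1], mt_of_other_stream),
    )

# Toy usage: the acoustically weaker hypothesis wins because it agrees with
# the translated parallel stream.
nbest = [
    (0.62, "the committee improved the budget"),
    (0.58, "the committee approved the budget"),
]
print(rescore_nbest(nbest, "the committee has approved the new budget"))
# -> (0.58, 'the committee approved the budget')
```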