2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472853

Groupwise learning for ASR k-best list reranking in spoken language translation

Abstract: This paper studies the enhancement of spoken language translation (SLT) with groupwise learning. Groupwise features were constructed by grouping pairs, triplets or M-plets of the ASR k-best outputs. Regression and classification models were learnt, and a straightforward score combination strategy was used to capture the ranking relationship. Groupwise learning with pairwise regression models gives the biggest gain over simple support vector regression models. Groupwise learning is robust to sentences with diffe…
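As a rough illustration of the approach the abstract describes, the sketch below builds pairwise groupwise features from an ASR k-best list, trains a regression model on score differences, and combines the pairwise predictions into a single reranking score per hypothesis. The scikit-learn SVR model, the feature concatenation, and the additive score-combination rule are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of pairwise groupwise reranking for ASR k-best lists.
# Feature construction, model choice and score combination are assumptions
# made for illustration, not the authors' exact implementation.
import itertools
import numpy as np
from sklearn.svm import SVR

def pairwise_features(feats_i, feats_j):
    """Groupwise (pairwise) feature: concatenate the two hypotheses' features."""
    return np.concatenate([feats_i, feats_j])

def train_pairwise_regressor(kbest_lists, quality_scores):
    """kbest_lists: per-utterance lists of hypothesis feature vectors.
    quality_scores: matching per-hypothesis targets (e.g. downstream BLEU)."""
    X, y = [], []
    for feats, scores in zip(kbest_lists, quality_scores):
        for i, j in itertools.permutations(range(len(feats)), 2):
            X.append(pairwise_features(feats[i], feats[j]))
            y.append(scores[i] - scores[j])  # regress the score difference
    model = SVR(kernel="linear")
    model.fit(np.array(X), np.array(y))
    return model

def rerank(model, feats):
    """Combine pairwise predictions into one score per hypothesis and sort."""
    k = len(feats)
    combined = np.zeros(k)
    for i, j in itertools.permutations(range(k), 2):
        combined[i] += model.predict(pairwise_features(feats[i], feats[j])[None, :])[0]
    return np.argsort(-combined)  # hypothesis indices, best first
```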

Cited by 7 publications (3 citation statements). References 24 publications.
“…Making complex ASR systems available was originally the intent of webASR, and, as such, ASR remains the main task in 3 newly developed systems covering 3 domains. All of them present a state-of-the-art speech transcription system, based on the latest research carried out at the University of Sheffield in topics such as Deep Neural Network (DNN) acoustic modelling [11,12,13,14], distant microphone recognition [15], adaptation to noisy environments [16,17,18], domain adaptation [19,20], Recurrent Neural Network (RNN) language modelling [21], N-best re-ranking [22,23] and sentence-end detection [24,25].…”
Section: Transcription Systems (mentioning)
confidence: 99%
“…In the cascade approach, an ASR system transcribes the input speech signal, and this is fed to a downstream MT system that carries out the translation. The provided input to the MT step can be the 1-best hypothesis, but also n-best lists (Ng et al., 2016) or even lattices (Matusov and Ney, 2011; Sperber et al., 2019). Additional techniques can also be used to improve the performance of the pipeline by better adapting the MT system to the expected input, such as training with transcribed text (Peitz et al., 2012) or chunking (Sperber et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%
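For concreteness, here is a minimal sketch of the cascade described in that statement, in which the ASR n-best list, rather than only the 1-best hypothesis, is passed through a reranking step before the selected transcript is handed to MT. The asr_nbest, rerank and translate callables are placeholders for whatever ASR, reranker and MT components a real system would use.

```python
# Minimal sketch of a cascade SLT pipeline: ASR n-best -> reranking -> MT.
# All three components are injected as placeholder callables (assumptions).
from typing import Callable, List, Tuple

def cascade_translate(
    audio: bytes,
    asr_nbest: Callable[[bytes, int], List[Tuple[str, float]]],
    rerank: Callable[[List[Tuple[str, float]]], List[Tuple[str, float]]],
    translate: Callable[[str], str],
    n: int = 10,
) -> str:
    hyps = asr_nbest(audio, n)         # [(transcript, asr_score), ...]
    hyps = rerank(hyps)                # e.g. groupwise/pairwise reranking
    best_transcript, _ = hyps[0]
    return translate(best_transcript)  # downstream MT on the selected hypothesis
```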
“…The provided input to the MT step can be the 1-best hypothesis, but also n-best lists (Ng et al 2016) or even lattices (Matusov and Ney 2011; Sperber, Neubig, et al 2019). Additional techniques can also be used to improve the performance of the pipeline by better adapting the MT system to the expected input, such as training with transcribed text (Peitz et al 2012) or chunking (Sperber, Jan Niehues, and Waibel 2017).…”
Section: Discussion (mentioning)
confidence: 99%