Interspeech 2007
DOI: 10.21437/interspeech.2007-655
Optimizing sentence segmentation for spoken language translation

Abstract: The conventional approach in text-based machine translation (MT) is to translate complete sentences, which are conveniently indicated by sentence boundary markers. However, since such boundary markers are not available for speech, new methods are required that define an optimal unit for translation. Our experimental results show that with a segment length optimized for a particular MT system, intrasentence segmentation can improve translation performance (measured in BLEU) by up to 11% for Arabic Broadcast Con…
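The abstract describes intrasentence segmentation whose segment length is tuned to a particular MT system. As a rough illustration of that idea only (not the paper's actual algorithm), the sketch below splits an ASR word stream into units no longer than a tuned maximum, preferring split points at long pauses; the function name, the `max_len` value, and the use of pause durations as split candidates are assumptions made for this example.

```python
# Hypothetical sketch: length-constrained segmentation of ASR output.
# The paper tunes the segment length per MT system; `max_len` here is a
# stand-in for that tuned value.

def segment_hypothesis(words, pauses, max_len=15):
    """Split a list of ASR tokens into translation units of at most
    `max_len` words, preferring split points at the longest pauses.

    words  -- list of recognized tokens
    pauses -- pause duration (seconds) after each token, same length as words
    """
    segments = []
    start = 0
    while start < len(words):
        end = min(start + max_len, len(words))
        if end < len(words):
            # choose the split point with the longest pause inside the window
            end = max(range(start, end), key=lambda i: pauses[i]) + 1
        segments.append(words[start:end])
        start = end
    return segments


if __name__ == "__main__":
    words = "so the proposal is that we translate each chunk separately and then rejoin them".split()
    pauses = [0.05] * len(words)
    pauses[6] = 0.42  # illustrative long pause after "translate"
    print(segment_hypothesis(words, pauses, max_len=8))
```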

Cited by 10 publications (4 citation statements)
References: 8 publications
“…While following a cascaded approach, one cannot directly chain modules such as ASR, MT, and TTS, as it is a well-known fact that spoken language has various idiosyncrasies. These include a lack of well-formed sentences and disfluencies (Rao et al., 2007). Traditional machine translation systems are trained on well-formed, written, and grammatical pairs of sentences.…”
Section: Approaches (mentioning)
confidence: 99%
“…Segmentation in SLT has been studied quite extensively in high-resource settings. Early work used kernel-based SVM models to predict sentence boundaries using language model probabilities along with prosodic features such as pause duration (Matusov et al., 2007; Rao et al., 2007) and part-of-speech features derived from a fixed window size (Rangarajan Sridhar et al., 2013). Other work has modeled the problem using hidden Markov models (Shriberg et al., 2000; Gotoh and Renals, 2000; Christensen et al., 2001; Kim and Woodland, 2001) and conditional random fields (Liu et al., 2005; Lu and Ng, 2010).…”
Section: Related Work (mentioning)
confidence: 99%
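The related-work statement above mentions kernel SVM boundary classifiers driven by language-model probabilities and prosodic cues such as pause duration. The following minimal sketch, assuming scikit-learn and an invented three-dimensional feature vector, only shows the general shape of such a classifier; the features, values, and labels are illustrative and do not reproduce any of the cited systems.

```python
# Illustrative sketch (not the cited systems' code): a kernel SVM that labels
# each inter-word position as boundary / non-boundary from a small feature
# vector. Feature names and values are hypothetical stand-ins for the
# pause-duration, language-model, and windowed POS features described above.

import numpy as np
from sklearn.svm import SVC

# Each row: [pause_after_word_sec, lm_logprob_of_sentence_end, next_word_is_filler]
X_train = np.array([
    [0.62, -1.2, 0],   # long pause, LM favors an end here -> boundary
    [0.05, -7.9, 0],   # fluent continuation -> no boundary
    [0.48, -2.0, 1],
    [0.02, -8.5, 0],
    [0.71, -0.9, 0],
    [0.04, -6.3, 1],
])
y_train = np.array([1, 0, 1, 0, 1, 0])  # 1 = sentence boundary

clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train)

# Score a new inter-word position
x_new = np.array([[0.55, -1.5, 0]])
print("P(boundary) =", clf.predict_proba(x_new)[0, 1])
```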
“…While prior work has trained intermediate components to segment ASR output into sentence-like units (Matusov et al., 2007; Rao et al., 2007), these have primarily focused on highly resourced language pairs such as Arabic and Chinese. When the source language is low-resource, suitable training data may be very limited for ASR and MT, and even nonexistent for segmentation.…”
Section: Introduction (mentioning)
confidence: 99%