Exploring the use of acoustic embeddings in neural machine translation

Deena, Salil; Ng, Raymond W. M.; Madhyastha, Pranava; Specia, Lucia; Hain, Thomas

doi:10.1109/asru.2017.8268971

Cited by 8 publications (4 citation statements)

References 18 publications (20 reference statements)


“…Often, the prior or future context from video, audio, or other subtitle instances is necessary to fill these contextual gaps. Sentence-level APE cannot address these issues robustly, which calls for further research on multimodal (Deena et al., 2017; Caglayan et al., 2019) and document-level (Hardmeier et al., 2015; Voita et al., 2019) translation and post-editing, especially for subtitles.…”

Section: Qualitative Analysis (mentioning)

confidence: 99%

Can Automatic Post-Editing Improve NMT?

Chollampatt, Susanto, Tan et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has not been as successful with neural machine translation (NMT) systems. This has raised questions about the relevance of the APE task in the current scenario. However, the training of APE models has relied heavily on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have underperformed at improving NMT translations due to a lack of adequate supervision. To test this hypothesis, we compile a larger corpus of human post-edits of English-to-German NMT output. We empirically show that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data size, using artificial training data, and domain specificity on the APE task. We release this new corpus under the CC BY-NC-SA 4.0 license at https://github.com/shamilcm/pedra.


“…We build our systems on three speech translation corpora: Fisher-CallHome Spanish, Librispeech, and the Speech-Translation TED (ST-TED) corpus. To the best of our knowledge, these are the only publicly available corpora with a reasonable amount of real speech data⁶. The data statistics are summarized in Table 1.…”

Section: Data (mentioning)

confidence: 99%

“…Recently, end-to-end speech translation (E2E-ST) with a sequence-to-sequence model has attracted attention for its extremely simplified architecture, which dispenses with complicated pipeline systems [3,4,5]. By directly translating speech signals in a source language into text in a target language, the model avoids error propagation from the ASR module and can also leverage acoustic cues in the source language, which have been shown to be useful for translation [6]. Moreover, it is more memory- and computationally efficient, since the complicated decoding of the ASR module and the latency between the ASR and MT modules are bypassed.…”

Section: Introduction (mentioning)

confidence: 99%

Multilingual End-to-End Speech Translation

Inaguma, Duh, Kawahara et al. 2019

Preprint


In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated into the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios, one-to-many and many-to-many translation, using publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available to encourage further research on this emerging multilingual ST topic¹.


“…One key attribute of embedding methods is that word embedding models take into account the context information of words, thereby allowing a more compact and manageable representation for words [3,4]. The embeddings are widely applied in many downstream NLP tasks such as neural machine translation, dialogue systems, or text summarisation [5,6,7], as well as in language modelling for speech recognition [8].…”

Section: Introduction (mentioning)

confidence: 99%

Contextual Joint Factor Acoustic Embeddings

Shi, Hain 2021

2021 IEEE Spoken Language Technology Workshop (SLT)

Self Cite


Embedding acoustic information into fixed-length representations is of interest for a whole range of applications in speech and audio technology. We propose two novel unsupervised approaches to generate acoustic embeddings by modelling acoustic context. The first approach is a contextual joint factor synthesis encoder, where the encoder in an encoder/decoder framework is trained to extract joint factors from surrounding audio frames to best generate the target output. The second approach is a contextual joint factor analysis encoder, where the encoder is trained to analyse joint factors from the source signal that correlate best with the neighbouring audio. To evaluate the effectiveness of our approaches compared to prior work, we chose two tasks (phone classification and speaker recognition) and tested on different TIMIT data sets. Experimental results show that one of our proposed approaches outperforms phone classification baselines, yielding a classification accuracy of 74.1%. When additional out-of-domain data is used for training, a further 2-3% improvement can be obtained for both the phone classification and speaker recognition tasks.
