Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers) 2019
DOI: 10.18653/v1/w19-5209

Customizing Neural Machine Translation for Subtitling

Abstract: In this work, we customized a neural machine translation system for translation of subtitles in the domain of entertainment. The neural translation model was adapted to the subtitling content and style and extended by a simple, yet effective technique for utilizing intersentence context for short sentences such as dialog turns. The main contribution of the paper is a novel subtitle segmentation algorithm that predicts the end of a subtitle line given the previous word-level context using a recurrent neural net…

Cited by 29 publications (28 citation statements)
References 19 publications
“…Recently, after the advent of the neural machine translation paradigm, Matusov et al (2019) presented an NMT system customised to subtitling. The main contribution of the paper is a segmenter module trained on human segmentation decisions, which splits the resulting translation into subtitles.…”
Section: Machine Translation For Subtitling
Citation type: mentioning
confidence: 99%
“…These source language subtitles (also called captions) are already compressed and segmented to respect the subtitling constraints of length, reading speed and proper segmentation (Cintas and Remael, 2007; Karakanta et al, 2019). In this way, the work of an NMT system is already simplified, since it only needs to translate matching the length of the source text (Matusov et al, 2019; Lakew et al, 2019). However, the essence of a good subtitle goes beyond matching a predetermined length (as, for instance, 42 characters per line in the case of TED talks).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This has been shown in Karakanta, Negri, and Turchi (2020a), where training on the MuST-Cinema data, annotated with the special symbols corresponding to subtitle and line breaks, achieved high conformity to the subtitling constraints, without any modification in the NMT architecture. Similarly, the segmentation module presented in (Matusov, Wilken, and Georgakopoulou 2019), which is trained on the monolingual OpenSubtitles data (which contain breaks as metadata), led to significant reductions in post-editing effort.…”
Section: Figure
Citation type: mentioning
confidence: 99%
“…One way to automatise this labour-intensive process, especially in settings where several target languages are involved, is creating a subtitle template of the source language (Georgakopoulou, 2019). A subtitle template is an enriched transcript of the source language speech where the text is already compressed, timed and segmented into proper subtitles.…”
Section: Subtitling
Citation type: mentioning
confidence: 99%
“…content, still rely heavily on human effort. In a typical multilingual subtitling workflow, a subtitler first creates a subtitle template (Georgakopoulou, 2019) by transcribing the source language audio, timing and adapting the text to create proper subtitles in the source language. These source language subtitles (also called captions) are already compressed and segmented to respect the subtitling constraints of length, reading speed and proper segmentation (Cintas and Remael, 2007; Karakanta et al, 2019).…”
Section: Introduction
Citation type: mentioning
confidence: 99%