2020
DOI: 10.1007/s10590-020-09250-0
Multimodal machine translation through visuals and speech

Abstract: Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio and visual modalities, respectively. These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by…

Cited by 48 publications (37 citation statements). References 150 publications.
“…MMT aims to improve the quality of automatic translation using auxiliary sources of information (Sulubacak et al., 2020). The most typical framework explored in previous work makes use of the images when translating their descriptions between languages, with the hypothesis that visual grounding could provide contextual cues to resolve linguistic phenomena such as word-sense disambiguation or gender marking.…”
Section: Multimodal Machine Translation (MMT)
mentioning
confidence: 99%
“…For a good comparison of empirical results, which are not the focus of this paper, we refer to concurrent work (Sulubacak et al., 2019). Moreover, for conciseness we do not cover the sub-topic of simultaneous translation (Fügen, 2008).…”
mentioning
confidence: 99%
“…Inspired by studies of human perception, multimodal processing is spreading into many traditional areas of research, e.g., machine translation (Sulubacak et al., 2019) and ASR. It has become an important part of new areas of research such as image captioning (Bernardi et al., 2016), visual question-answering (VQA; Antol et al., 2015), and multimodal summarization.…”
Section: Related Work
mentioning
confidence: 99%