2019
DOI: 10.48550/arxiv.1903.08678
Preprint
Probing the Need for Visual Context in Multimodal Machine Translation

Cited by 8 publications (16 citation statements)
References 0 publications
“…Most of the existing works are not capable of generating multi-modal summaries. The systems that do generate multi-modal summaries either have an inbuilt system capable of generating multimodal output (mainly by generating text using seq2seq mechanisms and selecting relevant images) [61,134] or they adopt some post-processing steps to obtain the visual and vocal supplements of the generated textual summaries [44,133].…”
Section: Post-processing
confidence: 99%
“…Information in the form of multi-modal inputs has been leveraged in many tasks other than summarization, including multi-modal machine translation [11,21,22,39,108], multi-modal movement prediction [18,53,120], product classification in e-commerce [128], multi-modal interactive artificial intelligence frameworks [51], multi-modal emoji prediction [5,17], multi-modal frame identification [10], multi-modal financial risk forecasting [59,101], multi-modal sentiment analysis [79,93,122], multi-modal named entity recognition [2,77,78,109,126,130], multi-modal video description generation [37,38,91], multi-modal product title compression [70] and multi-modal biometric authentication [28,42,106]. The sheer number of application possibilities for multi-modal information processing and retrieval tasks is quite impressive.…”
Section: Introduction
confidence: 99%
“…While much of the work in Multimodal Machine Translation (MMT) has suggested that the visual modality is at best marginally beneficial (Barrault et al., 2018; Elliott, 2018), recent work (Caglayan et al., 2019a) suggests that visual information is useful when information is missing from the source-side signal. We hypothesize that the same could hold true for Multimodal ASR, under conditions where the acoustic speech signal is corrupted.…”
Section: Introduction
confidence: 99%
“…Inspired by Caglayan et al. (2019a), in this work we port a similar set of experiments to MMASR, where we analyze the contribution of the visual modality under different corruptions of the input signal in the primary modality (i.e. the acoustic signal) on state-of-the-art MMASR architectures (Sanabria et al., 2018; Caglayan et al., 2019b).…”
Section: Introduction
confidence: 99%