“…Information in the form of multi-modal inputs has been leveraged in many tasks other than summarization, including multi-modal machine translation [11,21,22,39,108], multi-modal movement prediction [18,53,120], product classification in e-commerce [128], multi-modal interactive artificial intelligence frameworks [51], multi-modal emoji prediction [5,17], multi-modal frame identification [10], multi-modal financial risk forecasting [59,101], multi-modal sentiment analysis [79,93,122], multi-modal named entity recognition [2,77,78,109,126,130], multi-modal video description generation [37,38,91], multi-modal product title compression [70], and multi-modal biometric authentication [28,42,106]. The sheer number of application possibilities for multi-modal information processing and retrieval tasks is quite impressive.…”