Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Ling, Yan; Yu, Jianfei; Xia, Rui

doi:10.48550/arxiv.2204.07955

Cited by 1 publication

(3 citation statements)

References 27 publications

“…• Textual Aspect-Opinion Extraction (AOE) aims to extract aspect and opinion terms from the text, as noted in [127]. To handle the lack of label information required for supervised learning, the authors resort to other models for aspect extraction and opinion extraction.…”

Section: Pre-training Objectives (mentioning)

confidence: 99%

“…• Visual Aspect-Opinion Generation (AOG) aims to generate the aspect-opinion pair detected from the input image [127].…”

Section: Pre-training Objectives (mentioning)

confidence: 99%

“…• Multimodal Sentiment Prediction (MSP) enhances the pre-trained models by capturing the subjective information from vision-language inputs [127].…”

Section: Pre-training Objectives (mentioning)

confidence: 99%

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Wang, Chen, Qian et al. 2023

Preprint

With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (such as computer vision and natural language processing), large multi-modal pre-trained models have drawn increasing attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper can provide new insights and help new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training work in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used to validate large-scale MM-PTMs, including generative, classification, and regression tasks. We also give visualization and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future work. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey.
