2023
DOI: 10.1609/aaai.v37i1.25208
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

Abstract: VQA is an ambitious task aiming to answer any image-related question. In practice, however, it is hard to build such a system once and for all, since user needs are continuously updated and the system must implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is crucial to st…

Cited by 13 publications (1 citation statement)
References 30 publications
“…Cross-modal matching aims to align different modalities (e.g., text and image) within a common space and pair them based on similarity score. With the explosion of multimedia data, cross-modal matching has gained traction in both industry and academia, e.g., text-to-image generation (Zhou et al. 2022; Ding et al. 2021), image captioning (Li et al. 2019b; Stefanini et al. 2022; Wang et al. 2023), and visual question answering (Lin et al. 2022; Lei et al. 2023). These works have achieved promising performance by training on large-scale datasets.…”
Section: Introduction
confidence: 99%