Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475677
Exploring Logical Reasoning for Referring Expression Comprehension

Cited by 5 publications (2 citation statements) · References 33 publications
“…BBA (Li, Bu, and Cai 2021) proposes a multi-step bidirectional alignment of potential referred pairs across granularity levels using pyramid visual and textual features. LGREC (Cheng et al. 2021) extends CM-Att-Erase (Liu et al. 2019) with a logical matching module that performs logical matching over explicit logical sentences. CMRE (Yang, Li, and Yu 2021) proposes a cross-modal relation extractor that generates a semantic graph guided by sentences and images.…”

Section: Related Work (Visual Grounding)
Confidence: 99%
“…The Panoptic Narrative Grounding (PNG) task is rapidly gaining prominence as a critical area of research in the multimodal domain [11, 36, 37, 52, 58, 59]. This task aims to generate a pixel-level mask for each noun present in a given long sentence, providing a more fine-grained understanding compared to other cross-modal tasks, such as image captioning [6, 35, 42, 51, 62], visual question answering [23, 47, 57, 73], and referring expression comprehension/segmentation [5, 19, 28-30, 33]. This level of detail sets it apart and opens up a wide range of potential applications, including fine-grained image editing [22, 54] and fine-grained image-text retrieval [17, 45].…”

Section: Introduction
Confidence: 99%