2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00141

Who’s Waldo? Linking People Across Text and Images

Cited by 8 publications
(12 citation statements)
References 44 publications
“…We automatically curate data from WC to construct FTT (Figure 2) as follows: (1) The “People by name” category on WC contains 407K distinct people identities. We query each identity's hierarchy of people‐centric subcategories (similar to [CKA*21]) and organize retrieved images by identity. (2) We use a Faster R‐CNN model [RHGS15; JL17; RR17] trained on the WIDER Face dataset [YLLT16] as a face detector.…”
Section: The Faces Through Time Dataset (citation type: mentioning)
confidence: 99%
“…Person-Centric Vision-Language Task. The person-centric vision-language task (Zellers et al, 2019; Dong et al, 2022; Cui et al, 2021; You et al, 2022), is mainly based on grounding references to a person; therefore, person-centric visual grounding ability is a crucial component. VCR (Zellers et al, 2019) is a task that answers commonsensical questions about the people depicted in an image.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The person-centric visual grounding task (Cui et al, 2021) aims to predict a mentioned person, given an image and a contextual textual description. The person-centric commonsense grounding task (You et al, 2022), which extends a person-centric visual grounding task to a commonsense domain, is designed to identify the person mentioned in the commonsense description in the image.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…MCR has recently gained increasing attention, with several notable studies (Ramanathan et al, 2014; Huang et al, 2018; Cui et al, 2021; Parcalabescu et al, 2021; Guo et al, 2022; Goel et al, 2022; Hong et al, 2023). However, many of them focus on images with simple short sentences, such as 'A woman is driving a motorcycle.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Is she wearing a helmet?' (Parcalabescu et al, 2021), or are limited to identifying movie characters or people (Ramanathan et al, 2014; Cui et al, 2021). More recently, Goel et al (2022) introduced a challenging and unconstrained MCR problem (see Figure 1) including a dataset, Coreferenced Image Narratives (CIN), with both people and objects as referents with long textual descriptions (narrations).…”
Section: Introduction (citation type: mentioning)
confidence: 99%