2022
DOI: 10.1016/j.eswa.2022.118285
Reasonable object detection guided by knowledge of global context and category relationship

Cited by 3 publications (1 citation statement)
References 5 publications
“…First, we notice that using the spatial encoder to implicitly extract intra-frame contexts yields a very small benefit. Since the global scene context is useful for vision-related tasks (Wang et al, 2019;Zhang et al, 2021;Ji et al, 2022), we extend the spatial encoder to explicitly generate a global feature vector for each frame. Inspired by Vision Transformer (ViT) (Dosovitskiy et al, 2021), we prepend a learnable class token to the spatial encoder input, which captures the global relationship among all human-object pairs at a particular moment.…”
Section: Introduction
Confidence: 99%
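The class-token mechanism the citing authors borrow from ViT can be sketched as follows. This is a minimal illustration with hypothetical shapes and names (`prepend_class_token`, `dim=8`, the zero-valued pair features), not the authors' actual implementation: a single learnable vector is broadcast across the batch and concatenated in front of the per-pair token sequence, so that after self-attention its output slot holds a global summary of all human-object pairs in the frame.

```python
import numpy as np

def prepend_class_token(tokens: np.ndarray, cls_token: np.ndarray) -> np.ndarray:
    """Prepend one learnable class token per sample.

    tokens:    (batch, seq_len, dim) per-pair features for one frame
    cls_token: (dim,) trainable vector, shared across the batch
    returns:   (batch, seq_len + 1, dim) with the class token at index 0
    """
    batch, _, dim = tokens.shape
    cls = np.broadcast_to(cls_token, (batch, 1, dim))  # one copy per sample
    return np.concatenate([cls, tokens], axis=1)

# Usage: features for 4 human-object pairs in each of 2 frames
pairs = np.zeros((2, 4, 8))   # batch=2, seq_len=4, dim=8 (illustrative values)
cls_token = np.ones(8)        # in practice a trained parameter
out = prepend_class_token(pairs, cls_token)
print(out.shape)              # (2, 5, 8)
```

In a full model the class token would be an `nn.Parameter` updated by backpropagation; after the encoder's attention layers, the vector at position 0 serves as the per-frame global feature.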