Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475310

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Citations: cited by 34 publications (15 citation statements)
References: 39 publications
“…They propose a new barebones model which addresses the poor generalization exhibited due to over-fitting individual scenes and camera configurations. Human detection in overlapping regions has also been investigated by Hou et al [7]. They proposed a new multi-view detector, MVDeTr, where the detector fuses multi-view information by introducing a shadow transformer.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent work explicitly addressing multi-object detection using contemporary detection architectures is limited [13]- [16], [19], [20]. Nassar et al [19] apply a convolutional neural network that takes multi-view images and corresponding geolocation information as inputs and uses a joint loss function considering all views, resulting in an increase of the detection mAP by up to 27.8%.…”
Section: A Multi-view Object Detection (mentioning)
confidence: 99%
“…Furthermore, Liu et al [18] improve the detection accuracy with the Swin Transformer, adding multi-scale feature maps and reducing the ViT complexity from O(n²) to O(n) by implementing a shifted-window self-attention pattern. A recent work from Hou and Zheng [16] addresses multi-view pedestrian detection by using a DETR architecture with multi-view attention. In order to account for spatial consistency, they use a projective transform to the common ground plane.…”
Section: B Transformers For Object Detection (mentioning)
confidence: 99%
“…3D pose can be estimated [10,36] by merging 2D skeleton estimations from multiple 2D camera views, using a 3D regression network or graph matching. Meanwhile, multi-view person detection approaches [19,20,28,34] also utilize camera calibration to merge multiple 2D detections or features to generate more reliable 3D person detection results. The accuracy of these approaches heavily depends on the quality of the 2D person detection or 2D pose estimation.…”
Section: Related Work (mentioning)
confidence: 99%