2022
DOI: 10.1007/978-3-031-20497-5_28
Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection

Cited by 8 publications
(2 citation statements)
References 24 publications
“…We assessed our innovative approach on three publicly available datasets, and the results show that our method surpassed state-of-the-art methods by a wide margin in terms of mean Average Precision. Owing to its application portability, in future work we will look at applying our model to related tasks such as image segmentation [4], event detection [49], object detection [50], [51], pedestrian detection [52], [53], pedestrian attribute recognition [54], person search [45], [55], 3D model retrieval [56], [57], zero-shot learning [58], and magnetic resonance imaging [59], though the DCMSTRD and MSLD modules will need some adjustment to the requirements of each specific task.…”
Section: Discussion
confidence: 99%
“…To begin with, for feature representation of both 2D images and 3D models, a better backbone is always encouraged, which draws our attention to the recently popular vision transformer (ViT). It has proved a success in many related computer vision and natural language processing (NLP) tasks such as video event detection [16], pedestrian detection [17], person search [18,19], and text classification [20]. ViT takes image patches or word embeddings as a sequence of tokens and applies the self-attention mechanism to capture their internal relationships, thus obtaining strong feature representations for downstream tasks.…”
Section: Introduction
confidence: 99%
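The quoted passage describes the ViT idea of treating image patches as a token sequence processed by self-attention. A minimal NumPy sketch of that mechanism (hypothetical shapes, a single attention head, no positional embeddings or learned projections beyond random matrices):

```python
import numpy as np

def image_to_patch_tokens(image, patch_size):
    # Split an H x W x C image into non-overlapping patches and
    # flatten each patch into one token vector (ViT-style tokenization).
    h, w, c = image.shape
    tokens = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            tokens.append(image[i:i + patch_size, j:j + patch_size, :].reshape(-1))
    return np.stack(tokens)  # shape: (num_patches, patch_size * patch_size * c)

def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product self-attention over the token sequence:
    # every token attends to every other token, capturing pairwise relations.
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))
tokens = image_to_patch_tokens(image, patch_size=4)  # 4 patches, 48-dim each
d_model = tokens.shape[-1]
wq, wk, wv = (rng.standard_normal((d_model, 16)) for _ in range(3))
out = self_attention(tokens, wq, wk, wv)
print(tokens.shape, out.shape)  # (4, 48) (4, 16)
```

This is only an illustration of the tokenize-then-attend pattern the citing authors refer to, not the architecture of the surveyed paper, which fuses multispectral (RGB and thermal) features under illumination guidance.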