Bottom-Up Foreground-Aware Feature Fusion for Person Search

Yang, Wenjie; Li, Dangwei; Chen, Xiaotang; Huang, Kaiqi

doi:10.1145/3394171.3413991

Cited by 8 publications

(2 citation statements)

References 35 publications

“…After years of development, the Faster R-CNN [ 16 ] based target detection algorithm can now achieve high performance while also slowing down detection. Some algorithms for pedestrian retrieval [ 17 ] with deeply integrated networks [ 18 ] have achieved reasonable performance. For the one-step approach, they designed a multi-tasking framework [ 19 ] based on Faster R-CNN, establishing a regional proposal network (RPN) to generate region proposal [ 20 ] and then input it into subsequent parallel detection and re-ID branches.…”

Section: Introduction

confidence: 99%

“…The first part does not contain residual blocks, which are mainly used for the calculation of convolution, regularization, activation function, and maximum pooling of the input, and the second, third, fourth, and fifth parts of the structure contain residual blocks that do not change the size of the residual blocks but are only used to change the dimensionality of the residual blocks. ResNet-50 is a residual network with lower complexity, more stable performance [ 18 ], and faster convergence compared to VGG 16, and it is suitable for many projects with more accurate results in image classification, target detection, and natural language processing. First, we replaced the general convolution in ResNet-50 with inception convolution in Seq-Net, dynamically enhancing the receptive field of feature diagrams [ 20 , 21 ] without increasing computation or degrading feature diagram resolution.…”

Section: Introduction

confidence: 99%

Inception Convolution and Feature Fusion for Person Search

Ouyang; Zeng; Leng

Sensors, 2023

With the rapid advancement of deep learning theory and hardware computing capacity, computer vision tasks such as object detection and instance segmentation have progressed dramatically in recent years. As a result, challenging integrated tasks such as person search have been able to develop quickly. Most efficient network frameworks, such as Seq-Net, are based on Faster R-CNN. However, because of the parallel structure of Faster R-CNN, re-ID performance can be significantly degraded by the single-layer, low-resolution, and occasionally overlooked feature maps extracted during pedestrian detection. To address these issues, this paper proposes a person search method based on an inception convolution and feature fusion module (IC-FFM), using Seq-Net (Sequential End-to-end Network) as the baseline. First, we replace the general convolution in ResNet-50 with the new inception convolution module (ICM), allowing the convolution operation to effectively and dynamically distribute various channels. Then, to improve the accuracy of information extraction, the feature fusion module (FFM) combines multi-level information using various levels of convolution. Finally, bounding-box regression is performed using convolution and the double-head module (DHM), which considerably improves the accuracy of pedestrian retrieval by combining global and fine-grained information. Experiments on the CUHK-SYSU and PRW datasets show that our method achieves higher accuracy than Seq-Net. In addition, our method is simpler and can be easily integrated into existing two-stage frameworks.