Gated bidirectional feature pyramid network for accurate one-shot detection

Woo, Sanghyun; Hwang, Soonmin; Jang, Hodeok; Kweon, In So

doi:10.1007/s00138-019-01017-9

Cited by 23 publications

(18 citation statements)

References 54 publications

“…In order to verify the effectiveness of the cascaded convolutional neural network model and training method designed in this paper, using the same training and test data, five target detection methods based on convolutional neural networks were compared: SSD300 [31], YoLoV2 [32], FRCNN [33], RetinaNet [34] and MTCNN algorithm [35]. SSD300 uses VGG16 [36] as the backbone network, and YoLov2, FRCNN and RetinaNet use ResNet50 as the backbone network.…”

Section: Figure 8ap Results For Different Image Types (mentioning)

confidence: 99%

Value of Virtual Reality Technology in Image Inspection and 3D Geometric Modeling

2020

IEEE Access

Aiming at the poor expressive ability of image statistical information during the reconstruction process of traditional 3D image reconstruction method based on virtual reality technology, resulting in low accuracy of 3D image after reconstruction, a new image detection and 3D image reconstruction based on virtual reality technology are studied method. This paper first proposed a new two-level cascade convolutional neural network structure. The first level of the network predicts target positioning based on the image-level labels of the training image, generates a bounding box of the target in the original image, and generates a cropped image. The cropped image is input to the second-level network. The cropped image may contain areas where the target is stuck in the original image. Level 2 networks only use the adhesion area as training data. Secondly, the visualization software development platform and virtual reality 3D image processing software are selected as the platform for 3D image reconstruction. After the original image is imported into the computer through data input and file analysis steps, the original image is detected. The detected image is in the virtual in the real software, the bounding box method is first used to construct the three-dimensional data field of image reconstruction, and the three-dimensional direct volume of the image is drawn according to the three-dimensional data field of image reconstruction. Preferably, the three-dimensional image reconstruction output formula is obtained through the three-dimensional image direct volume to realize the three-dimensional image reconstruction based on the virtual reality technology. The simulation results show that the method proposed in this paper can effectively detect images. The average traversal coverage of 3D image reconstruction is up to 0.979, and the reconstruction accuracy is higher than 0.97.
INDEX TERMS image classification; distributed network representation learning; deep learning; neighbor reconstruction.

“…In [10], Woo et al proposed a gated bidirectional feature pyramid network to tackle this issue by using a gating module on the SSD frame. The gate module is not easy to be trained.…”

Section: Introduction (mentioning)

confidence: 99%

“…It is bidirectional and can fuse both deep and shallow features towards more effective and robust object detection. Due to the "residual" nature, similar to ResNet [5], it can be easily trained and integrated into different backbones (even deeper or lighter) than other bi-directional methods [7], [10]. Besides this structure, a new BiFusion module is proposed to let the "residual" features form a compact representation that brings more accurate localization information into each prediction layer so that not only the results on small-sized object detection but also large/medium-sized ones are improved.…”

Section: Introduction (mentioning)

confidence: 99%

Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection

Chen, Chang, Hsieh, et al.

2021

IEEE Trans. on Image Process.

State-of-the-art (SoTA) models have improved the accuracy of object detection with a large margin via a FP (feature pyramid). FP is a top-down aggregation to collect semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions due to the shift-effect of pooling. Thus, the advantage of FP to improve detection accuracy will disappear when more layers are used. The original FP lacks a bottom-up pathway to offset the lost information from lower-layer feature maps. It performs well in large-sized object detection but poor in small-sized object detection. A new structure "residual feature pyramid" is proposed in this paper. It is bidirectional to fuse both deep and shallow features towards more effective and robust detection for both small-sized and large-sized objects. Due to the "residual" nature, it can be easily trained and integrated to different backbones (even deeper or lighter) than other bidirectional methods. One important property of this residual FP is: accuracy improvement is still found even if more layers are adopted. Extensive experiments on VOC and MS COCO datasets showed the proposed method achieved the SoTA results for highly-accurate and efficient object detection.

“…Both these approaches rely on assimilating information via their pixel-connectivity to improve feature representations. For scale relations, many efforts have been made on fusing features across scales to alleviate the discrepancy of feature maps from different levels of bottom-up hierarchy and feature scale-space, including top-down information flow [15, 40, 54], an extra bottom-up information path [31,43,68], multiple hourglass structures [46,81], concatenating features from different layers [4,20,38,59] or different tasks [52], gradual multi-stage local information fusions [58,75], pyramid convolutions [67], etc. Even though standard design principles for scale relations are emerging for ConvNet architectures, the problem is far from being solved.…”

Section: Introduction (mentioning)

confidence: 99%

HR-RCNN: Hierarchical Relational Reasoning for Object Detection

Chen, Shrivastava

2021

Preprint

Incorporating relational reasoning in neural networks for object recognition remains an open problem. Although many attempts have been made for relational reasoning, they generally only consider a single type of relationship. For example, pixel relations through self-attention (e.g., non-local networks), scale relations through feature fusion (e.g., feature pyramid networks), or object relations through graph convolutions (e.g., reasoning-RCNN). Little attention has been given to more generalized frameworks that can reason across these relationships. In this paper, we propose a hierarchical relational reasoning framework (HR-RCNN) for object detection, which utilizes a novel graph attention module (GAM). This GAM is a concise module that enables reasoning across heterogeneous nodes by operating on the graph's edges directly. Leveraging heterogeneous relationships, our HR-RCNN shows great improvement on COCO dataset, for both object detection and instance segmentation.
