“…Concurrently with the achievements in few-shot classification [13] and few-shot semantic segmentation [14], [15], few-shot object detection (FSOD) [16]-[18] has emerged as a compelling research area in recent years. In the conventional FSOD framework, the model undergoes a two-stage training process: first, it is trained on a large-scale labeled dataset consisting of base objects, and subsequently, it is fine-tuned on a fine-tuning set with only a few labeled novel object instances.…”
Section: Introduction Object Detection (OD) Is a Critical Task in Com... (mentioning)
Object detection is an essential and fundamental task in computer vision and satellite image processing. Existing deep learning methods have achieved impressive performance thanks to the availability of large-scale annotated datasets. Yet, in real-world applications the availability of labels is limited. In this context, few-shot object detection (FSOD) has emerged as a promising direction, which aims at enabling the model to detect novel objects with only a few of them annotated. However, many existing FSOD algorithms overlook a critical issue: when an input image contains multiple novel objects and only a subset of them is annotated, the unlabeled objects are treated as background during training. This causes confusion and severely impairs the model's ability to recall novel objects. To address this issue, we propose a self-training-based FSOD (ST-FSOD) approach, which incorporates the self-training mechanism into the few-shot fine-tuning process. ST-FSOD aims to discover novel objects that are not annotated and to take them into account during training. On the one hand, we devise a two-branch region proposal network (RPN) to separate the proposal extraction of base and novel objects. On the other hand, we incorporate the student-teacher mechanism into the RPN and the region of interest (RoI) head to include highly confident yet unlabeled targets as pseudo labels. Experimental results demonstrate that our proposed method outperforms the state-of-the-art in various FSOD settings by a large margin. The code will be publicly available at https://github.com/zhu-xlab/ST-FSOD.
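The pseudo-labeling idea described in the abstract can be illustrated with a minimal sketch. This is not the ST-FSOD implementation: the function names, the score threshold, the IoU threshold, and the exponential-moving-average (EMA) teacher update are all assumptions chosen for illustration, since the abstract does not specify them. The sketch keeps teacher detections that are highly confident and do not overlap an existing annotation, so that they can be used as extra targets rather than being treated as background.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(
    teacher_boxes: Sequence[Box],
    teacher_scores: Sequence[float],
    annotated_boxes: Sequence[Box],
    score_thr: float = 0.8,  # hypothetical confidence threshold
    iou_thr: float = 0.5,    # hypothetical overlap threshold
) -> List[Box]:
    """Keep confident teacher detections that are not already annotated."""
    pseudo = []
    for box, score in zip(teacher_boxes, teacher_scores):
        if score < score_thr:
            continue  # not confident enough to trust as a label
        if any(iou(box, gt) >= iou_thr for gt in annotated_boxes):
            continue  # already covered by a ground-truth annotation
        pseudo.append(box)
    return pseudo


def ema_update(
    teacher_weights: Sequence[float],
    student_weights: Sequence[float],
    momentum: float = 0.99,
) -> List[float]:
    """Assumed teacher update: exponential moving average of student weights."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]
```

For example, with one annotated box `(10, 10, 50, 50)` and teacher detections at `(12, 11, 49, 52)`, `(100, 100, 140, 140)`, and `(200, 200, 220, 220)` scored 0.95, 0.90, and 0.30, only the second detection is returned: the first duplicates the annotation and the third falls below the confidence threshold.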
“…Furthermore, some researchers focused on the connection between query features and support features. To make full use of the information provided by the support set, Wang et al. [66] proposed a diversity measurement module, which was used to measure diversity information to obtain more meta-feature knowledge and strengthen the connection between support features and query features. Zhang et al. [67] proposed a few-shot remote sensing image detection method of self-adaptive global similarity and two-way foreground stimulator, which improved the spatial similarity and asymmetry problems between support features and query features.…”
The rapid development of Earth observation technology has promoted the continuous accumulation of images in the field of remote sensing. However, a large number of remote sensing images still lack manual annotations of objects, which limits the use of strongly supervised deep learning object detection methods, as they generalize poorly to unseen object categories. Considering the above problems, this study proposes a few-shot remote sensing image object detection method that integrates context dependencies and global features. Starting from a model trained on the base classes, the method fine-tunes it with a small number of annotated samples to enhance its detection capability for new object classes. The proposed method consists of three main modules, namely, the meta-feature extractor (ME), reweighting module (RM), and feature fusion module (FFM). These three modules are respectively used to enhance the context dependencies of the query set features, improve the global features of the annotated support set, and finally fuse the query set features and support set features. The meta-feature extractor is built on an optimized YOLOv5 framework. The reweighting module, which extracts support set features, is based on a simple convolutional neural network (CNN), and the foreground feature enhancement of the support sets is performed in the preprocessing stage. This study achieved favorable results on the two benchmark datasets NWPU VHR-10 and DIOR. Compared with the other methods evaluated, the proposed method achieved the best performance in object detection for both the base classes and the novel classes.
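The interaction between the reweighting module and the feature fusion module can be pictured with a small sketch. The abstract does not say how the two feature sets are fused, so the channel-wise multiplication used here is an assumption borrowed from common feature-reweighting designs, and the function names `global_average_pool` and `fuse_query_with_support` are hypothetical. Support features are reduced to one global value per channel, and each query channel is then scaled by that value.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse a C x H x W support feature map into a C-dim global vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fuse_query_with_support(
    query: FeatureMap, class_vector: List[float]
) -> FeatureMap:
    """Assumed fusion rule: scale each query channel by the class weight."""
    if len(query) != len(class_vector):
        raise ValueError("query channels and class vector length must match")
    return [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(query, class_vector)
    ]


# Toy example: 2-channel support features (2 x 2) and query features (1 x 2).
support = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 2.0]]]
query = [[[1.0, 2.0]], [[4.0, 6.0]]]
class_vector = global_average_pool(support)
fused = fuse_query_with_support(query, class_vector)
```

In this toy case the support map pools to one weight per channel, and the fused map keeps the query's spatial layout while emphasizing channels that respond strongly on the support examples of the class.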