2021
DOI: 10.48550/arxiv.2109.13027
Preprint

Experience feedback using Representation Learning for Few-Shot Object Detection on Aerial Images

Abstract: This paper proposes a few-shot method based on Faster R-CNN and representation learning for object detection in aerial images. The two classification branches of Faster R-CNN are replaced by prototypical networks for online adaptation to new classes. These networks produce embedding vectors for each generated box, which are then compared with class prototypes. The distance between an embedding and a prototype determines the corresponding classification score. The resulting networks are trained in an episodic …
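
The prototype comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the distance metric or how distances become scores, so the squared Euclidean distance and the softmax used here are assumptions borrowed from the usual prototypical-network formulation. The function names (`class_prototypes`, `classification_scores`) and the toy two-dimensional embeddings are hypothetical.

```python
import math

def class_prototypes(support):
    """Return one prototype per class: the mean of that class's support embeddings.

    `support` maps a class label to a list of equal-length embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes

def squared_distance(a, b):
    """Squared Euclidean distance (assumed metric, not confirmed by the abstract)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classification_scores(embedding, prototypes):
    """Softmax over negative distances: a closer prototype gets a higher score."""
    logits = {
        label: -squared_distance(embedding, proto)
        for label, proto in prototypes.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy support set: two classes, two example embeddings each (made-up values).
support = {
    "plane": [[1.0, 0.0], [0.8, 0.2]],
    "ship": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)

# Embedding of one generated box (made-up value), scored against each prototype.
scores = classification_scores([0.9, 0.1], prototypes)
```

Because the prototypes are recomputed from whatever support examples are supplied, a new class can be added by passing a few embeddings for it, with no retraining step in this sketch. That is the sense in which the abstract speaks of online adaptation to new classes.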

Citations: cited by 2 publications (13 citation statements)
References: 21 publications

“…These vectors are used in the classification head of the detector as class prototypes. The difference is that the vectors are learned through training and not simply computed from examples like in Prototypical Faster R-CNN [10]. The authors of this method proposed to embed a vanilla prototypical network into Faster R-CNN not only in the detection head but in the RPN as well.…”
Section: Metric Learning (mentioning)
confidence: 99%
“…Indeed, there exist some papers that use metric learning to solve FSOD (see e.g. [9][10][11]). Instead of combining query and support features, a generic embedding function is learned.…”
Section: Introduction (mentioning)
confidence: 99%
“…Negative proposals are generated when regressing bounding boxes, which are incorrect proposals including partial foreground objects or total background. The principle of classification in detection pipeline is to achieve separability between embedding vectors of samples from different categories in a high-dimensional embedding space, where the embedding vectors are usually generated by positive proposals [77][78][79]. In addition, NP-Repmet [76] figures out that the introduction of negative proposals could enhance the separability of all classes in the embedding space.…”
Section: Sample: Feature Enhancement and Multimodal Fusion (mentioning)
confidence: 99%
“…To our best knowledge, instead of designing a distinctly new detection framework from scratch, FSOD appends the process of meta knowledge extraction and sharing on classic deep learning object detection baselines, such as two-stage Faster R-CNN [11,16,17,72,73,78,83-98], one-stage YOLO [10,74,99], CenterNet [100,101], and Vision Transformer [102]. The numerous published reports indicate that the two-stage network is more favored, and the two-stage network has more advantages because of its higher detection accuracy, more interpretive, and extensible network structure.…”
Section: Model: Semantics Extraction and Cross-domain Mapping (mentioning)
confidence: 99%