CubeSats provide a low-cost, convenient, and effective way of acquiring remote sensing data and hold great potential for remote sensing object detection. Although deep learning-based models achieve excellent object detection performance, their large number of parameters makes them difficult to deploy on CubeSats with limited memory and computational power. Existing approaches prune redundant parameters, but this inevitably degrades detection accuracy. In this paper, we propose the novel Context-aware Dense Feature Distillation (CDFD), in which a small student network is guided to integrate features extracted by multiple teacher networks, yielding a lightweight yet accurate detector for onboard remote sensing object detection. Specifically, a Contextual Feature Generation Module (CFGM) rebuilds the non-local relationships between pixels and transfers them from teacher to student, guiding the student to extract rich contextual features that assist remote sensing object detection. In addition, an Adaptive Dense Multi-teacher Distillation (ADMD) strategy performs adaptive weighted fusion of the student's distillation losses against multiple well-trained teachers, so that the student integrates helpful knowledge from all of them. Extensive experiments on two large-scale remote sensing object detection datasets with various network structures demonstrate that the trained lightweight network achieves promising performance. Our approach also generalizes well across existing state-of-the-art remote sensing object detectors. Furthermore, experiments on large-scale general object detection datasets show that our approach is equally effective for general object detection distillation.
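As a minimal sketch of the ADMD idea, the following PyTorch snippet fuses per-teacher feature-imitation losses with adaptive weights. The learnable softmax weighting and the MSE imitation loss are assumptions for illustration; the abstract does not specify the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMultiTeacherLoss(nn.Module):
    """Sketch of adaptive weighted fusion of per-teacher distillation losses.

    Assumption: each teacher gets a learnable scalar logit, normalized with
    softmax into fusion weights; the paper's actual scheme may differ.
    """

    def __init__(self, num_teachers: int):
        super().__init__()
        # One learnable logit per teacher; softmax yields fusion weights.
        self.logits = nn.Parameter(torch.zeros(num_teachers))

    def forward(self, student_feat: torch.Tensor,
                teacher_feats: list) -> torch.Tensor:
        # Per-teacher feature-imitation loss (MSE on aligned feature maps);
        # teachers are frozen, so their features are detached.
        losses = torch.stack([F.mse_loss(student_feat, t.detach())
                              for t in teacher_feats])
        weights = F.softmax(self.logits, dim=0)
        # Adaptive weighted fusion of the distillation losses.
        return (weights * losses).sum()
```

In training, this term would be added to the standard detection loss, letting gradients update both the student and the fusion weights so that more helpful teachers receive larger weights.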
In the space station, visual gesture recognition is an important component of gesture-based human-robot interaction, enabling astronauts to control robots that assist with simple, repetitive, and cooperative work. However, public gesture datasets are captured in everyday environments, and existing approaches trained on them perform imprecisely on astronaut gesture recognition. In this paper, we introduce a new astronaut gesture dataset (the DSSL-Astronaut gesture dataset) and a novel hierarchical attention single-shot detector network (HA-SSD) for astronaut gesture recognition. The dataset consists of real and augmented images. The real images, captured in a simulated space station, closely resemble images from the actual space station environment. For augmentation, we use the Mask R-CNN model to segment the astronaut foreground from the real data, capture background images of the simulated space station under different viewpoints and illuminations, and combine the two to synthesize augmented images. The HA-SSD model pairs a lightweight MobileNet backbone with a hierarchical attention mechanism. MobileNet serves as the feature extractor, using depth-wise separable convolutions to trade off latency against accuracy, while the hierarchical channel-wise attention module exploits fine semantic information to enrich the features and improve gesture recognition. Experiments demonstrate that our dataset is suitable for the space station setting and that our approach localizes and recognizes gestures effectively, with strong generality.
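The hierarchical channel-wise attention can be sketched as a squeeze-and-excitation-style block applied at each feature level of the detector. The SE design, the reduction ratio, and the per-level application below are assumptions, not necessarily the paper's exact module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design;
    the paper's hierarchical module may differ in detail)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Global average pool -> per-channel weights -> rescale features.
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * w

# Hierarchical use: one attention block per multi-scale SSD feature level,
# so each detection head sees channel-reweighted features.
```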
Visual surface inspection is a challenging task owing to the highly diverse appearance of target surfaces and defective regions. Previous attempts rely heavily on vast quantities of manually annotated training examples. However, in some practical cases it is difficult to obtain a large number of samples for inspection. To address this, we propose a hierarchical texture-perceiving generative adversarial network (HTP-GAN) that is learned from a one-shot normal image in an unsupervised scheme. Specifically, HTP-GAN contains a pyramid of convolutional GANs that capture the global structure and fine-grained representation of an image simultaneously, which helps distinguish defective surface regions from normal ones. In addition, the discriminator includes a texture-perceiving module that captures the spatially invariant representation of the normal image via directional convolutions, making it more sensitive to defective areas. Experiments on a variety of datasets consistently demonstrate the effectiveness of our method.
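One plausible reading of the texture-perceiving module's directional convolutions is a set of fixed orientation-difference kernels applied depthwise, as sketched below. The specific 3x3 kernels and the depthwise grouping are assumptions; the abstract does not give the filter design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectionalConv(nn.Module):
    """Sketch of directional convolutions for texture perception.

    Assumption: four fixed 3x3 difference kernels (horizontal, vertical,
    and two diagonals) applied depthwise to every input channel.
    """

    def __init__(self, channels: int):
        super().__init__()
        k = torch.zeros(4, 1, 3, 3)
        k[0, 0, 1, 0], k[0, 0, 1, 2] = -1.0, 1.0   # horizontal difference
        k[1, 0, 0, 1], k[1, 0, 2, 1] = -1.0, 1.0   # vertical difference
        k[2, 0, 0, 0], k[2, 0, 2, 2] = -1.0, 1.0   # main diagonal
        k[3, 0, 0, 2], k[3, 0, 2, 0] = -1.0, 1.0   # anti-diagonal
        # Repeat the kernel bank for every input channel (depthwise groups).
        self.register_buffer("kernels", k.repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four orientation-response maps per input channel; deviations from
        # the normal texture statistics show up as strong responses.
        return F.conv2d(x, self.kernels, padding=1, groups=self.channels)
```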