Unmanned aerial vehicles (UAVs) with mounted cameras have the advantage of capturing aerial (bird's-eye view) images. The availability of aerial visual data and recent advances in object detection algorithms have led the computer vision community to focus on object detection in aerial images. As a result, several aerial datasets with object annotations have been introduced. In these datasets, UAVs are used solely as flying cameras, discarding other data types related to the flight (e.g., time, location, internal sensors). In this work, we propose a multi-purpose aerial dataset (AU-AIR) that has multi-modal sensor data (i.e., visual, time, location, altitude, IMU, velocity) collected in real-world outdoor environments. The AU-AIR dataset includes meta-data for frames extracted from the recorded RGB videos (i.e., bounding box annotations for traffic-related object categories). Moreover, we emphasize the differences between natural and aerial images in the context of the object detection task. To this end, we train and test mobile object detectors (including YOLOv3-Tiny and MobileNetv2-SSDLite) on the AU-AIR dataset; these detectors are suitable for real-time object detection on UAV on-board computers. Since our dataset has diversity in recorded data types, it contributes to filling the gap between computer vision and robotics. The dataset is available at https://bozcani.github.io/auairdataset.
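To illustrate how multi-modal annotations like AU-AIR's might be consumed, here is a minimal Python sketch that pairs a frame's bounding boxes with its flight meta-data. The JSON field names (`frames`, `image`, `time`, `altitude`, `bbox`, and so on) are illustrative assumptions, not the dataset's actual schema.

```python
import json
import cv2  # OpenCV for image I/O and drawing

# Hypothetical annotation layout -- the field names below are
# assumptions for illustration, not the actual AU-AIR schema.
with open("annotations.json") as f:
    annotations = json.load(f)

for frame in annotations["frames"]:
    image = cv2.imread(frame["image"])          # extracted RGB frame
    meta = (frame["time"], frame["altitude"])   # per-frame flight meta-data

    # Draw each traffic-related bounding box on the frame.
    for obj in frame["bbox"]:
        x, y, w, h = obj["left"], obj["top"], obj["width"], obj["height"]
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("annotated_" + frame["image"], image)
```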
Fast and robust gate perception is of great importance in autonomous drone racing. We propose a convolutional neural network-based gate detector (GateNet) that concurrently detects a gate's center, distance, and orientation with respect to the drone using only images from a single fish-eye RGB camera. GateNet achieves a high inference rate (up to 60 Hz) on an onboard processor (Jetson TX2). Moreover, GateNet is robust to gate pose changes and background disturbances. The proposed perception pipeline leverages a fish-eye lens with a wide field of view and thus can detect multiple gates at close range, allowing a longer planning horizon even in tight environments. For benchmarking, we propose a comprehensive dataset (AU-DR) that focuses on gate perception. Throughout the experiments, GateNet outperforms similar methods while remaining efficient enough for onboard computers in autonomous drone racing. The effectiveness of the proposed framework is tested on a fully autonomous drone that flies on a previously-unknown track with tight turns and varying gate positions and orientations in each lap.
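A minimal PyTorch sketch of the multi-output idea: a shared convolutional backbone feeding separate regression heads for the gate center (pixel coordinates), distance, and orientation. This is an illustrative stand-in under assumed layer sizes, not the GateNet architecture from the paper.

```python
import torch
import torch.nn as nn

class GateRegressor(nn.Module):
    """Illustrative multi-head detector: shared features feed three
    regression heads (center, distance, orientation). Not GateNet itself."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.center = nn.Linear(32, 2)       # (u, v) gate center in the image
        self.distance = nn.Linear(32, 1)     # range to the gate
        self.orientation = nn.Linear(32, 1)  # yaw relative to the drone

    def forward(self, x):
        features = self.backbone(x)
        return (self.center(features),
                self.distance(features),
                self.orientation(features))

# One 3-channel fish-eye frame (batch of 1); the resolution is arbitrary here.
center, distance, orientation = GateRegressor()(torch.randn(1, 3, 240, 320))
```

Predicting all three quantities from one shared feature map is what lets a single forward pass serve the whole perception task, which is how such a detector can sustain high inference rates on an embedded board.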
Scene modeling is crucial for robots that need to perceive, reason about, and manipulate the objects in their environments. In this paper, we adapt and extend Boltzmann Machines (BMs) for contextualized scene modeling. Although there are many models on the subject, ours is the first to bring together objects, relations, and affordances in a highly capable generative model. To this end, we introduce a hybrid version of BMs in which relations and affordances are incorporated into the model with shared, tri-way connections. Moreover, we introduce a dataset for relation estimation and modeling studies. We evaluate our method against several baselines on object estimation, out-of-context object detection, relation estimation, and affordance estimation tasks. Moreover, to illustrate the generative capability of the model, we show several example scenes that the model is able to generate, and demonstrate the benefits of the model on a humanoid robot. The code and the dataset are made publicly available at: https://github.com/bozcani/COSMO
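To ground the BM terminology, here is a minimal numpy sketch of block Gibbs sampling in a standard binary restricted Boltzmann machine, which is the basic generative mechanism BMs rely on. The paper's hybrid model with shared tri-way object-relation-affordance connections goes well beyond this, so treat the sketch only as a primer; all sizes and parameters below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4  # toy sizes, chosen arbitrarily

# Random parameters of a standard binary RBM (not the paper's hybrid BM).
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)                        # visible biases
c = np.zeros(n_hidden)                         # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    """One block Gibbs sweep: sample hidden given visible,
    then visible given hidden."""
    h = rng.random(n_hidden) < sigmoid(v @ W + c)
    v = rng.random(n_visible) < sigmoid(h @ W.T + b)
    return v.astype(float)

# "Generate" a configuration by running the chain from a random start;
# in a scene model, the visible units would encode scene elements.
v = rng.integers(0, 2, n_visible).astype(float)
for _ in range(1000):
    v = gibbs_step(v)
print(v)  # an approximate sample from the model distribution
```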