Deep-learning object detection methods designed for computer vision applications tend to under-perform when applied to remote sensing data. This is because, unlike in computer vision, training data in remote sensing are harder to collect, and targets can be very small, occupying only a few pixels in the entire image, and can exhibit arbitrary perspective transformations. Detection performance can improve by fusing data from multiple remote sensing modalities, including RGB, IR, hyperspectral, multispectral, synthetic aperture radar, and LiDAR, to name a few. In this work, we propose YOLOrs: a new convolutional neural network specifically designed for real-time object detection in multimodal remote sensing imagery. YOLOrs can detect objects at multiple scales, with smaller receptive fields to account for small targets, and can also predict target orientations. In addition, YOLOrs introduces a novel mid-level fusion architecture that renders it applicable to multimodal aerial imagery. Our experimental studies compare YOLOrs with contemporary alternatives and corroborate its merits.
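The mid-level fusion idea mentioned above can be illustrated with a minimal, runnable sketch: each modality is first processed by its own feature-extraction stem, and the resulting feature maps are concatenated along the channel axis partway through the network, so that shared detection layers see features from both sensors. The function names, shapes, and the toy "stem" below are illustrative assumptions, not the exact YOLOrs layers; a feature map is represented as a list of channels, each channel an H x W grid.

```python
def stem(image, gain):
    """Toy per-modality feature extractor standing in for convolutional
    layers: it scales every pixel and emits a single-channel feature map.
    (Hypothetical stand-in; the real network uses learned convolutions.)"""
    return [[[p * gain for p in row] for row in image]]

def midlevel_fuse(feats_a, feats_b):
    """Mid-level fusion: concatenate the two modality streams along the
    channel axis, preserving the spatial dimensions."""
    return feats_a + feats_b

# Toy 2x2 inputs for two modalities (e.g. RGB-derived and IR-derived).
rgb = [[1.0, 2.0], [3.0, 4.0]]
ir = [[0.5, 0.5], [0.5, 0.5]]

fused = midlevel_fuse(stem(rgb, 1.0), stem(ir, 2.0))
# fused has two channels (one per modality) over the same 2x2 spatial grid.
```

The design point is that fusion happens after each sensor has its own low-level representation but before detection, letting the shared layers exploit cross-modal cues.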