Unmanned aerial vehicles (UAVs) have been used for many applications in recent years, from urban search and rescue, to agricultural surveying, to autonomous underground mine exploration. However, deploying UAVs in tight, indoor spaces, especially close to humans, remains a challenge. One solution, when only a limited payload is required, is to use micro-UAVs, which pose less risk to humans and typically cost less to replace after a crash. However, micro-UAVs can only carry a limited sensor suite, e.g. a monocular camera instead of a stereo pair or LiDAR, complicating tasks like dense mapping and markerless multi-person 3D human pose estimation, which are needed to operate in tight environments around people. Monocular approaches to such tasks exist, and dense monocular mapping approaches have been successfully deployed for UAV applications. However, despite many recent works on both marker-based and markerless multi-UAV single-person motion capture, markerless single-camera multi-person 3D human pose estimation remains a much earlier-stage technology, and we are not aware of existing attempts to deploy it in an aerial context. In this paper, we present what is thus, to our knowledge, the first system to perform simultaneous mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV. In particular, we show how to loosely couple state-of-the-art monocular depth estimation and monocular 3D human pose estimation approaches to reconstruct a hybrid map of a populated indoor scene in real time. We validate our component-level design choices via extensive experiments on the large-scale ScanNet and GTA-IM datasets. To evaluate our system-level performance, we also construct a new Oxford Hybrid Mapping dataset of populated indoor scenes.
I. INTRODUCTION

Recent years have seen huge improvements in the flight stability and obstacle avoidance capabilities of unmanned aerial vehicles, driven by applications including aerial search and rescue [1], aerial tracking and surveillance [2], drone cinematography [3], robotic agriculture [4], and the exploration of everything from mines [5] to other planets [6]. However, deploying drones in confined indoor spaces close to people remains challenging. This is unfortunate, because numerous applications, from awareness systems for emergency responders to indoor drone cinematography for film-makers, could benefit significantly from such a capability.

To operate in such an environment, it is helpful for a drone to be able to both map its geometry and detect/track the people moving within it, ideally in real time. At the same time, however, the physical constraints imposed by the environment encourage the use of a small drone (e.g. ≈10cm

All authors are with the University of Oxford. M. Vankadari, A. Everitt and S. Shin assert joint second authorship.
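To illustrate the loose coupling described above, the following minimal sketch runs a monocular depth estimator and a multi-person 3D pose estimator independently on each frame and merges their outputs only at the map level. The estimator classes, the `HybridMap` structure, and all names here are hypothetical stand-ins for illustration, not the paper's actual implementation:

```python
import numpy as np


class DepthEstimator:
    """Stand-in for a learned monocular depth network (hypothetical)."""

    def estimate(self, frame: np.ndarray) -> np.ndarray:
        # A real system would run a CNN; here we return a flat 2 m depth map.
        return np.full(frame.shape[:2], 2.0, dtype=np.float32)


class PoseEstimator:
    """Stand-in for a markerless multi-person 3D pose network (hypothetical)."""

    def estimate(self, frame: np.ndarray) -> list:
        # Return one 3D skeleton per detected person (17 joints x 3 coords).
        return [np.zeros((17, 3), dtype=np.float32)]


class HybridMap:
    """Hybrid map: dense static geometry plus dynamic people (hypothetical)."""

    def __init__(self):
        self.depth_keyframes = []  # accumulated dense scene geometry
        self.people = []           # most recent per-person skeletons

    def fuse(self, depth: np.ndarray, skeletons: list) -> None:
        # Loose coupling: the two estimators never exchange state; their
        # outputs are only combined here, in the map representation.
        self.depth_keyframes.append(depth)
        self.people = skeletons


def process_frame(frame, depth_net, pose_net, hybrid_map):
    depth = depth_net.estimate(frame)        # static geometry branch
    skeletons = pose_net.estimate(frame)     # dynamic people branch
    hybrid_map.fuse(depth, skeletons)
    return hybrid_map


frame = np.zeros((480, 640, 3), dtype=np.uint8)
m = process_frame(frame, DepthEstimator(), PoseEstimator(), HybridMap())
print(len(m.depth_keyframes), len(m.people))  # 1 1
```

The design choice being sketched is that neither network depends on the other's output, so either component can be swapped for a newer state-of-the-art model without retraining or re-engineering the rest of the pipeline.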