In first-response emergency scenarios, firefighters need to gather information from both inside and outside buildings, and drones are well suited to this task. This paper presents an elicitation study showing firefighters' desire to collaborate with autonomous drones. We developed a Human-Drone Interaction (HDI) method for indicating a target to a drone using 3D pointing gestures estimated solely from a monocular camera. The participant points at a window without using any wearable or body-attached device; the drone detects the gesture through its front-facing camera and computes the target window. We describe the process of choosing the gesture, detecting and localizing objects, and transforming between coordinate systems. Our proposed 3D pointing gesture interface improves on a 2D pointing gesture interface by integrating depth information with SLAM, resolving the ambiguity among multiple objects aligned on the same plane in a large-scale outdoor environment. Experimental results showed that our 3D pointing gesture interface achieved average F1-scores of 0.85 in simulation and 0.73 in real-world experiments, and an F1-score of 0.58 at the maximum drone-to-building distance of 25 meters.
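
To make the target-selection step concrete, the sketch below casts a 3D pointing ray from estimated body joints and picks the window whose center lies closest to that ray. This is a minimal illustration, not the paper's exact pipeline: the joint positions, window centers, and the shoulder-to-wrist ray approximation are all assumptions, and the real system derives these quantities from monocular pose estimation, object detection, and SLAM depth.

```python
import numpy as np

def select_target_window(shoulder, wrist, window_centers):
    """Pick the window whose 3D center lies closest to the pointing ray.

    shoulder, wrist: hypothetical 3D joint positions (same world frame as
    the windows), here standing in for a monocular pose estimate.
    window_centers: hypothetical 3D window centers, standing in for
    detections localized with SLAM depth.
    """
    origin = np.asarray(shoulder, dtype=float)
    direction = np.asarray(wrist, dtype=float) - origin
    direction /= np.linalg.norm(direction)

    best_idx, best_dist = None, np.inf
    for i, center in enumerate(np.asarray(window_centers, dtype=float)):
        v = center - origin
        t = float(np.dot(v, direction))
        if t <= 0:
            continue  # window lies behind the pointing direction
        # Perpendicular distance from the window center to the ray
        dist = np.linalg.norm(v - t * direction)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist

# Hypothetical usage with made-up coordinates (meters)
shoulder = [0.0, 1.4, 0.0]
wrist = [0.3, 1.5, 0.5]
windows = [[2.0, 3.0, 10.0], [6.0, 3.0, 10.0], [2.0, 6.0, 10.0]]
idx, err = select_target_window(shoulder, wrist, windows)
print(f"selected window {idx}, ray distance {err:.2f} m")
```

Because the ray and the window centers live in full 3D rather than on the image plane, two windows that project to nearly the same 2D location can still be distinguished by their depth, which is the ambiguity the 3D interface resolves.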