In this paper, we present a Computer Vision (CV) based tracking and fusion algorithm dedicated to a 3D-printed gimbal system on drones flying in natural environments. The gimbal system robustly stabilizes the camera orientation in challenging environments by using the skyline and the ground plane as references. Our main contributions are the following: a) a lightweight ResNet-18 backbone network was trained from scratch and deployed on the Jetson Nano platform to segment each image into two classes (ground and sky); b) our geometric assumption based on the skyline and ground cues provides a basis for robust visual tracking in the wild; c) a manifold surface-based adaptive particle sampling scheme flexibly fuses orientation estimates from multiple sensor sources. The whole algorithm pipeline is tested on our 3D-printed gimbal module with a Jetson Nano. The experiments were performed on top of a building in a real landscape. The code is publicly available at: https://github.com/alexandor91/gimbalfusion.git.

• A lightweight binary segmentation model is trained to label ground and sky pixels, targeting real-time inference on the embedded device.
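The abstract does not spell out how the sky/ground mask is turned into an orientation cue. As a minimal sketch only, assuming a mask convention of 1 = sky and 0 = ground with the sky above the ground, and assuming the skyline is roughly a straight line in the image, the tilt of the skyline (a proxy for camera roll) can be estimated by fitting a least-squares line to the first ground pixel in each column. The function name `skyline_tilt_deg` and the mask convention are hypothetical and are not taken from the released code.

```python
import numpy as np


def skyline_tilt_deg(mask: np.ndarray) -> float:
    """Estimate the skyline tilt angle (degrees) from a binary sky/ground mask.

    Hypothetical convention: mask[y, x] == 1 for sky, 0 for ground,
    with image row y increasing downwards and sky above ground.
    """
    ground = mask == 0
    sky = mask == 1
    # Keep only columns where both classes appear, i.e. the skyline is visible.
    cols = np.nonzero(ground.any(axis=0) & sky.any(axis=0))[0]
    if cols.size < 2:
        raise ValueError("skyline visible in fewer than two columns")
    # argmax on a boolean column returns the index of the first ground pixel.
    rows = np.argmax(ground[:, cols], axis=0)
    # Least-squares line rows = slope * cols + intercept.
    slope, _intercept = np.polyfit(cols, rows, 1)
    # Because y points down, a positive angle means the skyline
    # descends towards the right side of the image.
    return float(np.degrees(np.arctan(slope)))


if __name__ == "__main__":
    ys, xs = np.mgrid[0:100, 0:200]
    level = (ys < 40).astype(np.uint8)                        # flat horizon
    tilted = (ys < 50 + 0.5 * (xs - 100)).astype(np.uint8)    # slope 0.5
    print(round(skyline_tilt_deg(level), 3))
    print(round(skyline_tilt_deg(tilted), 1))
```

A single straight-line fit is the simplest possible choice; real skylines over hills or trees are rarely straight, so a robust estimator (for example RANSAC) or a fusion with inertial measurements, as the particle-sampling scheme above suggests, would likely be needed in practice.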