Sound source localization from a flying drone is a challenging task due to the strong ego-noise from the rotating motors and propellers, as well as the movement of the drone and of the sound sources. To address this challenge, we propose a deep-learning-based framework that integrates single-channel noise reduction with multi-channel source localization. In this framework, a single-channel deep neural network (DNN) estimates a time-frequency soft ratio mask that suppresses the ego-noise. We then design two downstream multi-channel source localization algorithms, based on steered response power (SRP-DNN) and on time-frequency spatial filtering (TFS-DNN). The main novelty lies in the proposed TFS-DNN approach, which estimates the presence probability of the target sound at individual time-frequency bins by combining the DNN-inferred soft ratio mask with the instantaneous direction of arrival (DOA) of the sound received by the microphone array. The resulting time-frequency presence probabilities are then used to design a set of spatial filters that construct a spatial likelihood map for source localization. By jointly exploiting spectral and spatial information, TFS-DNN robustly processes signals in short segments (e.g., 0.5 s) in dynamic and low signal-to-noise-ratio (SNR) scenarios (e.g., SNR of -20 dB). Results on real and simulated data in a variety of scenarios (static sources, moving sources, and moving drones) indicate the advantage of TFS-DNN over competing methods, including SRP-DNN and state-of-the-art time-frequency spatial filtering.
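
To make the TFS-DNN pipeline described above concrete, the following is a minimal numpy sketch of its core idea: weighting each time-frequency bin by the product of the DNN-inferred soft ratio mask and a DOA-consistency term, then accumulating a weighted steered-response power over a grid of candidate directions to form a spatial likelihood map. All function names, the von-Mises-shaped consistency weight, and the 2-D far-field delay-and-sum formulation are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def tf_presence_probability(soft_mask, inst_doa, target_doa, kappa=5.0):
    """Per-bin target-presence weight from mask and DOA consistency.

    soft_mask  : (T, F) DNN-inferred soft ratio mask in [0, 1]
    inst_doa   : (T, F) instantaneous DOA per time-frequency bin (rad)
    target_doa : candidate source azimuth (rad)
    kappa      : concentration of the assumed von-Mises-shaped weight
    """
    # Assumed consistency weight: 1 when the bin's instantaneous DOA
    # matches the candidate direction, decaying smoothly otherwise.
    doa_weight = np.exp(kappa * (np.cos(inst_doa - target_doa) - 1.0))
    return soft_mask * doa_weight

def spatial_likelihood_map(stft, soft_mask, inst_doa, candidate_doas,
                           mic_pos, freqs, c=343.0):
    """Presence-weighted steered-response power over candidate azimuths.

    stft           : (M, T, F) multichannel STFT
    candidate_doas : (K,) grid of candidate azimuths (rad)
    mic_pos        : (M, 2) microphone coordinates (m), planar array
    freqs          : (F,) STFT bin frequencies (Hz)
    """
    likelihood = np.zeros(len(candidate_doas))
    for k, theta in enumerate(candidate_doas):
        # Per-bin weight combining the mask and DOA consistency.
        p = tf_presence_probability(soft_mask, inst_doa, theta)
        # Far-field plane-wave delays for this candidate direction.
        u = np.array([np.cos(theta), np.sin(theta)])
        tau = mic_pos @ u / c                                   # (M,)
        steer = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])  # (M, F)
        # Delay-and-sum across microphones, then weighted power sum.
        beamformed = np.einsum('mtf,mf->tf', stft, np.conj(steer))
        likelihood[k] = np.sum(p * np.abs(beamformed) ** 2)
    return likelihood

# Toy usage with random data (4 mics, 50 frames, 257 bins).
rng = np.random.default_rng(0)
M, T, F = 4, 50, 257
stft = rng.standard_normal((M, T, F)) + 1j * rng.standard_normal((M, T, F))
mask = rng.uniform(size=(T, F))
doas = rng.uniform(-np.pi, np.pi, size=(T, F))
mic_pos = rng.uniform(-0.1, 0.1, size=(M, 2))
freqs = np.linspace(0, 8000, F)
grid = np.linspace(-np.pi, np.pi, 72, endpoint=False)
lmap = spatial_likelihood_map(stft, mask, doas, grid, mic_pos, freqs)
print("estimated azimuth (rad):", grid[np.argmax(lmap)])
```

In this sketch the presence weights play the role the abstract assigns to the time-frequency presence probability: bins dominated by ego-noise (low mask value) or inconsistent with the candidate direction contribute little to that direction's likelihood, which is what makes short-segment, low-SNR operation plausible.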