The target detection of smoke through remote sensing images obtained by means of unmanned aerial vehicles (UAVs) can be effective for monitoring early forest fires. However, smoke targets in UAV images are often small and difficult to detect accurately. In this paper, we use YOLOX-L as a baseline and propose a forest smoke detection network based on the parallel spatial domain attention mechanism and a small-scale transformer feature pyramid network (PDAM–STPNNet). First, to enhance the proportion of small forest fire smoke targets in the dataset, we use component stitching data enhancement to generate small forest fire smoke target images in a scaled collage. Then, to fully extract the texture features of smoke, we propose a parallel spatial domain attention mechanism (PDAM) to consider the local and global textures of smoke with symmetry. Finally, we propose a small-scale transformer feature pyramid network (STPN), which uses the transformer encoder to replace all CSP_2 blocks in turn on top of YOLOX-L’s FPN, effectively improving the model’s ability to extract small-target smoke. We validated the effectiveness of our model with recourse to a home-made dataset, the Wildfire Observers and Smoke Recognition Homepage, and the Bowfire dataset. The experiments show that our method has a better detection capability than previous methods.