Remote sensing images are typically characterized by varying target orientations, multi-scale target sizes, and dense target distributions, which make object detection in such images a challenging task. In addition, detection platforms such as drones have limited computing resources, which makes it difficult to deploy detectors with large numbers of parameters. This study proposes YOLO-EMS, a lightweight object detection algorithm for remote sensing images that maintains real-time detection while improving detection performance. First, Ghost Convolution (GhostConv) replaces the standard convolution in the backbone network. Second, we propose the Efficient Multi-Scale Convolution (EMSConv) module and its extended variant, Extended-EMSC (E-EMSConv), and combine them with the C2f module to form EMSC-C2f and E-EMSC-C2f, which reduce the model size. Finally, we propose a novel bounding box regression loss function, Normalized Corner Distance IoU (NCDIoU), which improves detection accuracy. We compare the proposed convolution module with other mainstream modules and attention mechanisms on the remote sensing image dataset RSOD, where it improves mAP50 by up to 11.4\%. In ablation experiments on the DIOR and PASCAL VOC datasets, our algorithm improves mAP50 over YOLOv8n by 0.8\% and 0.5\%, respectively, while reducing the number of parameters by 10.8\% and FLOPs by 7.4\%. Finally, we compare the proposed algorithm with other lightweight networks: our model reduces FLOPs by up to 60.3\% and parameters by 77.9\%, and improves mAP by up to 9.5\% and 13.0\% on the DIOR and PASCAL VOC datasets, respectively.
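
To give a rough sense of why replacing standard convolution with GhostConv shrinks a model, the following minimal sketch counts convolution weights under an assumed layout: half of the output channels come from an ordinary convolution and the other half from a cheap 5x5 depthwise convolution applied to those features, as in commonly used Ghost module implementations. The abstract does not specify the exact configuration used in YOLO-EMS, so the function names, the ratio of 2, the kernel sizes, and the channel counts below are illustrative assumptions, and batch-normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution without bias."""
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Weight count of a Ghost-style convolution (assumed layout, ratio 2)."""
    hidden = c_out // 2
    # Primary branch: ordinary convolution producing half of the output channels.
    primary = conv_params(c_in, hidden, k)
    # Cheap branch: depthwise convolution generating the remaining half.
    cheap = conv_params(hidden, hidden, cheap_k, groups=hidden)
    return primary + cheap


# Hypothetical layer: 128 input channels, 256 output channels, 3x3 kernel.
standard = conv_params(128, 256, 3)
ghost = ghost_conv_params(128, 256, 3)
print(f"standard: {standard}, ghost: {ghost}, ratio: {ghost / standard:.3f}")
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the standard layer, since the depthwise branch adds only a few thousand parameters. The overall savings reported for YOLO-EMS also depend on the EMSC-C2f and E-EMSC-C2f modules, which this sketch does not model.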