Infant drowning has occurred frequently in swimming pools in recent years, which motivates research on automatic real-time detection of such accidents. Unlike youths or adults, swimming infants are small in both body size and range of motion and cannot send distress signals in an emergency, which makes drowning harder to detect. To address this problem, this work takes a first step towards detecting infant drowning automatically and efficiently from video surveillance. Diverse live-scene videos of infant swimming and drowning are collected from a variety of natatoriums and labeled to form datasets. Part of the data is downscaled or enlarged to improve the generalization ability of the model. On this basis, the advantages of Faster R-CNN and a series of YOLOv5 models are explored to enable fast and accurate detection of infant drowning in real-world settings. Supervised learning experiments are carried out, and the test results show that the mean Average Precision (mAP) of both Faster R-CNN and YOLOv5s, one model in the YOLOv5 series, can exceed 89%. However, the former processes only 6 video frames per second with a precision of only 62.04%, while the latter reaches an average speed of 75 frames/s with a precision of about 86.6%. YOLOv5s therefore stands out as the best model for detecting infant drowning in terms of overall performance, which is of great application value for reducing accidents in swimming pools.
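
The two quantities contrasted above, precision and processing speed, can be illustrated with a minimal sketch. The function names and the detection and frame counts below are hypothetical, chosen only to reproduce figures of the same magnitude as those reported; they are not the paper's evaluation code or data.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted detections that are correct: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        return 0.0  # no detections were made, so precision is taken as 0
    return true_positives / total


def frames_per_second(num_frames: int, elapsed_seconds: float) -> float:
    """Average throughput of a detector over a timed run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_frames / elapsed_seconds


# Hypothetical counts (assumptions for illustration, not measured data).
fast_model_precision = precision(true_positives=433, false_positives=67)
fast_model_fps = frames_per_second(num_frames=750, elapsed_seconds=10.0)

print(f"precision: {fast_model_precision:.3f}")
print(f"speed: {fast_model_fps:.1f} frames/s")
```

Precision counts only how many of the emitted detections are correct at a fixed confidence threshold, whereas mAP averages over thresholds and classes, which is why two models with similar mAP can differ markedly in precision.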