“…Traditional object detection algorithms mainly include the deformable parts model (DPM) ( Dollár et al., 2009 ), selective search (SS) ( Uijlings et al., 2013 ), Oxford-MKL ( Vedaldi et al., 2009 ), and NLPR-HOGLBP ( Yu et al., 2010 ), among others. The basic pipeline of a traditional object detection algorithm consists of three parts: 1) region selection: sliding windows of different sizes and aspect ratios are defined for a given image and moved across the entire image from left to right and top to bottom, and each framed region is taken as a candidate region to be detected; 2) feature extraction: visual features are extracted from each candidate region, for example the scale-invariant feature transform (SIFT) ( Bingtao et al., 2015 ), Haar features ( Lienhart and Maydt, 2002 ), and the histogram of oriented gradients (HOG) ( Shu et al., 2021 ), which are commonly used in face detection and general object detection; 3) classification: a trained classifier assigns a target category to each region's feature vector, using common classifiers such as the deformable part model (DPM), AdaBoost ( Viola and Jones, 2001 ), and support vector machines (SVM) ( Ashritha et al., 2021 ). Although these three components achieved some success, they also exposed inherent flaws: region selection with sliding windows incurs high time complexity and produces many redundant windows; unpredictable illumination changes and diverse backgrounds make hand-crafted features poorly robust ( Cao et al., 2020a ) and poorly generalizable; and the complex multi-stage pipeline leads to slow detection and low accuracy ( Wu et al., 2021 ).…”
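To make the three stages concrete, the following is a minimal pure-Python sketch under stated assumptions, not the method of any cited work. A single-cell orientation histogram stands in for full HOG (no cell or block normalization), and a linear decision function with hand-set, hypothetical weights stands in for a trained SVM. The names `sliding_windows`, `orientation_histogram`, `linear_score`, and `detect`, and all parameter values, are illustrative only.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Image = List[List[float]]        # grayscale image, row-major, values in [0, 1]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, sizes: Sequence[int],
                    ratios: Sequence[float], stride: int) -> Iterator[Box]:
    """Stage 1 (region selection): scan windows of several sizes and aspect
    ratios from left to right, top to bottom."""
    for size in sizes:
        for ratio in ratios:  # ratio = width / height, area kept near size**2
            w = max(1, round(size * math.sqrt(ratio)))
            h = max(1, round(size / math.sqrt(ratio)))
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    yield (x, y, w, h)


def orientation_histogram(img: Image, box: Box, bins: int = 9) -> List[float]:
    """Stage 2 (feature extraction): simplified HOG-like feature, i.e. one
    histogram of unsigned gradient orientations weighted by magnitude."""
    x0, y0, w, h = box
    hist = [0.0] * bins
    for y in range(y0 + 1, y0 + h - 1):          # skip border pixels so that
        for x in range(x0 + 1, x0 + w - 1):      # central differences stay inside
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += math.hypot(gx, gy)
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalization
    return [v / norm for v in hist]


def linear_score(feature: Sequence[float], weights: Sequence[float],
                 bias: float) -> float:
    """Stage 3 (classification): linear decision function w.f + b, the form
    a trained linear SVM uses at test time."""
    return sum(f * w for f, w in zip(feature, weights)) + bias


def detect(img: Image, weights: Sequence[float], bias: float,
           sizes: Sequence[int] = (8, 16),
           ratios: Sequence[float] = (0.5, 1.0, 2.0),
           stride: int = 4, threshold: float = 0.0) -> List[Tuple[Box, float]]:
    """Run all three stages and keep windows whose score exceeds threshold."""
    img_h, img_w = len(img), len(img[0])
    hits = []
    for box in sliding_windows(img_w, img_h, sizes, ratios, stride):
        score = linear_score(orientation_histogram(img, box), weights, bias)
        if score > threshold:
            hits.append((box, score))
    return hits


if __name__ == "__main__":
    # Synthetic 32x32 image with a vertical step edge at column 16.
    image = [[0.0 if x < 16 else 1.0 for x in range(32)] for _ in range(32)]
    # Hypothetical weights favouring horizontal gradients (bin 0, vertical edges).
    weights = [1.0] + [0.0] * 8
    n_windows = sum(1 for _ in sliding_windows(32, 32, (8, 16), (0.5, 1.0, 2.0), 4))
    detections = detect(image, weights, bias=-0.5)
    print(n_windows, "candidate windows,", len(detections), "detections")
```

Even this toy 32×32 image with two window sizes and three aspect ratios yields close to two hundred candidate windows, each needing its own feature extraction and classification, which illustrates the time-complexity and window-redundancy problem described above.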