“…Representative CNN architectures include AlexNet (Krizhevsky et al, 2012), ZFNet (Zeiler and Fergus, 2014), VGGNet (Simonyan and Zisserman, 2015), GoogLeNet, the Inception series (Ioffe and Szegedy, 2015; Szegedy et al, 2017; Szegedy et al, 2016), ResNet, DenseNet (Huang et al, 2017), and SENet (Hu et al, 2018). Several lines of research have also been widely explored to further improve deep-learning-based object detection, such as feature enhancement (Cai et al, 2016; Cheng et al, 2019; Cheng et al, 2016b; Kong et al, 2016; Liu et al, 2017b), hard negative mining (Lin et al, 2017c), contextual information fusion (Bell et al, 2016; Gidaris and Komodakis, 2015; Zhu et al, 2015b), and modeling object deformations (Mordan et al, 2018; Ouyang et al, 2017; Xu et al, 2017).…”