“…Specifically, they aim to learn the features by improving loss functions [9,14,22,31,41,42,50,55,63], improving the training techniques [1,4,12,24,32,35,37,54], adding additional network modules [23,23,51,62], using extra semantic annotations [30,46,47,79] or generating more training samples [17,33,72,76,77]. Besides, more recent studies [3,6,8,10,27,28,38,46,48,53,58,61,64,65,67,80] integrate attention mechanisms into deep models to enhance the feature representation. To obtain the holistic features, most of these methods utilize global average pooling (GAP), global max pooling (GMP) or both of them on each channel of the feature map.…”