“…Training images are treated as labeled bags, which consist of multiple candidate bounding boxes. The learning procedure alternates between selecting the most confident proposals and using them to train a detector [5,24,16]. Recently, many works combine convolutional neural networks (CNN) with MIL and get promising results [2,26,25,27,33,30,31,9,29,32,14,12].…”