“…Pascal VOC 2012 contains 11,540 images that are split into 5,717 images for training and 5,823 images for validation [22]. There are 20 classes of object labels (with their index in parentheses): aeroplane (1), bicycle (2), bird (3), boat (4), bottle (5), bus (6), car (7), cat (8), chair (9), cow (10), dining table (11), dog (12), horse (13), motorbike (14), person (15), potted plant (16), sheep (17), sofa (18), train (19), and tv monitor (20). We use a ResNet-50 model [25] that was pre-trained on ImageNet [16].…”
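
The class indexing and split sizes quoted above can be written down as a small lookup table, which makes the 1-based numbering easy to check. The following is a minimal sketch in Python: the names `VOC_CLASSES`, `CLASS_TO_INDEX`, `INDEX_TO_CLASS`, and `split_fractions` are hypothetical and chosen only for illustration, and the excerpt does not say how the authors stored the labels in their own code.

```python
# Class names in the order given in the quoted text.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "dining table", "dog", "horse", "motorbike", "person",
    "potted plant", "sheep", "sofa", "train", "tv monitor",
]

# Hypothetical lookup tables (not from the source); indices start at 1,
# matching the numbers in parentheses in the text.
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOC_CLASSES, start=1)}
INDEX_TO_CLASS = {i: name for name, i in CLASS_TO_INDEX.items()}

# Split sizes as stated in the text.
NUM_TRAIN = 5717
NUM_VAL = 5823
NUM_TOTAL = 11540


def split_fractions():
    """Return the (train, validation) fractions of the full image set."""
    return NUM_TRAIN / NUM_TOTAL, NUM_VAL / NUM_TOTAL


if __name__ == "__main__":
    print(CLASS_TO_INDEX["person"])
    print(INDEX_TO_CLASS[20])
    print(NUM_TRAIN + NUM_VAL == NUM_TOTAL)
```

Many frameworks expect 0-based class indices, so a pipeline built on this table would typically subtract 1 from each index before computing a loss; whether the authors did so is not stated in the excerpt.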