<abstract><p>Unmanned Aerial Vehicles have proven helpful in domains such as defence and agriculture and will play a vital role in implementing smart cities in the coming years. Object detection is an essential feature in any such application. This work addresses the challenges of object detection in aerial images: improving the accuracy of small and dense object detection, handling the class-imbalance problem, and using contextual information to boost performance. We use a density map-based approach on the drone dataset VisDrone-2019, together with an architecture that has an increased receptive field so that small objects are detected properly. To address the class-imbalance problem, we select the images containing classes that occur less frequently and add rotated copies of them back into the dataset. We then use RetinaNet with adjusted anchor parameters, instead of other conventional detectors, to detect objects in aerial imagery accurately and efficiently. The proposed three-step pipeline for object detection in aerial images improves significantly on existing methods. Future work may include improving the computational efficiency of the proposed method and minimising the effect of perspective distortions and occlusions.</p></abstract>
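The class-imbalance step described above can be illustrated with a minimal sketch. The abstract does not give the rotation angles, the rarity threshold, or the annotation format, so everything here is an assumption: annotations are a dictionary mapping an image id to a list of `(label, (x1, y1, x2, y2))` pairs, a class counts as rare when it has at most `max_count` instances, and only a 90-degree clockwise rotation is shown. The function names and the example class names are hypothetical, and only the box coordinates are transformed; rotating the pixels themselves is left to an image library.

```python
from collections import Counter


def rare_class_images(annotations, max_count):
    """Return ids of images containing at least one class with <= max_count instances."""
    counts = Counter(label for boxes in annotations.values() for label, _ in boxes)
    rare = {label for label, n in counts.items() if n <= max_count}
    return sorted(
        img for img, boxes in annotations.items()
        if any(label in rare for label, _ in boxes)
    )


def rotate_box_90_cw(box, height):
    """Map an (x1, y1, x2, y2) box to its position after rotating the image
    90 degrees clockwise; `height` is the height of the original image."""
    x1, y1, x2, y2 = box
    # A point (x, y) moves to (height - y, x), so the old y-range becomes
    # the new x-range (reversed) and the old x-range becomes the new y-range.
    return (height - y2, x1, height - y1, x2)


def rotated_copies(annotations, heights, max_count):
    """Build annotation entries for rotated copies of the rare-class images
    (assumed naming scheme: original id plus a '_rot90' suffix)."""
    extra = {}
    for img in rare_class_images(annotations, max_count):
        extra[img + "_rot90"] = [
            (label, rotate_box_90_cw(box, heights[img]))
            for label, box in annotations[img]
        ]
    return extra


# Hypothetical toy data: 'tricycle' appears once, 'car' three times.
annotations = {
    "a": [("car", (0, 0, 10, 10)), ("car", (5, 5, 20, 20))],
    "b": [("car", (1, 1, 4, 4)), ("tricycle", (10, 20, 30, 60))],
}
heights = {"a": 100, "b": 100}
extra = rotated_copies(annotations, heights, max_count=1)
```

Merging `extra` into the original annotations raises the instance count of the rare class without duplicating images that contain only frequent classes.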
The multi-label classification task is concerned with assigning an image to one or more classes (categories) based on the content of the image itself. It differs from binary or multi-class classification, where the classifier assigns the image to a single class from a fixed set of classes. Existing methods use feature extraction techniques such as colour histograms and SIFT, which are limited in their representational ability. We propose to overcome this problem by leveraging the rich features that can be extracted from CNNs that have been trained on a million images. The features are then fed into an artificial neural network, which is trained on the image features and multi-label tags. By using transfer learning, we combine this representational ability with reduced training time. We benchmark the model on a dataset obtained from Flickr (FLICKR-25K). The evaluation metrics used include mAP, training accuracy and training loss.
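The difference between multi-label and single-label output can be sketched briefly. The abstract does not state the output activation or loss of the network, so this sketch assumes a common choice: one independent sigmoid per class, trained with binary cross-entropy, so that several classes can be active for the same image. The function names, class names, and threshold are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, class_names, threshold=0.5):
    """Return every class whose independent probability reaches the threshold.
    Unlike an argmax over softmax scores, zero, one, or many classes may be returned."""
    return [name for name, z in zip(class_names, logits) if sigmoid(z) >= threshold]


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over classes for one image;
    targets is a 0/1 vector with one entry per class."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


# Hypothetical scores produced by the classifier head for one image.
classes = ["sky", "car", "tree"]
tags = predict_labels([2.0, -1.0, 0.3], classes)
good = binary_cross_entropy([2.0, -1.0, 0.3], [1, 0, 1])
bad = binary_cross_entropy([-2.0, 1.0, -0.3], [1, 0, 1])
```

Here `tags` contains both `sky` and `tree`, and the loss for scores that agree with the tags (`good`) is lower than for scores that contradict them (`bad`).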