“…• Top-down works first build a large dataset covering a large number of object classes, mainly objects that can be recognised in remote sensing images, e.g., vehicles or soccer stadiums. These studies then analyse the images using deep learning classification or detection models [6], [7], [10], [12], [19], [20], [29], [33], [28]. • Bottom-up works focus on solving a specific problem that involves one or a few object classes, e.g., airports [3], [4], [21], [32], [35], trees [2], [13], [15], [27], clouds [17], and whales [16].…”