Most state-of-the-art deep learning methods for building footprint extraction focus on designing convolutional neural network (CNN) architectures or loss functions that can effectively predict building masks from remote sensing (RS) images. Training such CNN models requires large-scale, pixel-level building annotations. A common way to obtain scalable benchmark datasets for building segmentation is to register RS images with auxiliary geospatial data, such as those available from OpenStreetMap (OSM). However, due to land-cover changes, urban construction, and delays in updating geospatial information, some building annotations may be missing from the corresponding ground-truth building mask layers. These missing annotations are likely to confuse CNN models trained to discriminate between background and building pixels. To address this important issue, we first formulate the problem as a long-tailed classification one. We then introduce a new joint loss function based on three terms: 1) a logit adjusted cross entropy (LACE) loss, aimed at discriminating

This work was supported in part by the Ministry of Science, Innovation and Universities of Spain (RTI2018-098651-B-C54), the Valencian Government of Spain (GV/2020/167), FEDER-Junta de Extremadura (Ref. GR18060), and the European Union under the H2020 EOXPOSURE project (No. 734541).
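As background for the first term, logit adjusted cross entropy for long-tailed classification typically adds a prior-dependent offset, tau * log(prior_c), to each class logit before the softmax, so that the rare class (here, building pixels) is not dominated by the frequent background class. The sketch below is a minimal NumPy illustration of this generic mechanism, not the exact loss proposed in the paper; the function name, the toy logits, and the class priors are assumptions for demonstration.

```python
import numpy as np

def lace_loss(logits, labels, class_priors, tau=1.0):
    """Generic logit-adjusted cross entropy (illustrative sketch).

    logits: (N, C) raw scores; labels: (N,) integer class ids;
    class_priors: (C,) estimated class frequencies; tau: adjustment strength.
    """
    # Shift each class logit by tau * log(prior) to compensate for imbalance.
    adjusted = logits + tau * np.log(class_priors)
    # Numerically stable log-softmax.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the true classes.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy example: class 0 = background (frequent), class 1 = building (rare).
logits = np.array([[2.0, 1.0], [0.5, 1.5]])
labels = np.array([0, 1])
priors = np.array([0.9, 0.1])  # assumed pixel-class frequencies
print(lace_loss(logits, labels, priors, tau=1.0))
```

With tau = 0 this reduces to the standard cross entropy; larger tau penalizes errors on the rare (building) class more strongly.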