“…Two typical strategies in image processing are widely adopted, e.g., object detection (Ren et al, 2015) and semantic segmentation (Long et al, 2015). For the former, the rectangular regions of the windows could be detected using a typical detector (Ren et al, 2015) and used for façade reconstruction (Hu et al, 2020;Hensel et al, 2019;Kong and Fan, 2020); however, there is still ample space to improve the localization accuracy, e.g., the intersection of union (IoU). For the latter, pixel-wise segmentation results can be learned end-to-end (Mathias et al, 2016;Gadde et al, 2018); fusing with point clouds (Gadde et al, 2018) and multi-view voting (Ma et al, 2020) can also be adopted to improve the segmentation results.…”