“…where L_CE refers to the cross-entropy loss; L_Dice denotes the Dice loss [75]; L_Lovász corresponds to the Lovász loss [76]; and L_BF1 represents the boundary loss [77]. The first three loss functions are all based on intersection over union (IoU) and are designed to address the significant class imbalance between background and building samples in semantic segmentation tasks.…”
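The snippet describes a compound loss built from these terms. A minimal NumPy sketch of how cross-entropy and Dice terms are typically combined (the weights `w_ce`/`w_dice` and the exact combination are assumptions for illustration, not the quoted paper's formula):

```python
import numpy as np

def ce_loss(p, y, eps=1e-7):
    # Pixel-wise binary cross-entropy between soft predictions p and labels y.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def dice_loss(p, y, eps=1e-7):
    # Soft Dice loss: 1 - 2|P∩Y| / (|P| + |Y|); less sensitive than CE
    # to the background/building class imbalance.
    inter = np.sum(p * y)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps)

def combined_loss(p, y, w_ce=1.0, w_dice=1.0):
    # Hypothetical weighted sum; real papers tune these coefficients.
    return w_ce * ce_loss(p, y) + w_dice * dice_loss(p, y)
```

A perfect prediction drives both terms toward zero, while the Dice term penalizes an all-background prediction heavily even when buildings occupy few pixels.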
Accurately extracting buildings from high-resolution remote sensing images is crucial for human productivity and livelihood in urban areas. Due to the varying scales and indistinct boundaries of buildings, it is essential to fully leverage both high- and low-frequency features when extracting buildings from remote sensing images. However, previous studies have relied solely on either low- or high-frequency features, leading to errors such as omissions or internal holes in the detected buildings at various scales. Although some studies have considered the integration of high- and low-frequency features, they overlook the suitability of different network depths for extracting different frequency features. A novel network called Cascaded Inception Conv-Former Network (CICF-Net) is proposed in this study to solve these problems. It leverages the parallel combination of a convolutional neural network and a Transformer to efficiently extract high- and low-frequency features for building extraction. In the encoder, as the network depth grows, we gradually reduce the contribution of the high-frequency branch and enhance the focus on the low-frequency branch. Moreover, a cascaded fusion strategy is employed to extract and integrate multi-scale high- and low-frequency features. Meanwhile, we propose a gated convolution UperNet as the decoder, which utilizes recursive gated convolution to facilitate multi-level spatial interactions and better restoration of fine-grained spatial details for building segmentation. The proposed CICF-Net achieves competitive accuracies on three public benchmarks: the Massachusetts Building Dataset, the WHU Aerial Building Dataset, and the Inria Aerial Image Labeling Dataset, with IoU scores of 75.17%, 91.45%, and 81.28%, respectively. This provides strong evidence of its effectiveness in building extraction, as it can accurately capture the spatial details and context of buildings.
“…The results indicate that the proposed GCC term can solve the non-convergence issue of the DC loss and that model performance for different categories is globally enhanced for the CE loss and Combo loss (as shown in Table 5). In addition, two boundary-based loss functions, the Hausdorff distance (HD) loss [62] and the boundary (BD) loss [63], are used to compare boundary recognition performance with the proposed method. Table 3: Detailed comparisons of the evaluation metrics with different coefficients of the GCC components (the three geometrical constraints were considered separately).…”
Section: Ablation Study (mentioning)
confidence: 99%
“…The choice of loss function is extremely significant for the model optimization of deep learning. Recent advances in medical image segmentation have been summarized [56,57], and loss functions for semantic segmentation models can be divided into four categories [58]: distribution-based (e.g., cross-entropy (CE) loss [59]), region-based (e.g., Dice coefficient (DC) loss [60]), compounded (e.g., Combo loss [61]), and boundary-based (e.g., Hausdorff distance loss [62] and boundary loss [63]). However, the commonly used CE, DC, and Combo losses evaluate only pixel-wise similarity without considering the relative positional relationship between region pixels and the object boundary, resulting in ineffective recognition of pixels near the boundary.…”
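The distinction the snippet draws between region-based and boundary-based losses can be illustrated with a toy 1D example: a boundary-aware loss weights each pixel by its signed distance to the true boundary, so mistakes far from the edge cost more than near misses. A simplified sketch in the spirit of boundary-based losses (the 1D signed-distance helper is a stand-in for the 2D distance transforms used in practice):

```python
import numpy as np

def signed_distance_1d(y):
    # Signed distance to the nearest label transition (1D toy case):
    # negative inside the object, positive outside.
    n = len(y)
    edges = [i for i in range(n - 1) if y[i] != y[i + 1]]
    d = np.empty(n)
    for i in range(n):
        dist = min(abs(i - e) for e in edges) if edges else n
        d[i] = -dist if y[i] == 1 else dist
    return d

def boundary_loss(p, y):
    # Boundary-aware loss sketch: mass predicted far outside the true
    # boundary is penalized more heavily; mass deep inside is rewarded.
    return float(np.mean(p * signed_distance_1d(y)))
```

A correct prediction yields a lower (more negative) value than an inverted one, because predicted mass sits on negatively signed interior pixels.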
Section: Introduction (mentioning)
confidence: 99%
“…Karimi et al. [62] proposed a Hausdorff distance-based loss function and reduced segmentation errors for medical image segmentation. Bokhovkin et al. [63] adapted the boundary F1 score metric [66] as a boundary loss for remote sensing imagery segmentation. Inspired by the conventional active contour model [67], Chen et al. [68] integrated the contour line length and curvature of the predicted target boundary as geometrical constraints for image segmentation tasks.…”
Deep‐learning‐based automatic recognition of post‐earthquake damage to urban buildings is increasingly in demand for rapid and precise assessment of seismic hazards from optical remote sensing images. In this study, a novel loss function fusing a geometric consistency constraint (GCC) with cross‐entropy (CE) loss is designed for post‐earthquake building segmentation with complex geometric features across multiple scales. Specifically, the GCC loss incorporates three critical components, namely split line length, curvature, and area, and enables the exact extraction of the geometric constraints of boundary and region for damaged buildings. Through the optimization of multiple key coefficients of the GCC loss, the proposed method achieves significant performance improvements in semantic segmentation, which is attributed to its enhanced ability to identify and capture pixel relationships near the boundary. Merging GCC into the loss function enables faster and more accurate convergence of predicted values towards the ground truth during training, surpassing the performance of the CE loss alone. The results show that the combination of GCC and CE losses achieves the highest validation mIoU of 86.98% for damaged‐building segmentation, which facilitates post‐earthquake assessment with high accuracy. Moreover, incorporating GCC leads to more precise and robust seismic damage segmentation by effectively improving edge closure, removing internal noise, and reducing false‐positive and false‐negative misrecognition. In addition, the GCC term further validates the effectiveness of improving segmentation tasks for other networks (e.g., DeepLabv3+). The GCC‐derived method exhibits desirable performance in segmentation accuracy, portability, and universality for building recognition with complex geometric features and post‐earthquake scenes.
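The abstract describes combining CE loss with geometric terms for boundary length, curvature, and area. A minimal NumPy sketch of how length and area constraints can be attached to a CE loss (the coefficient values are hypothetical, and the curvature term is omitted for brevity; the paper's own GCC formulation is not reproduced here):

```python
import numpy as np

def length_term(p):
    # Contour-length surrogate: total variation of the soft mask
    # (sum of gradient magnitudes), as used in active-contour losses.
    dy = np.abs(np.diff(p, axis=0)).sum()
    dx = np.abs(np.diff(p, axis=1)).sum()
    return float(dx + dy)

def area_term(p, y):
    # Penalize mismatch between predicted and true region areas.
    return float(abs(p.sum() - y.sum()))

def gcc_like_loss(p, y, lam_len=0.1, lam_area=0.01):
    # CE plus geometric penalties; lam_len and lam_area are assumed
    # weights standing in for the paper's tuned GCC coefficients.
    ce = -np.mean(y * np.log(np.clip(p, 1e-7, 1.0)) +
                  (1 - y) * np.log(np.clip(1 - p, 1e-7, 1.0)))
    return float(ce + lam_len * length_term(p) + lam_area * area_term(p, y))
```

Because the length term penalizes ragged contours and the area term anchors the total predicted extent, such regularizers tend to close edges and suppress internal holes, which matches the qualitative improvements reported above.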