2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00975

DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Abstract: This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed…

Citations: cited by 486 publications (250 citation statements)
References: 40 publications
“…In the inference phase, we use the training dataset and validation dataset to train our model with 960 × 720 resolution input. Our models are compared to some non-real-time algorithms, including SegNet (Badrinarayanan et al, 2017), Deeplab (Chen et al, 2015), RTA (Huang et al, 2018), Dilate8 (Yu and Koltun, 2016), PSPNet (Zhao et al, 2017), VideoGCRF (Chandra et al, 2018), and DenseDecoder (Bilinski and Prisacariu, 2018), and real-time algorithms, containing ENet (Paszke et al, 2016), IC-Net (Zhao et al, 2018a), DABNet (Li et al, 2019a), DFANet (Li et al, 2019b), SwiftNet (Orsic et al, 2019), BiSeNetV1 (Yu et al, 2018a). BiSeNetV2 achieves much faster inference speed than other methods.…”
Section: Performance Evaluation
Citation type: mentioning
confidence: 99%
“…In addition to the aforementioned networks, many practical deep learning techniques (e.g., Spatial Pyramid Pooling [59], CRF-RNN [60], Batch Normalization [61], Dropout [62]) were proposed for improving the effectiveness of learning models. Notably, multi-scale feature aggregation was frequently used in semantic segmentation [63,64,65,66,67]. These learning models experimentally achieve significant performance improvement.…”
Section: Semantic Image Segmentation
Citation type: mentioning
confidence: 99%
“…Finally, a feature fusion module was designed to merge the two features effectively. A three-level feature extraction network was designed by DFANet [18], which fully promoted the interaction and aggregation of feature information at different levels while making sure that the computational burden was small. On the other hand, a novel multi-feature fusion module was used by MSFNet [19] to strengthen the information flow among layers.…”
Section: A. Real-time Semantic Segmentation
Citation type: mentioning
confidence: 99%
“…The batch size is 8, and the Adam optimizer is used to train the model with the weight decay of 1 × 10⁻⁴. Based on some previous work [15][16][17][18], we also adopt the "poly" learning rate strategy, as shown below:…”
Section: B. The Experimental Details
Citation type: mentioning
confidence: 99%
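
The quoted statement breaks off before its formula, so the exact expression used by the citing paper is not recoverable here. As a minimal sketch, the form of the "poly" schedule commonly used in semantic segmentation work is lr = base_lr × (1 − iter / max_iter)^power. The function name `poly_lr`, its parameters, and the default `power = 0.9` are assumptions for illustration, not values taken from the citing paper.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial ("poly") decay: base_lr * (1 - cur_iter / max_iter) ** power.

    All names and the default power are illustrative assumptions.
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    # Clamp the iteration so the rate stays in [0, base_lr] even if
    # cur_iter falls outside the range [0, max_iter].
    progress = min(max(cur_iter, 0), max_iter) / max_iter
    return base_lr * (1.0 - progress) ** power


if __name__ == "__main__":
    # Print the learning rate at a few points of a hypothetical 1000-iteration run.
    for it in (0, 250, 500, 750, 1000):
        print(it, poly_lr(base_lr=1e-3, cur_iter=it, max_iter=1000))
```

The rate starts at `base_lr`, decreases monotonically, and reaches zero at `max_iter`. With `power = 1.0` the schedule reduces to linear decay.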