Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475372
ASFD: Automatic and Scalable Face Detector

Abstract: Along with current multi-scale based detectors, Feature Aggregation and Enhancement (FAE) modules have shown superior performance gains for cutting-edge object detection. However, these hand-crafted FAE modules show inconsistent improvements on face detection, which is mainly due to the significant distribution difference between their training and application corpora, i.e. COCO vs. WIDER Face. To tackle this problem, we essentially analyse the effect of data distribution, and consequently propose to search an effec…

Cited by 5 publications (4 citation statements) · References 34 publications
“…Video-level Distribution Consistency Strategy. Contrary to the general TAL [49], [50], [51], [52], expression spotting heavily relies on the sample distribution of different classes [10], [11]. We observe that the distribution of MEs in long, untrimmed face videos is sparser than that of MaEs on the CAS(ME)² [10] and SAMM-LV [11] datasets, because MEs are more challenging to evoke than MaEs [13].…”
Section: Multi-level Consistency Analysis
confidence: 80%
“…Utilizing weak labels to train models has come a long way in computer vision such as semantic segmentation [44], [45], [46], object detection [47], [48], and temporal action localization (TAL) [21], [22], [23]. In contrast to the fully-supervised TAL [49], [50], [51], [52], the WTAL methods are free of extensive frame-level annotations and adopt video- [23], [53], [54], [55], [56] or point (key frame)-level [57], [58], [59], [60] labels during training. Since different video-level WTAL approaches have different emphases, we can categorize them as foreground-only, background-assisted or pseudo-label-guided.…”
Section: Weakly-supervised Temporal Action Localization
confidence: 99%
“…We conducted comparative experiments on RetinaFace [33], LFFD [13], CenterFace [20], ASFD [34], and the proposed GA-Face network on the FDDB dataset and WiderFace validation set. In Table 2, GA-Face (ghost only) represents removing the SimAM module from the attention enhancement structure (Figure 2c) of GA-Face.…”
Section: Methods
confidence: 99%
“…GA-Face is trained for 120 epochs on the WiderFace training set; the network weights are optimized with the Adam optimizer at an initial learning rate of 0.0005, which is reduced by a factor of 10 when training reaches the 70th and 100th epochs. We conducted comparative experiments on RetinaFace [33], LFFD [13], CenterFace [20], ASFD [34], and the proposed GA-Face network on the FDDB dataset and WiderFace validation set. In Table 2, GA-Face (ghost only) represents removing the SimAM module from the attention enhancement structure (Figure 2c) of GA-Face.…”
Section: Experiments 4.1 Effectiveness Evaluation Of Face Detection Ne...
confidence: 99%
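The training schedule quoted above (120 epochs, base learning rate 0.0005, decayed by 10× at epochs 70 and 100) can be sketched as a step-decay function. This is a minimal illustration of the schedule described in the excerpt, not the authors' published code; in a real PyTorch run one would typically use `torch.optim.lr_scheduler.MultiStepLR` with the same milestones.

```python
def multistep_lr(base_lr, epoch, milestones=(70, 100), gamma=0.1):
    """Learning rate after `epoch` completed epochs: the base rate
    decayed by `gamma` once for each milestone already passed."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)

# Schedule from the excerpt: 0.0005 until epoch 70, then 5e-5, then 5e-6.
for epoch in (0, 69, 70, 100, 119):
    print(epoch, multistep_lr(0.0005, epoch))
```

With `MultiStepLR(optimizer, milestones=[70, 100], gamma=0.1)` the optimizer's `param_groups` would follow the same three-plateau curve over the 120-epoch run.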