ASFD: Automatic and Scalable Face Detector

Li, Jian; Zhang, Bin; Wang, Yabiao; Tai, Ying; Zhang, Zhenyu; Wang, Chengjie; Li, Jilin; Huang, Xiaoming; Xia, Yili

doi:10.1145/3474085.3475372

Cited by 5 publications

(4 citation statements)

References 34 publications


“…Video-level Distribution Consistency Strategy. Contrary to the general TAL [49], [50], [51], [52], expression spotting heavily relies on the sample distribution of different classes [10], [11]. We observe that the distribution of MEs in long, untrimmed face videos is sparser than that of MaEs on the CAS(ME)² [10] and SAMM-LV [11] datasets, because MEs are more challenging to evoke than MaEs [13].…”

Section: Multi-level Consistency Analysis (mentioning)

confidence: 80%

“…Utilizing weak labels to train models has come a long way in computer vision such as semantic segmentation [44], [45], [46], object detection [47], [48], and temporal action localization (TAL) [21], [22], [23]. In contrast to the fully-supervised TAL [49], [50], [51], [52], the WTAL methods are free of extensive frame-level annotations and adopt video- [23], [53], [54], [55], [56] or point (key frame)-level [57], [58], [59], [60] labels during training. Since different video-level WTAL approaches have different emphases, we can categorize them as foreground-only, background-assisted or pseudo-label-guided.…”

Section: Weakly-supervised Temporal Action Localization (mentioning)

confidence: 99%


LGSNet: A Two-Stream Network for Micro- and Macro-Expression Spotting With Background Modeling

Jiang, Yang et al. 2024. IEEE Trans. Affective Comput.

Most micro- and macro-expression spotting methods in untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly-supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. However, we argue that existing weakly-supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps. The inter-sample gap is primarily from the sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that include modal-level saliency, video-level distribution, label-level duration and segment-level feature consistency strategies to implement fine frame-level spotting with only video-level labels to alleviate the above gaps and merge prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy utilizes the difference of sparsity in temporal distribution. The label-level duration consistency strategy exploits the difference in the duration of facial muscles. The segment-level feature consistency strategy emphasizes that features under the same labels maintain similarity. Experimental results on two challenging datasets, CAS(ME)² and SAMM-LV, demonstrate that MC-WES is comparable to state-of-the-art fully-supervised methods.


“…We conducted comparative experiments on RetinaFace [33], LFFD [13], CenterFace [20], ASFD [34], and the proposed GA-Face network on the FDDB dataset and WiderFace validation set. In Table 2, GA-Face (ghost only) represents removing the SimAM module from the attention enhancement structure (Figure 2c) of GA-Face.…”

Section: Methods (mentioning)

confidence: 99%

“…GA-Face is trained for 120 epochs on the WiderFace training set, the network weights are optimized using the Adam optimizer, a learning rate of 0.0005 is achieved, and the learning rate is reduced by 10 times when training reaches the 70th and 100th epochs. We conducted comparative experiments on RetinaFace [33], LFFD [13], CenterFace [20], ASFD [34], and the proposed GA-Face network on the FDDB dataset and WiderFace validation set. In Table 2, GA-Face (ghost only) represents removing the SimAM module from the attention enhancement structure (Figure 2c) of GA-Face.…”

Section: Experiments 41 Effectiveness Evaluation Of Face Detection Ne... (mentioning)

confidence: 99%
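
The training schedule quoted in the citation statement above (base learning rate 0.0005, reduced by a factor of 10 at the 70th and 100th of 120 epochs) can be sketched as a plain step schedule. This is only an illustrative sketch, not the GA-Face authors' code: the function name `step_lr` is hypothetical, and treating epochs as 0-indexed (so the first drop applies from epoch index 70 onward) is an assumption, since the quoted text does not say how epochs are counted.

```python
def step_lr(epoch, base_lr=0.0005, milestones=(70, 100), factor=0.1):
    """Learning rate for a given epoch under a step decay schedule.

    Assumption: `epoch` is 0-indexed and a drop takes effect as soon as
    the epoch index reaches a milestone. The quoted statement does not
    specify the indexing convention.
    """
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    # Each reached milestone multiplies the rate by `factor` (here 1/10).
    return base_lr * factor ** drops


if __name__ == "__main__":
    # Print the rate just before and just after each milestone.
    for epoch in (0, 69, 70, 99, 100, 119):
        print(epoch, step_lr(epoch))
```

Under these assumptions the rate stays at 0.0005 for the first 70 epochs, falls to roughly 0.00005 for the next 30, and to roughly 0.000005 for the final 20.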

Online Learning State Evaluation Method Based on Face Detection and Head Pose Estimation

Li, Liu 2024. Sensors

In this paper, we propose a learning state evaluation method based on face detection and head pose estimation. This method is suitable for mobile devices with weak computing power, so it is necessary to control the parameter quantity of the face detection and head pose estimation network. Firstly, we propose a ghost and attention module (GA) based face detection network (GA-Face). GA-Face reduces the number of parameters and computation in the feature extraction network through the ghost module, and focuses the network on important features through a parameter-free attention mechanism. We also propose a lightweight dual-branch (DB) head pose estimation network: DB-Net. Finally, we propose a student learning state evaluation algorithm. This algorithm can evaluate the learning status of students based on the distance between their faces and the screen, as well as their head posture. We validate the effectiveness of the proposed GA-Face and DB-Net on several standard face detection datasets and standard head pose estimation datasets. Finally, we validate, through practical cases, that the proposed online learning state assessment method can effectively assess the level of student attention and concentration, and, due to its low computational complexity, will not interfere with the student’s learning process.


EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection

Wang, Li, Xie et al. 2023. Lecture Notes in Computer Science
