Proceedings of the 2020 International Conference on Multimodal Interaction
DOI: 10.1145/3382507.3417965

Multi-rate Attention Based GRU Model for Engagement Prediction

Cited by 27 publications (29 citation statements) · References 5 publications
“…Next, following [86], we utilized OpenFace [6] to extract facial features, including Facial Action Unit (AU) features [29], eye-gaze features, and head-pose features (more details about the features can be found in [86]). Then, considering robustness and computational efficiency, we trained a Random Forest Regressor with 200 estimators/trees and achieved a 0.05 MSE on the validation set (comparable with the SOTA models [92,106]).…”
Section: Student End: Learning Status Detection
confidence: 99%
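As a concrete illustration of the pipeline this statement describes, the sketch below trains a 200-tree Random Forest Regressor and reports validation MSE. It assumes the OpenFace AU/gaze/pose features have already been extracted and aggregated into fixed-length vectors per video; the arrays and dimensions here are random placeholders, not data from the cited work.

```python
# Minimal sketch, assuming pre-extracted per-video OpenFace features
# (AUs, eye gaze, head pose) aggregated into fixed-length vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 49))    # hypothetical feature matrix
y_train = rng.uniform(0, 1, size=600)   # engagement labels in [0, 1]
X_val = rng.normal(size=(150, 49))
y_val = rng.uniform(0, 1, size=150)

# 200 estimators/trees, as in the cited setup
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("validation MSE:", mean_squared_error(y_val, model.predict(X_val)))
```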
“…In feature-based engagement detection approaches, multi-modal handcrafted features are first extracted from videos/images and then fed to a classifier or regressor to detect the level of engagement [10], [6], [20], [21]. Wu et al. [20] proposed a feature-based approach for detecting students' engagement levels in the EmotiW dataset [6].…”
Section: Literature Review
confidence: 99%
“…They extracted facial and upper-body features from videos and classified them using a combination of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to detect the level of engagement. Zhu et al. [21] proposed an attention-based GRU model that classifies hand-crafted face and body features from videos to detect the level of engagement in the EmotiW dataset [6]. Whitehill et al. [5] proposed different combinations of feature extraction (box filters and Gabor features) and classification (SVM and GentleBoost) to detect students' engagement levels from single images in their dataset.…”
Section: Literature Review
confidence: 99%
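The attention-based GRU design referenced above can be sketched as follows. This is a minimal PyTorch illustration, not the architecture of [21]: the feature dimension, hidden size, four engagement levels, and the simple softmax attention over time steps are all assumptions.

```python
# Sketch of an attention-based GRU over per-frame feature vectors.
import torch
import torch.nn as nn

class AttentionGRU(nn.Module):
    def __init__(self, feat_dim=49, hidden=128, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)        # scores each time step
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, feat_dim)
        h, _ = self.gru(x)                      # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        ctx = (w * h).sum(dim=1)                # weighted temporal pooling
        return self.head(ctx)                   # logits over engagement levels

logits = AttentionGRU()(torch.randn(8, 150, 49))
print(logits.shape)  # torch.Size([8, 4])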
“…This makes the problem non-trivial and subjective, because annotators can perceive different engagement levels in the same input video. The reliability of the dataset labels is a major concern in this setting, but it is often ignored by current methods [29,30,32]. As a result, deep learning models overfit to the uncertain samples and perform poorly on the validation and test sets.…”
Section: Introduction
confidence: 99%
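To make the label-reliability concern concrete, a minimal sketch: given a (hypothetical) matrix of per-video labels from several annotators, the per-sample standard deviation flags the high-disagreement samples that models tend to overfit. The data and threshold are illustrative assumptions, not from the cited work.

```python
# Flagging uncertain samples via annotator disagreement (hypothetical data).
import numpy as np

rng = np.random.default_rng(1)
# rows = videos, columns = annotators, values = engagement level 0..3
annotations = rng.integers(0, 4, size=(10, 5))

label = annotations.mean(axis=1)       # consensus label per video
uncertainty = annotations.std(axis=1)  # annotator disagreement
uncertain = np.where(uncertainty > 1.0)[0]
print("high-disagreement videos:", uncertain)
```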
“…In our experimental work, we first analyze the importance of feature sets to select the best set of features for the resulting trained ED-MTT system. Then, we compare the performance of ED-MTT with 9 works [1,5,15,20,24,25,27,31,32] from the state of the art, which are reviewed in the next section. Our results show that ED-MTT outperforms these state-of-the-art methods with at least a 6% improvement in MSE.…”
Section: Introduction
confidence: 99%