“…In this paper, we presented a new end-to-end spatiotemporal hybrid architecture, ResNet+TCN, for determining the level of engagement among students in an online classroom setting.…”

[Figure legend: …feature extraction [7], (d) C3D averaging + LSTM [30], (e) I3D [16], (f) ResNet + TCN with sampling and weighted loss (proposed), (g) C3D + LSTM [30], (h) LRCN [23], (i) C3D fine tuning [22], (j) DFSTN [24], (k) C3D + TCN (proposed), (l) DERN [13], (m) ResNet + LSTM (proposed), (n) ResNet + TCN (proposed).]