“…Through a reformulation of the 3D convolution to process inputs step-by-step rather than spatio-temporally, well-performing 3D CNNs such as X3D (Feichtenhofer, 2020), Slow (Feichtenhofer et al., 2019), and I3D (Carreira & Zisserman, 2017), trained for Trimmed Activity Recognition, were re-implemented to execute step-by-step without any re-training. Likewise, Spatio-temporal Graph Convolutional Networks for Skeleton-based Action Recognition (Yan et al., 2018; Shi et al., 2019; Plizzari et al., 2021), which originally operated only on batches, were recently transformed to perform step-wise inference as well through a continual formulation of their Spatio-temporal Graph Convolution blocks (Hedegaard et al., 2022b).…”
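
The idea of reusing trained weights for step-by-step execution can be illustrated with a minimal sketch. It is a simplified one-dimensional analogue and not the cited authors' implementation: frames are scalars, the convolution is temporal only (computed as a cross-correlation, as is usual in CNNs), and the step-wise variant simply buffers the most recent frames. The names `clip_conv`, `StepwiseConv`, and `step` are hypothetical and chosen for illustration.

```python
from collections import deque


def clip_conv(frames, weights):
    """Temporal convolution over a whole clip at once (no padding)."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, frames[t:t + k]))
        for t in range(len(frames) - k + 1)
    ]


class StepwiseConv:
    """Same weights as the clip-wise version, but fed one frame per call."""

    def __init__(self, weights):
        self.weights = list(weights)
        # Assumed state for this sketch: the last k frames seen so far.
        self.buffer = deque(maxlen=len(self.weights))

    def step(self, frame):
        self.buffer.append(frame)
        if len(self.buffer) < len(self.weights):
            return None  # not enough history yet to produce an output
        return sum(w * x for w, x in zip(self.weights, self.buffer))


weights = [0.5, -1.0, 2.0]            # stands in for already-trained weights
frames = [1.0, 2.0, 3.0, 4.0, 5.0]    # a toy "clip" of five scalar frames

clip_out = clip_conv(frames, weights)

conv = StepwiseConv(weights)
step_out = [y for y in (conv.step(x) for x in frames) if y is not None]

# Both modes use the same weights and yield the same outputs.
assert clip_out == step_out
```

Because both paths apply identical weights to identical windows of frames, no re-training is involved in this sketch; the only change is when each output becomes available.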