“…Computer vision approaches predict labour activities from captured video by extracting human body silhouettes (Bai et al, 2012), image contours or shapes, spatiotemporal features (Luo et al, 2018), dense trajectories (Yang et al, 2015, 2016) and body key-joint features (Han et al, 2013; Liu et al, 2017). Because these features capture the fine motions of labourers, labour activities in these studies are commonly categorised at level 3 (Ishioka et al, 2020; Yang et al, 2015, 2016). As summarised in Table 3, computer vision-based activity sampling has been conducted onsite for a range of construction activities, such as formwork (Luo et al, 2018), rebar installation (Bai et al, 2012), concreting, carpentry (Liu and Golparvar-Fard, 2015b), scaffolding (Ying et al, 2021), bricklaying (Roberts et al, 2020) and several other common construction activities.…”