“…The latter is considerably less frequent due to its tedious manual annotation process (McCowan et al, 2005; Douglas-Cowie et al, 2007; McKeown et al, 2010; Lücking et al, 2012; Vella and Paggio, 2013; Vandeventer et al, 2015; Naim et al, 2015; Chou et al, 2017; Paggio and Navarretta, 2017; Cafaro et al, 2017; Joo et al, 2019b; Kossaifi et al, 2019; Chen et al, 2020; Khan et al, 2020). The most frequent low-level annotations that the datasets provide are the participants' body poses and facial expressions (Douglas-Cowie et al, 2007; Rehg et al, 2013; Bilakhia et al, 2015; Vandeventer et al, 2015; Naim et al, 2015; Edwards et al, 2016; Cafaro et al, 2017; Feng et al, 2017; Georgakis et al, 2017; Paggio and Navarretta, 2017; Bozkurt et al, 2017; Andriluka et al, 2018; von Marcard et al, 2018; Mehta et al, 2018; Lemaignan et al, 2018; Joo et al, 2019b; Kossaifi et al, 2019; Schiphorst et al, 2020; Doyran et al, 2021). Because these annotations are complex to produce by hand, they are usually extracted automatically with tools such as OpenPose (Cao et al, 2019) and then manually corrected or discarded.…”