“…As in MultiMediate '21 [37], our challenge is based on the MPIIGroupInteraction dataset [38,39]. This dataset has served as the basis for diverse tasks, including emergent leadership detection [35], eye contact detection [18,30,38], next speaker prediction [9], backchannel analysis [1,52], and body language detection [3]. The MPIIGroupInteraction corpus consists of 22 group discussions, each involving three to four participants and lasting 20 minutes [39].…”