“…Identifying Numerous studies have been undertaken to address the issue of speaker role identification. These efforts involve utilising text-based features (Barzilay et al, 2000;Liu, 2006;Wang et al, 2011;Sapru and Valente, 2012;Flemotomos et al, 2019), or employing multimodal approaches that integrate both text and audio features (Rouvier et al, 2015;Bellagha and Zrigui, 2020;Guo et al, 2023). In both cases, the goal is to classify each speaker in a conversation into a predefined role category.…”