“…A turn (or utterance) in a conversation is each single contribution from a speaker (Schegloff, 1968; Jurafsky and Martin, 2020). The data may come from written conversations, such as MultiWOZ (Eric et al, 2020); transcripts of human-human spoken conversations, such as the Gothenburg Dialogue Corpus (GDC) (Allwood et al, 2003); crowdsourced conversations, such as EmpatheticDialogues (Rashkin et al, 2019); or social media conversations, such as Familjeliv or Reddit (Adewumi et al, 2022c; Adewumi et al, 2022a). Because the amount of data needed to train deep ML models is usually large, as already acknowledged, the models are normally first pretrained on large, unstructured text or conversations before being finetuned on specific conversational data.…”