“…Talking head generation works can be broadly classified into three categories based on the type of input they use to generate a talking head: Text-driven [16,33,36], Audio-driven [9,13,18,31,37,43,45], and Video-driven [12,27,29,39,44] Talking Head Generation.…”