“…With the continuous emergence of new media and short videos [1, 2] in recent years, content such as popular music and videos published on YouTube [5, 6], TikTok [7, 8], and other platforms has a growing influence on children's emotions [3, 4] in daily life. Such data often contain three modalities: video [9, 10], audio [11, 12], and text [13]. Different music genres [14, 15] typically use these three modalities to convey their own emotions and values, and children, who absorb such content largely unconsciously, are easily affected by these data.…”