“…In these instances, researchers refer to modalities as multimodal information (16–19) or simply "data types." These include audio, video, images, and lyrics (20–24), as well as extracted features, body motion, physiological measurements (eye gaze, EEG, EDR, ECG, EMG, respiration), content, context (22), symbolic scores (25), MIDI (26), and even depth, thermal, and IMU data (27,28). Some works also include "additional multimodal information" (16), such as album covers, video clip links, and expert notes.…”