“…We recently analyzed (Chen, Leong, Feng, Lee, & Somasundaran, 2015; Chen et al., 2014; Ramanarayanan, Chen, Leong, Feng, & Suendermann-Oeft, 2015) how fusing features obtained from different multimodal data streams, such as speech, face, body movement, and emotion tracks, can be applied to the scoring of multimodal presentations. We analyzed multimodal data collected by Chen et al. (2015), consisting of synchronized and preprocessed recordings from 56 sessions in which speakers gave presentations on different topics.…”
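The feature-fusion step described above can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' actual pipeline: it assumes per-modality feature matrices (speech, face, body movement, emotion) have already been extracted and aligned per session, concatenates them into one feature vector per session (feature-level fusion), and fits a regressor against human presentation scores. The feature dimensions, the synthetic data, the score range, and the choice of a random-forest regressor are all illustrative assumptions; only the 56-session count comes from the passage.

```python
# Minimal sketch of feature-level fusion for presentation scoring.
# Hypothetical setup: the random matrices below are stand-ins for
# features extracted from speech, face, body-movement, and emotion tracks.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_sessions = 56  # number of recorded presentation sessions

# Stand-in per-session feature matrices, one row per session.
# Real features would come from modality-specific extractors.
speech = rng.normal(size=(n_sessions, 20))   # e.g., prosodic/fluency features
face = rng.normal(size=(n_sessions, 15))     # e.g., facial-expression features
body = rng.normal(size=(n_sessions, 10))     # e.g., body-movement features
emotion = rng.normal(size=(n_sessions, 8))   # e.g., emotion-track features

# Hypothetical human holistic scores for each presentation.
scores = rng.uniform(1.0, 5.0, size=n_sessions)

# Feature-level fusion: concatenate the modality features per session.
fused = np.hstack([speech, face, body, emotion])

# Evaluate the fused representation with cross-validated regression.
model = RandomForestRegressor(n_estimators=200, random_state=0)
r2 = cross_val_score(model, fused, scores, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.3f} +/- {r2.std():.3f}")
```

A decision-level alternative would train one model per modality and combine their predictions; the concatenation above is simply the most common early-fusion baseline.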