We propose a cloud-based multimodal dialog platform for the remote assessment and monitoring of Amyotrophic Lateral Sclerosis (ALS) at scale. This paper presents our vision, technology setup, and an initial investigation of the efficacy of the various acoustic and visual speech metrics automatically extracted by the platform. A total of 82 healthy controls and 54 people with ALS (pALS) were instructed to interact with the platform and completed a battery of speaking tasks designed to probe the acoustic, articulatory, phonatory, and respiratory aspects of their speech. We find that multiple acoustic (rate, duration, voicing) and visual (higher-order statistics of the jaw and lip) speech metrics show statistically significant differences among controls, bulbar symptomatic patients, and bulbar pre-symptomatic patients. We report the sensitivity and specificity of these metrics using five-fold cross-validation. We further conducted a LASSO-LARS regression analysis to uncover the relative contributions of various acoustic and visual features in predicting the severity of patients' ALS (as measured by their self-reported ALSFRS-R scores). Our results provide encouraging evidence of the utility of automatically extracted audiovisual analytics for scalable remote patient assessment and monitoring in ALS.
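To make the evaluation step concrete, the following is a minimal sketch of how sensitivity and specificity might be estimated for a single speech metric under five-fold cross-validation. It is not the authors' pipeline: the midpoint-threshold classifier is a stand-in chosen for brevity, the function names are hypothetical, and the values are made-up illustrative numbers rather than data from the study.

```python
from statistics import mean


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def cross_validated_predictions(values, labels, k=5):
    """Predict each held-out sample with a threshold fitted on the other folds.

    Assumption: the threshold is the midpoint between the two class means
    of the training samples (a stand-in for whatever classifier is used).
    """
    n = len(values)
    preds = [0] * n
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        pos_mean = mean(values[i] for i in train if labels[i] == 1)
        neg_mean = mean(values[i] for i in train if labels[i] == 0)
        threshold = (pos_mean + neg_mean) / 2
        patients_lower = pos_mean < neg_mean
        for i in fold:
            # Predict "patient" on the side of the threshold where patients lie.
            preds[i] = int((values[i] < threshold) == patients_lower)
    return preds


# Hypothetical speaking-rate values: 10 controls (label 0), 10 patients (label 1).
values = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 4.6, 5.4, 4.0,
          3.2, 3.5, 3.9, 3.0, 4.6, 3.6, 3.3, 3.8, 3.1, 3.4]
labels = [0] * 10 + [1] * 10

preds = cross_validated_predictions(values, labels, k=5)
sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because each threshold is fitted only on the training folds, the reported figures reflect held-out performance rather than fit to the same samples.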
People spontaneously ascribe intentions on the basis of observed behavior, and research shows that they do this even with simple geometric figures moving in a plane. The latter fact suggests that 2-D animations isolate critical information (object movement) that people use to infer the possible intentions (if any) underlying observed behavior. This article describes an approach to using motion information to model the ascription of intentions to simple figures. Incremental chart parsing is a technique developed in natural-language processing that builds up an understanding as text comes in one word at a time. We modified this technique to develop a system that uses spatiotemporal constraints about simple figures and their observed movements in order to propose candidate intentions or nonagentive causes. Candidates are identified via partial parses using a library of rules, and confidence scores are assigned so that candidates can be ranked. As observations come in, the system revises its candidates and updates the confidence scores. We describe a pilot study demonstrating that people generally perceive a simple animation in a manner consistent with the model.
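The incremental revision of candidates described above can be sketched roughly as follows. This is a heavily simplified illustration, not the system from the article: the rule library, the motion-event names, and the confidence measure (fraction of a rule matched so far) are all hypothetical, and a full chart parser would keep a richer chart than this flat list of partial parses.

```python
# Hypothetical rule library: candidate explanation -> expected event sequence.
RULES = {
    "chase": ["approach", "approach", "contact"],
    "flee": ["approach", "retreat", "retreat"],
    "drift (nonagentive)": ["drift", "drift"],
}


class IncrementalInterpreter:
    """Keeps partial parses and re-ranks candidates after each observation."""

    def __init__(self, rules):
        self.rules = rules
        self.edges = []  # each edge is (label, number of events matched so far)

    def observe(self, event):
        """Consume one motion event and return the updated ranking."""
        updated = []
        # Extend partial parses whose next expected event matches; others
        # (including already completed ones) are dropped.
        for label, matched in self.edges:
            pattern = self.rules[label]
            if matched < len(pattern) and pattern[matched] == event:
                updated.append((label, matched + 1))
        # Start a new partial parse for every rule that begins with this event.
        for label, pattern in self.rules.items():
            if pattern[0] == event:
                updated.append((label, 1))
        self.edges = updated
        return self.ranked()

    def ranked(self):
        """Return (label, confidence) pairs, highest confidence first."""
        best = {}
        for label, matched in self.edges:
            score = matched / len(self.rules[label])
            best[label] = max(best.get(label, 0.0), score)
        return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


interpreter = IncrementalInterpreter(RULES)
events = ["approach", "approach", "contact"]
history = [interpreter.observe(e) for e in events]
for event, ranking in zip(events, history):
    print(event, ranking)
```

In this toy run, "chase" and "flee" are tied after the first event, "chase" pulls ahead after the second, and only "chase" survives once the contact is observed.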