Technological advances, a shortage of medical professionals, the high cost of face-to-face encounters, and disasters such as the COVID-19 pandemic are fueling the telemedicine revolution. Numerous smartphone apps have been developed to measure neurological functions, but their psychometric properties are seldom determined. Without such data, it is unclear which designs underlie the eventual clinical utility of smartphone tests.
We have developed the smartphone Neurological Function Tests Suite (NeuFun-TS) and are systematically evaluating the psychometric properties of its tests against the gold standard of a complete neurological examination digitalized into the NeurEx™ App. This paper examines the fifth, and thus far the most complex, NeuFun-TS test: "Spiral tracing". We generated 40 features in the training cohort (22 healthy donors [HD] and 105 multiple sclerosis [MS] patients) and compared their intraclass correlation coefficients, fold-changes between HD and MS, and correlations with relevant clinical and imaging outcomes. We assembled the best features into machine-learning models and examined their performance in the independent validation cohort (56 MS patients).
We show that, by aggregating multiple neurological functions, complex tests such as spiral tracing are susceptible to intra-individual variations, which decrease their reproducibility and thus their clinical utility. Simple tests that reproducibly measure single function(s), and whose results can be aggregated to increase sensitivity, are preferable in app design.
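Two of the feature-screening criteria named above, the intraclass correlation coefficient and the fold-change between HD and MS, can be illustrated with a minimal Python sketch. The abstract does not state which ICC form or which fold-change definition was used, so the one-way random-effects ICC(1,1) and the ratio of group medians below are assumptions chosen for illustration, and the function names `icc_oneway` and `fold_change` are hypothetical.

```python
import numpy as np


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_repeats) array.

    Assumption: this ICC form is chosen for illustration only; the source
    does not specify which form was applied.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject mean square: spread of subject means around the grand mean.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    # Within-subject mean square: spread of repeats around each subject's mean.
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def fold_change(hd_values, ms_values):
    """Ratio of the MS group median to the HD group median.

    Assumption: a median ratio is one plausible definition, not necessarily
    the one used in the study.
    """
    return float(np.median(ms_values) / np.median(hd_values))
```

Under these assumptions, a feature whose repeated measurements agree within each subject yields an ICC close to 1, while a feature dominated by intra-individual variation yields an ICC near or below 0, which is the reproducibility problem the conclusion attributes to complex tests.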