The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach, in which the accuracy of such algorithms is evaluated on a part of the data that the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and validation sets should mimic the relationship between the training set and the data expected in clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move toward an era of machine learning-based diagnosis and treatment, using proper methods to evaluate accuracy is crucial, as inflated results can mislead both clinicians and data scientists.
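The contrast between the two CV schemes can be sketched on synthetic data. In this illustrative example (the variable names, model, and feature construction are assumptions, not taken from the study), records from the same subject share a subject-specific "fingerprint", so a record-wise split leaks subject identity into the validation folds, while a subject-wise split (here via scikit-learn's `GroupKFold`) holds out whole subjects:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, records_per_subject = 20, 30
subjects = np.repeat(np.arange(n_subjects), records_per_subject)
labels = subjects % 2                       # each subject has one fixed diagnosis
# Features: a weak label signal plus a strong subject-specific fingerprint.
signal = labels[:, None] * 0.3
fingerprint = rng.normal(size=(n_subjects, 5))[subjects]
X = signal + fingerprint + rng.normal(scale=0.5, size=(len(subjects), 5))

clf = RandomForestClassifier(n_estimators=50, random_state=0)
# Record-wise: records of the same subject can appear in train and test folds.
record_wise = cross_val_score(
    clf, X, labels, cv=KFold(5, shuffle=True, random_state=0)).mean()
# Subject-wise: each fold holds out all records of a group of subjects.
subject_wise = cross_val_score(
    clf, X, labels, groups=subjects, cv=GroupKFold(5)).mean()
print(f"record-wise:  {record_wise:.2f}")   # inflated by identity leakage
print(f"subject-wise: {subject_wise:.2f}")  # closer to new-subject accuracy
```

Because the classifier can memorize subject fingerprints, the record-wise score is inflated well above the subject-wise score, which is the one that reflects performance on newly recruited subjects.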
Machine learning algorithms that use data streams captured from soft wearable sensors have the potential to automatically detect Parkinson's disease (PD) symptoms and inform clinicians about the progression of disease. However, these algorithms must be trained with annotated data from clinical experts who can recognize symptoms, and collecting such data is costly. Understanding how many sensors and how much labeled data are required is key to successfully deploying these models outside of the clinic. Here we recorded movement data using 6 flexible wearable sensors in 20 individuals with PD over the course of multiple clinical assessments conducted on 1 day and repeated 2 weeks later. Participants performed 13 common tasks, such as walking or typing, and a clinician rated the severity of symptoms (bradykinesia and tremor). We then trained convolutional neural networks and statistical ensembles to detect whether a segment of movement showed signs of bradykinesia or tremor based on data from tasks performed by other individuals. Our results show that a single wearable sensor on the back of the hand is sufficient for detecting bradykinesia and tremor in the upper extremities, whereas using sensors on both sides does not improve performance. Increasing the amount of training data by adding other individuals can lead to improved performance, but repeating assessments with the same individuals—even at different medication states—does not substantially improve detection across days. Our results suggest that PD symptoms can be detected during a variety of activities and are best modeled by a dataset incorporating many individuals.
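The "segment of movement" used as the classifier's input is typically produced by slicing the continuous sensor stream into fixed-length, overlapping windows. A minimal sketch of that windowing step follows; the 100 Hz sampling rate, 2 s window, 1 s hop, and 6-channel layout are illustrative assumptions, not parameters from the study:

```python
import numpy as np

def make_windows(stream: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Slice a (n_samples, n_channels) stream into overlapping windows.

    Returns an array of shape (n_windows, window, n_channels).
    """
    starts = range(0, len(stream) - window + 1, stride)
    return np.stack([stream[s:s + window] for s in starts])

fs = 100                                    # assumed 100 Hz sampling rate
stream = np.random.randn(60 * fs, 6)        # 60 s of 6-channel inertial data
windows = make_windows(stream, window=2 * fs, stride=fs)  # 2 s windows, 1 s hop
print(windows.shape)                        # (59, 200, 6)
```

Each window would then receive the clinician's symptom rating for that task segment and be fed to the network, with splits made across individuals (never across windows of the same person) for the reasons discussed in the first abstract.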
Objective: Controlling the spread of the COVID-19 pandemic largely depends on scaling up the testing infrastructure for identifying infected individuals. Consumer-grade wearables may present a solution to detect the presence of infections in the population, but the current paradigm requires collecting physiological data continuously and for long periods of time on each individual, which poses limitations in the context of rapid screening. Technology: Here, we propose a novel paradigm based on recording the physiological responses elicited by a short (~2-minute) sequence of activities (a “snapshot”) to detect symptoms associated with COVID-19. We employed a novel body-conforming soft wearable sensor placed on the suprasternal notch to capture data on physical activity, cardio-respiratory function, and cough sounds. Results: We performed a pilot study in a cohort of individuals (n=14) who tested positive for COVID-19 and detected altered heart rate, respiration rate, and heart rate variability relative to a group of healthy individuals (n=14) with no known exposure. Logistic regression classifiers were trained on individual and combined sets of physiological features (heartbeat and respiration dynamics, walking cadence, and cough frequency spectrum) to discriminate COVID-positive participants from the healthy group. Combining features yielded an AUC of 0.94 (95% CI=[0.92, 0.96]) using a leave-one-subject-out cross-validation scheme. Conclusions and Clinical Impact: These results, although preliminary, suggest that a sensor-based snapshot paradigm may be a promising approach for non-invasive and repeatable testing to alert individuals who need further screening.
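The evaluation scheme described above can be sketched with scikit-learn's `LeaveOneGroupOut`: each fold holds out one participant's snapshot, and the AUC is computed over the pooled held-out probabilities. The feature values below are synthetic placeholders (the study's actual physiological features are not reproduced), so the resulting AUC is illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(1)
n = 28                                     # 14 COVID-positive + 14 healthy
y = np.array([1] * 14 + [0] * 14)
subjects = np.arange(n)                    # one snapshot per participant
# Synthetic stand-ins for the physiological features, with a modest group shift.
X = rng.normal(size=(n, 4)) + y[:, None] * 0.8

# Leave-one-subject-out: each fold tests on exactly one held-out participant.
scores = cross_val_predict(
    LogisticRegression(), X, y, groups=subjects,
    cv=LeaveOneGroupOut(), method="predict_proba")[:, 1]
auc = roc_auc_score(y, scores)
print(f"LOSO AUC: {auc:.2f}")
```

With one snapshot per participant, leave-one-subject-out coincides with leave-one-out, but grouping by subject is what keeps the scheme valid if multiple snapshots per person were later added.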