2019
DOI: 10.1038/s41746-019-0178-x

Detecting the impact of subject characteristics on machine learning-based diagnostic applications

Abstract: Collection of high-dimensional, longitudinal digital health data has the potential to support a wide variety of research and clinical applications, including diagnostics and longitudinal health tracking. Algorithms that process these data and inform digital diagnostics are typically developed using training and test sets generated from multiple repeated measures collected across a set of individuals. However, the inclusion of repeated measurements is not always appropriately taken into account in the analytical…

Cited by 55 publications (43 citation statements)
References 23 publications
“…The data was then split record-wise into two sets; one was a training data set (70%; n = 2442 steps) and the other a validation ("test" or "hold out") set (30%; n = 1047 steps). This was done to avoid model under-fitting and high classification errors [40][41][42].…”
Section: Modeling Approaches (mentioning)
confidence: 99%
“…A recent literature review of mobile health classification studies demonstrated that 47% had artificially inflated the performance of their measures through failure to account for the identity of individual data points. Our own quantification of this effect across three studies showed that identity confounding can be many times larger than the effect of the condition that was being studied. As with the analytical issues described above, proper interpretation of analyses using mobile health studies for classification requires reporting of how repeat measures were handled.…”
Section: Application and Consequences of Longitudinal Sampling (mentioning)
confidence: 95%
“…Analysis of simulated [13] and empirical data [15] suggest that classifiers trained and evaluated using record-wise data splits can pick confounding relationship between identity and group and so produce inflated accuracies. To address this concern in our data, we contrasted record-wise data splits, where repeated measurements from the same individual are assigned to both the training set and the test set, with subject-wise data splits, where measures of each subject are assigned to either the training set or test set, neutralizing any potential identity confounding [13].…”
Section: Group Classification (mentioning)
confidence: 99%
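
The contrast between record-wise and subject-wise splits described in this statement can be illustrated with a minimal sketch. It is not the citing authors' code: the function names, the `subject` key, the 30% test fraction, and the toy data (10 hypothetical subjects with 20 records each) are all assumptions made for illustration.

```python
import random


def record_wise_split(records, test_fraction=0.3, seed=0):
    """Shuffle individual records, so one subject can appear in both sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(records, test_fraction=0.3, seed=0):
    """Shuffle subjects, so all records of a subject land in exactly one set."""
    rng = random.Random(seed)
    subjects = sorted({r["subject"] for r in records})
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that contribute records to both the training and the test set."""
    return {r["subject"] for r in train} & {r["subject"] for r in test}


# Toy data (hypothetical): 10 subjects, 20 repeated measurements each.
records = [{"subject": s, "record": i} for s in range(10) for i in range(20)]

train_r, test_r = record_wise_split(records)
train_s, test_s = subject_wise_split(records)

print("record-wise shared subjects:", len(shared_subjects(train_r, test_r)))
print("subject-wise shared subjects:", len(shared_subjects(train_s, test_s)))
```

Under the subject-wise split no subject contributes to both sets, which is the property the statement relies on to neutralize identity confounding. The record-wise split gives no such guarantee.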
“…To more directly test whether record-wise classifiers learn identity information rather than group information, we next randomly permuted the diagnostic labels of each subject as a block, so that all records of a given subject were assigned either ASD or TD during the permutation process [15,18]. The motivation for this relabelling scheme is that it preserves the confounding association between group and identity, while breaking the relationship between movement data and group.…”
Section: Quantification of Identity Confounding (mentioning)
confidence: 99%
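
The block permutation described in this statement can be sketched as follows. This is an illustrative assumption, not the citing authors' implementation: the function name `permute_labels_by_subject` and the toy subject identifiers are hypothetical, and the sketch assumes every subject carries a single diagnostic label across all of its records.

```python
import random


def permute_labels_by_subject(subject_ids, labels, seed=0):
    """Shuffle labels across subjects; each subject's records share one label."""
    rng = random.Random(seed)

    # Collect one label per subject and check it is constant within the subject.
    subject_label = {}
    for sid, lab in zip(subject_ids, labels):
        if subject_label.setdefault(sid, lab) != lab:
            raise ValueError(f"subject {sid!r} has more than one label")

    # Permute the subject-level labels, then broadcast back to the records.
    subjects = sorted(subject_label)
    shuffled = [subject_label[s] for s in subjects]
    rng.shuffle(shuffled)
    new_label = dict(zip(subjects, shuffled))
    return [new_label[sid] for sid in subject_ids]


# Toy data (hypothetical): 6 subjects with 4 records each, 3 ASD and 3 TD.
subject_ids = [s for s in ["s1", "s2", "s3", "s4", "s5", "s6"] for _ in range(4)]
labels = ["ASD" if s in {"s1", "s2", "s3"} else "TD" for s in subject_ids]

permuted = permute_labels_by_subject(subject_ids, labels)
```

Because labels move only between whole subjects, the number of subjects per group is unchanged and every record of a subject keeps the same permuted label. A classifier that still scores well on permuted labels under a record-wise split is therefore likely recognizing subjects rather than the condition.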