Background Large and complex studies are now routine, and quality assurance and quality control (QC) procedures ensure reliable results and conclusions. Standard procedures may comprise manual verification and double entry, but these labour-intensive methods often leave errors undetected. Outlier detection uses a data-driven approach to identify patterns exhibited by the majority of the data and highlights data points that deviate from these patterns. Univariate methods consider each variable independently, so observations that appear odd only when two or more variables are considered simultaneously remain undetected. We propose a data quality evaluation process that emphasizes the use of multivariate outlier detection for identifying errors, and show that univariate approaches alone are insufficient. Further, we establish an iterative process that uses multiple multivariate approaches, communication between teams, and visualization for other large-scale projects to follow. Methods We illustrate this process with preliminary neuropsychology and gait data for the vascular cognitive impairment cohort from the Ontario Neurodegenerative Disease Research Initiative, a multi-cohort observational study that aims to characterize biomarkers within and between five neurodegenerative diseases. Each dataset was evaluated four times: with and without covariate adjustment using two validated multivariate methods – Minimum Covariance Determinant (MCD) and Candès’ Robust Principal Component Analysis (RPCA) – and results were assessed in relation to two univariate methods. Outlying participants identified by multiple multivariate analyses were compiled and communicated to the data teams for verification. Results Of 161 and 148 participants in the neuropsychology and gait data, 44 and 43 were flagged by one or both multivariate methods and errors were identified for 8 and 5 participants, respectively. 
MCD identified all participants with errors, while RPCA identified 6/8 and 3/5 for the neuropsychology and gait data, respectively. Both outperformed univariate approaches. Adjusting for covariates had a minor effect on which participants were identified as outliers, though it did affect error detection. Conclusions Manual QC procedures are insufficient for large studies as many errors remain undetected. In these data, the MCD outperforms the RPCA for identifying errors, and both are more successful than univariate approaches. Therefore, data-driven multivariate outlier techniques are essential tools for QC as data become more complex. Electronic supplementary material The online version of this article (10.1186/s12874-019-0737-5) contains supplementary material, which is available to authorized users.
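The claim that univariate checks can miss an observation that is odd only in combination can be illustrated with a small sketch. The data below are invented for illustration and are not from the study. The sketch uses the classical (non-robust) Mahalanobis distance rather than the MCD or RPCA evaluated in the article, and the 0.975 chi-square cutoff is an assumed convention, not a setting reported by the authors.

```python
import math

def mean(values):
    return sum(values) / len(values)

def z_scores(values):
    """Univariate z-scores using the sample standard deviation."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

def mahalanobis_sq_2d(xs, ys):
    """Squared classical Mahalanobis distance of each (x, y) pair."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # (dx, dy) times the inverse covariance times (dx, dy), written out for 2x2
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical data: ten strongly correlated pairs plus one pair (3, 8)
# whose values are each unremarkable but whose combination breaks the pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0, 9.1, 9.9, 8.0]

max_abs_z = max(abs(z) for z in z_scores(xs) + z_scores(ys))

# For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
cutoff = -2 * math.log(1 - 0.975)
flagged = [i for i, d in enumerate(mahalanobis_sq_2d(xs, ys)) if d > cutoff]

print("largest |z| on either variable:", round(max_abs_z, 2))
print("indices flagged by Mahalanobis distance:", flagged)
```

No value exceeds a |z| of 2 on either variable, yet the last pair is flagged once both variables are considered together. A robust estimator such as the MCD would replace the classical mean and covariance used here, so that the outliers themselves do not distort the distances.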
As large research initiatives designed to generate big data on clinical cohorts become more common, there is an increasing need to establish standard quality assurance (QA; preventing errors) and quality control (QC; identifying and correcting errors) procedures for critical outcome measures. The present article describes the QA and QC approach developed and implemented for the neuropsychology data collected as part of the Ontario Neurodegenerative Disease Research Initiative study. We report on the efficacy of our approach and provide data quality metrics. Our findings demonstrate that even with a comprehensive QA protocol, the proportion of data errors still can be high. Additionally, we show that several widely used neuropsychological measures are particularly susceptible to error. These findings highlight the need for large research programs to put into place active, comprehensive, and separate QA and QC procedures before, during, and after protocol deployment. Detailed recommendations and considerations for future studies are provided.
Background Remote health monitoring with wearable sensor technology may positively impact patient self-management and clinical care. In individuals with complex health conditions, multi-sensor wear may yield meaningful information about health-related behaviors. Despite available technology, feasibility of device-wearing in daily life has received little attention in persons with physical or cognitive limitations. This mixed methods study assessed the feasibility of continuous, multi-sensor wear in persons with cerebrovascular (CVD) or neurodegenerative disease (NDD). Methods Thirty-nine participants with CVD, Alzheimer’s disease/amnestic mild cognitive impairment, frontotemporal dementia, Parkinson’s disease, or amyotrophic lateral sclerosis (median age 68 (45–83) years, 36% female) wore five devices (bilateral ankles and wrists, chest) continuously for a 7-day period. Adherence to device wearing was quantified by examining volume and pattern of device removal (non-wear). A thematic analysis of semi-structured de-brief interviews with participants and study partners was used to examine user acceptance. Results Adherence to multi-sensor wear, defined as a minimum of three devices worn concurrently, was high (median 98.2% of the study period). Non-wear rates were low across all sensor locations (median 17–22 min/day), with significant differences between some locations ( p = 0.006). Multi-sensor non-wear was higher for daytime versus nighttime wear ( p < 0.001) and there was a small but significant increase in non-wear over the collection period ( p = 0.04). Feedback from de-brief interviews suggested that multi-sensor wear was generally well accepted by both participants and study partners. Conclusion A continuous, multi-sensor remote health monitoring approach is feasible in a cohort of persons with CVD or NDD.
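The adherence definition above (at least three devices worn concurrently) can be sketched as a simple calculation. The epoch-based wear flags, device names, and function name below are hypothetical; the abstract does not describe how non-wear was actually detected or how the study period was segmented.

```python
def multi_sensor_adherence(wear, min_devices=3):
    """Percentage of epochs in which at least `min_devices` devices were worn.

    `wear` maps a device name to a list of 0/1 wear flags, one per epoch.
    All devices must cover the same epochs.
    """
    series = list(wear.values())
    n_epochs = len(series[0])
    if any(len(s) != n_epochs for s in series):
        raise ValueError("all devices must have the same number of epochs")
    adherent = sum(1 for epoch in zip(*series) if sum(epoch) >= min_devices)
    return 100.0 * adherent / n_epochs

# Hypothetical wear flags for five devices over ten epochs.
wear = {
    "left_wrist":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "right_wrist": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "chest":       [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    "left_ankle":  [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    "right_ankle": [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
}

adherence = multi_sensor_adherence(wear)
print(f"multi-sensor adherence: {adherence:.1f}%")
```

In this toy example only one epoch has fewer than three devices worn, so adherence is nine epochs out of ten.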
Genetic factors contribute to neurodegenerative diseases, with high heritability estimates across diagnoses; however, a large portion of the genetic influence remains poorly understood. Many previous studies have attempted to fill the gaps by performing linkage analyses and association studies in individual disease cohorts, but have failed to consider the clinical and pathological overlap observed across neurodegenerative diseases and the potential for genetic overlap between the phenotypes. Here, we leveraged rare variant association analyses (RVAAs) to elucidate the genetic overlap among multiple neurodegenerative diagnoses, including Alzheimer’s disease, amyotrophic lateral sclerosis, frontotemporal dementia (FTD), mild cognitive impairment, and Parkinson’s disease (PD), as well as cerebrovascular disease, using the data generated with a custom-designed neurodegenerative disease gene panel in the Ontario Neurodegenerative Disease Research Initiative (ONDRI). As expected, only ~3% of ONDRI participants harboured a monogenic variant likely driving their disease presentation. Yet, when genes were binned based on previous disease associations, we observed an enrichment of putative loss of function variants in PD genes across all ONDRI cohorts. Further, individual gene-based RVAA identified significant enrichment of rare, nonsynonymous variants in PARK2 in the FTD cohort, and in NOTCH3 in the PD cohort. The results indicate that there may be greater heterogeneity in the genetic factors contributing to neurodegeneration than previously appreciated. Although the mechanisms by which these genes contribute to disease presentation must be further explored, we hypothesize they may be a result of rare variants of moderate phenotypic effect contributing to overlapping pathology and clinical features observed across neurodegenerative diagnoses.
Introduction: Understanding synergies between neurodegenerative and cerebrovascular pathologies that modify dementia presentation represents an important knowledge gap. Methods: This multi-site, longitudinal, observational cohort study recruited participants across prevalent neurodegenerative diseases and cerebrovascular disease and assessed participants comprehensively across modalities. We describe univariate and multivariate baseline features of the cohort and summarize recruitment, data collection, and curation processes. Results: We enrolled 520 participants across five neurodegenerative and cerebrovascular diseases. Median age was 69 years, median Montreal Cognitive Assessment score was 25, median independence in activities of daily living was 100% for basic and 93% for instrumental activities. Spousal study partners predominated; participants were often male, White, and more educated. Milder disease stages predominated, yet cohorts reflect clinical presentation.
Objective: In individuals over the age of 65, concomitant neurodegenerative pathologies contribute to cognitive and/or motor decline and can be aggravated by cerebrovascular disease, but our understanding of how these pathologies synergize to produce the decline represents an important knowledge gap. The Ontario Neurodegenerative Disease Research Initiative (ONDRI), a multi-site, longitudinal, observational cohort study, recruited participants across multiple prevalent neurodegenerative diseases and cerebrovascular disease, collecting a wide array of data and thus allowing for deep investigation into common and unique phenotypes. This paper describes baseline features of the ONDRI cohort, understanding of which is essential when conducting analyses or interpreting results. Methods: Five disease cohorts were recruited: Alzheimer's disease/amnestic mild cognitive impairment (AD/MCI), amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), Parkinson's disease (PD), and cerebrovascular disease (CVD). Assessment platforms included clinical, neuropsychology, eye tracking, gait and balance, neuroimaging, retinal imaging, genomics, and pathology. We describe recruitment, data collection, and data curation protocols, and provide a summary of ONDRI baseline characteristics. Results: 520 participants were enrolled. Most participants were in the early stages of disease progression. Participants had a median age of 69 years, a median Montreal Cognitive Assessment score of 25, a median percent of independence of 100 for basic activities of daily living, and a median of 93 for instrumental activities. Variation between disease cohorts existed for age, level of cognition, and geographic location. Conclusion: ONDRI data will enable exploration into unique and shared pathological mechanisms contributing to cognitive and motor decline across the spectrum of neurodegenerative diseases.
The minimum covariance determinant (MCD) algorithm is one of the most common techniques to detect anomalous or outlying observations. The MCD algorithm depends on two features of multivariate data: the determinant of a matrix (i.e., geometric mean of the eigenvalues) and Mahalanobis distances (MD). While the MCD algorithm is commonly used, and has many extensions, the MCD is limited to analyses of quantitative data and more specifically data assumed to be continuous. One reason why the MCD does not extend to other data types such as categorical or ordinal data is that there is no well-defined MD for data types other than continuous data. To address the lack of MCD-like techniques for categorical or mixed data, we present a generalization of the MCD. To do so, we rely on a multivariate technique called correspondence analysis (CA). Through CA we can define MD via singular vectors and we can compute the determinant from CA's eigenvalues. Here we define and illustrate a generalized MCD on categorical data and then show how our generalized MCD extends beyond categorical data to accommodate mixed data types (e.g., categorical, ordinal, and continuous). We illustrate this generalized MCD on data from two large-scale projects: the Ontario Neurodegenerative Disease Research Initiative (ONDRI) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) with data such as genetics (categorical), clinical instruments and surveys (categorical or ordinal), and neuroimaging (continuous) data. We also make R code and toy data available in order to illustrate our generalized MCD.
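For reference, the two quantities the standard (continuous-data) MCD relies on can be written out as follows. The notation is an assumption chosen for illustration and is not taken from the article: $x_i$ is an observation on $p$ variables, $H$ is a subset of $h$ of the $n$ observations, and $\bar{x}_H$ and $S_H$ are the mean and covariance computed from that subset.

```latex
% Determinant of the subset covariance, via its eigenvalues \lambda_1, \dots, \lambda_p
\det(S_H) = \prod_{j=1}^{p} \lambda_j ,
\qquad
\det(S_H)^{1/p} = \Bigl( \prod_{j=1}^{p} \lambda_j \Bigr)^{1/p}

% Squared Mahalanobis distance of observation x_i from the subset centre
\mathrm{MD}^2_H(x_i) = (x_i - \bar{x}_H)^{\top} S_H^{-1} (x_i - \bar{x}_H)

% The MCD seeks the subset of size h with the smallest covariance determinant
H^{*} = \operatorname*{arg\,min}_{H \subset \{1,\dots,n\},\ |H| = h} \det(S_H)
```

The determinant is the product of the eigenvalues and its $p$-th root is their geometric mean; because the root is a monotone transformation, minimizing either quantity selects the same subset. Observations with a large $\mathrm{MD}^2_{H^{*}}$ are the candidates for outliers.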