Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug–gene interactions. We performed automated ML on multimodal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but their accuracy is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals within a health registry or biobank to monitor and prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multi-modal data is key moving forward. We build upon previous work to deliver multi-modal predictions of Parkinson's disease (PD). We performed automated ML on multi-modal data from the Parkinson's Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson's Disease Biomarker Program (PDBP) dataset. Finally, networks were built to identify gene communities specific to PD. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy (balanced accuracy) and other metrics. Combining data modalities outperforms the single biomarker paradigm. UPSIT was the largest contributing predictor for the classification of PD. The transcriptomic data was used to construct a network of disease-relevant transcripts. We have built a model using an automated ML pipeline to make improved multi-omic predictions of PD. The model developed improves disease risk prediction, a critical step for better assessment of PD risk. We constructed gene expression networks for the next generation of genomics-derived interventions. Our automated ML approach allows complex predictive models to be reproducible and accessible to the community.
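The threshold optimization step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' GenoML code: the function names, the toy labels, and the toy probabilities are all hypothetical, and the search strategy (trying every observed predicted probability as a cut-off) is an assumption chosen for simplicity. It shows how moving the cut-off away from the default 0.5 can raise balanced accuracy, the mean of sensitivity and specificity.

```python
def balanced_accuracy(y_true, y_prob, threshold):
    """Mean of sensitivity and specificity at a given probability cut-off.

    Assumes both classes (0 = control, 1 = case) are present in y_true.
    """
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_threshold(y_true, y_prob):
    """Return the observed probability that maximizes balanced accuracy."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: balanced_accuracy(y_true, y_prob, t))


# Hypothetical toy data: six controls and four cases with model probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.42, 0.60, 0.90]

default_score = balanced_accuracy(y_true, y_prob, 0.5)
tuned_threshold = best_threshold(y_true, y_prob)
tuned_score = balanced_accuracy(y_true, y_prob, tuned_threshold)

print(f"default 0.50 cut-off: balanced accuracy {default_score:.3f}")
print(f"tuned {tuned_threshold:.2f} cut-off: balanced accuracy {tuned_score:.3f}")
```

In practice the cut-off would be chosen on training or held-out tuning data and then applied unchanged to the external validation cohort, so that the reported external performance is not optimistically biased.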
Coding and non-coding RNAs have diagnostic and prognostic importance in Parkinson's disease (PD). We studied circulating small non-coding RNAs (sncRNAs) in 7,003 samples from two longitudinal PD cohorts (Parkinson's Progression Markers Initiative (PPMI) and Luxembourg Parkinson's Study (NCER-PD)) and modelled their influence on the transcriptome. First, we sequenced sncRNAs in 5,450 blood samples of 1,614 individuals in PPMI. The majority of 323 billion reads (59 million reads per sample) mapped to miRNAs. Other covered RNA classes include piRNAs, rRNAs, snoRNAs, tRNAs, scaRNAs, and snRNAs. De-regulated miRNAs were associated with the disease and disease progression and occur in two distinct waves in the third and seventh decades of life. Originating mostly from a characteristic set of immune cells, they resemble a systemic inflammation response and mitochondrial dysfunction, two hallmarks of PD. By profiling 1,553 samples from 1,024 individuals in the NCER-PD cohort using an independent technology, we validate relevant findings from the sequencing study. Finally, network analysis of sncRNAs and transcriptome sequencing of the original cohort identified regulatory modules emerging in progressing PD patients.
miRNA studies to date, covering over 3,000 patients and controls 21. Advanced biomarker studies, however, require carefully designed cohorts. A few large-scale PD studies aiming to advance diagnosis, prognosis and therapeutics already fulfil the respective requirements 22-24. Among them, the Parkinson's Progression Markers Initiative (PPMI) is a multi-cohort, longitudinal observational study designed to discover and validate objective biomarkers of Parkinson's 25. The PPMI project constitutes a global effort of 33 clinical sites in 11 countries with regular study participant assessments (Figure 1a).
It also features comprehensive clinical phenotyping to observe hundreds of characteristics of the known subtypes of the disease, such as the idiopathic and genetic forms. Further, longitudinal biosampling following rigorous Standard Operating Procedures (SOPs) is performed to set a framework for the discovery and validation of early-onset and prognostic biomarkers. To identify potential non-coding RNA and transcriptomic markers in PPMI, we performed RNA-seq on blood samples drawn at each clinical visit. For short and long RNAs, we carried out optimized assays and sequenced separate aliquots from the same blood samples for paired RNA analyses. Here, we present the evaluation of the sncRNA-seq fraction for disease detection and progression tracking. We examine the potential of different classes of small RNAs but emphasise the role of miRNAs. We also validate relevant findings for miRNAs on the Luxembourg Parkinson's Study in the framework of the National Centre for Excellence in Research on PD (NCER-PD) cohort 22, which was performed independently and with a different technology. Finally, we provide insights on how the key non-coding RNAs regulate gene expression by utilizing the long RNA sequencing data. Specific ana...
The Michael J. Fox Foundation’s Parkinson’s Progression Markers Initiative (PPMI) is an observational study to comprehensively evaluate Parkinson’s disease (PD) patients using imaging, biologic sampling, clinical and behavioural assessments to identify biomarkers of PD progression. As part of this study, we obtained 4,756 whole blood samples from 1,570 subjects at baseline, 0.5, 1, 2, and 3 years from enrollment in the study. We isolated RNA and performed whole transcriptome sequencing in this longitudinal cohort. Here, we describe and quantify technical variability associated with this dataset through the use of pooled reference samples, including plate distribution, RNA quality, and outliers. This large, uniformly processed dataset is available to researchers at https://www.ppmi-info.org.
In the FOUNdational Data INitiative for Parkinson's Disease (FOUNDIN-PD) we sought to produce a multi-layered molecular dataset in a large cohort of 95 induced pluripotent stem cell (iPSC) lines at multiple timepoints during differentiation to dopaminergic (DA) neurons, a major affected cell type in Parkinson's disease (PD). The lines are derived from the Parkinson's Progression Markers Initiative (PPMI) study, which includes both people with PD and unaffected individuals across a wide range of polygenic risk scores (PRS), carrying both risk variants identified by genome-wide association studies (GWAS) and monogenic causal alleles. We generated genetic, epigenetic, regulatory, transcriptomic, proteomic, and longitudinal cellular imaging data from iPSC-derived DA neurons to understand key molecular relationships between disease-associated genetic variation and proximate molecular events in a PD-relevant cell type. Analyses of all data modalities collected in FOUNDIN-PD suggest that the differentiation to DA neurons, while not fully mature, was successful and robust. Interrogation of PD genetic risk in this relevant cellular context may elucidate the functional effects of some of these risk variants alone or in combination with other variants. These data reveal that DA neurons derived from human iPSCs provide a valuable cellular context and foundational atlas for modeling PD-related genetic risk. In addition to making the data and analyses for this molecular atlas readily available, we have integrated these data into the browsable FOUNDIN-PD data portal (https://www.foundinpd.org) to be used as a resource for understanding the molecular pathogenesis of PD.