The growing number of modalities (e.g. multi-omics, imaging and clinical data) characterizing a given disease provides physicians and statisticians with complementary facets of the disease process, but it also emphasizes the need for novel statistical methods of data analysis able to unify these views. Such data sets are intrinsically structured in blocks, where each block represents a set of variables observed on a group of individuals. Classical statistical tools therefore cannot be applied without altering the organization of the data, with the risk of information loss. Regularized generalized canonical correlation analysis (RGCCA) and its sparse counterpart, sparse generalized canonical correlation analysis (SGCCA), are component-based methods for exploratory analyses of data sets structured in blocks of variables. Rather than operating sequentially on parts of the measurements, the RGCCA/SGCCA-based integrative analysis method aims at summarizing the relevant information between and within the blocks. It takes as input a priori information defining which blocks are supposed to be linked to one another, thus reflecting hypotheses about the biology underlying the data blocks. It also requires extra parameters that need to be carefully adjusted. Here, we provide practical guidelines for the use of RGCCA/SGCCA. We also illustrate the flexibility and usefulness of RGCCA/SGCCA on a unique cohort of patients with four genetic subtypes of spinocerebellar ataxia, in which we obtained multiple data sets from brain volumetry and magnetic resonance spectroscopy, and from metabolomic and lipidomic analyses. As a first step toward the extraction of multimodal biomarkers, and through the reduction to a few meaningful components and the visualization of relevant variables, we identified possible markers of disease progression.
Several blood-based age prediction models have been developed using fewer than a dozen to more than a hundred DNA methylation biomarkers. Only one model (Z-P1), based on pyrosequencing, has been developed using DNA methylation of a single locus located in the ELOVL2 promoter, which is considered one of the best age-prediction biomarkers. Although multi-locus models generally perform better than the single-locus model, they require more DNA and show more inter-laboratory variation, which affects the predictions. Here we developed 17,018 single-locus age prediction models based on DNA methylation of the ELOVL2 promoter from pooled data of four different studies (training set of 1,028 individuals aged 0 to 91 years), using six different statistical approaches and testing every combination of the 7 CpGs, with the aim of improving prediction performance and reducing the effects of inter-laboratory variation. Compared to the Z-P1 model, three statistical models with the optimal combinations of CpGs showed improved performance (MAD of 4.41–4.77 in the testing set of 385 individuals) and no age-dependent bias. In an independent testing set of 100 individuals (19–65 years), we showed that the prediction accuracy could be further improved by using different CpG combinations and increasing the number of technical replicates (MAD of 4.17).
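The combinatorial search described above can be pictured with a minimal sketch. It rests on several assumptions: the CpG labels are placeholders (the abstract does not name the sites), MAD is taken to mean the mean absolute deviation between predicted and chronological age (the abstract does not expand the acronym), and no model fitting is shown. It only enumerates the candidate CpG subsets and computes the error metric.

```python
from itertools import combinations

# Hypothetical labels for the seven CpG sites; the abstract does not name them.
CPGS = [f"CpG{i}" for i in range(1, 8)]


def all_cpg_subsets(cpgs):
    """Yield every non-empty combination of the given CpG sites."""
    for size in range(1, len(cpgs) + 1):
        yield from combinations(cpgs, size)


def mean_absolute_deviation(predicted, actual):
    """Average absolute gap between predicted and chronological ages.

    Assumption: this is the meaning of MAD in the abstract.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


subsets = list(all_cpg_subsets(CPGS))
print(len(subsets))  # number of non-empty subsets of 7 sites
print(mean_absolute_deviation([30, 40], [28, 45]))  # toy ages, not study data
```

Seven sites give 2^7 − 1 = 127 non-empty subsets, so each statistical approach would be fitted once per subset and the resulting models compared by their error on held-out individuals.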
Positron emission tomography (PET) is a molecular medical imaging modality that is commonly used for the diagnosis of neurodegenerative diseases. Computer-aided diagnosis, based on medical image analysis, could help the quantitative evaluation of brain diseases such as Alzheimer’s disease (AD). This paper presents a novel method for ranking brain volumes of interest (VOIs) by how effectively they separate PET images of healthy control brains from those of AD brains. Brain images are first mapped into anatomical VOIs using an atlas. Histogram-based features are then extracted and used to select and rank VOIs according to the area under the curve (AUC), which produces a hierarchy of the ability of VOIs to separate groups of subjects. The top-ranked VOIs are then input into a support vector machine classifier. The developed method is evaluated on a local image database and compared to known feature selection methods. Results show that AUC-based selection yields better classification results than these methods in the case of a two-group separation.
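The ranking step can be illustrated with a small sketch, under stated assumptions: each VOI is summarized by one scalar feature per subject (the paper uses histogram-based features, which are not reproduced here), the VOI names and values are invented toy data, and separability is scored as max(AUC, 1 − AUC) so that the direction of the group difference does not matter. The support vector machine step is omitted.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def rank_vois(features, labels):
    """Rank VOIs by how well their feature separates the two groups.

    features: dict mapping a VOI name to one feature value per subject.
    labels:   list of 1 (patient) / 0 (control), aligned with the values.
    """
    ranking = []
    for voi, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        a = auc(pos, neg)
        # Assumption: score separability regardless of direction.
        ranking.append((voi, max(a, 1.0 - a)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Toy data only; names and values are hypothetical.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "voi_a": [0.2, 0.3, 0.25, 0.8, 0.9, 0.7],
    "voi_b": [0.5, 0.6, 0.4, 0.5, 0.45, 0.55],
    "voi_c": [0.1, 0.9, 0.5, 0.2, 0.8, 0.6],
}
ranking = rank_vois(features, labels)
for voi, score in ranking:
    print(voi, round(score, 3))
```

In a fuller pipeline, the features of the top-ranked VOIs would then be passed to a classifier, with the number of VOIs kept chosen on training data only.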