When training predictive models from neuroimaging data, we typically have available non-imaging variables such as age and gender that affect the imaging data but which we may be uninterested in from a clinical perspective. Such variables are commonly referred to as ‘confounds’. In this work, we first give a working definition of a confound in the context of training predictive models from samples of neuroimaging data. We define a confound as a variable which affects the imaging data and has an association with the target variable in the sample that differs from that in the population-of-interest, i.e., the population to which we intend to apply the estimated predictive model. The focus of this paper is the scenario in which the confound and target variable are independent in the population-of-interest, but the training sample is biased due to a sample association between the target and confound. We then discuss standard approaches for dealing with confounds in predictive modelling, such as image adjustment and including the confound as a predictor, before deriving and motivating an Instance Weighting scheme that attempts to account for confounds by focusing model training so that it is optimal for the population-of-interest. We evaluate the standard approaches and Instance Weighting in two regression problems with neuroimaging data in which we train models in the presence of confounding and predict samples that are representative of the population-of-interest. For comparison, these models are also evaluated when there is no confounding present. In the first experiment we predict the MMSE score using structural MRI from the ADNI database with gender as the confound, while in the second we predict age using structural MRI from the IXI database with acquisition site as the confound.
Considered over both datasets, we find that none of the methods for dealing with confounding gives more accurate predictions than a baseline model which ignores confounding, and that including the confound as a predictor gives models that are less accurate than the baseline model. We do find, however, that different methods appear to focus their predictions on specific subsets of the population-of-interest, and that predictive accuracy is greater when there is no confounding present. We conclude with a discussion comparing the advantages and disadvantages of each approach, and the implications of our evaluation for building predictive models that can be used in clinical practice.
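The abstract above does not give the Instance Weighting formula, so the following is only a minimal sketch of one common way to realise the idea, not the authors' derivation. It assumes a discrete confound and a target that has been binned into categories, and it weights each training instance by p(y)p(c)/p(y, c), estimated from sample frequencies. After this reweighting the target and confound are independent in the training sample, matching the scenario assumed for the population-of-interest. The function names `independence_weights` and `weighted_joint` are hypothetical.

```python
import numpy as np


def _cell_indices(target_bins, confound):
    """Map each instance to its (target bin, confound level) cell."""
    y_vals, y_idx = np.unique(np.asarray(target_bins).ravel(), return_inverse=True)
    c_vals, c_idx = np.unique(np.asarray(confound).ravel(), return_inverse=True)
    return y_idx, c_idx, (len(y_vals), len(c_vals))


def independence_weights(target_bins, confound):
    """Per-instance weights w_i = p(y_i) p(c_i) / p(y_i, c_i).

    Assumption: all probabilities are plain empirical frequencies.
    Exact independence after weighting also requires every
    (target bin, confound level) cell to be observed at least once.
    """
    y_idx, c_idx, shape = _cell_indices(target_bins, confound)
    counts = np.zeros(shape)
    np.add.at(counts, (y_idx, c_idx), 1.0)  # contingency table
    joint = counts / counts.sum()           # empirical p(y, c)
    p_y = joint.sum(axis=1)                 # empirical p(y)
    p_c = joint.sum(axis=0)                 # empirical p(c)
    return p_y[y_idx] * p_c[c_idx] / joint[y_idx, c_idx]


def weighted_joint(target_bins, confound, weights):
    """Joint distribution of (target bin, confound) under the given weights."""
    y_idx, c_idx, shape = _cell_indices(target_bins, confound)
    table = np.zeros(shape)
    np.add.at(table, (y_idx, c_idx), weights)
    return table / table.sum()


# Toy sample in which the binned target is associated with the confound.
y_bins = np.array([0, 0, 0, 1, 1, 1, 1, 1])
group = np.array([0, 0, 1, 0, 1, 1, 1, 1])
w = independence_weights(y_bins, group)
print(weighted_joint(y_bins, group, w))
```

Weights of this kind could then be passed to any learner that accepts per-instance weights, for example through the `sample_weight` argument that many scikit-learn estimators take in `fit`. A continuous target would need a density estimate in place of the binning assumed here.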
Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since it was first proposed, canonical correlation analysis has been extended to extract relations between two sets of variables when, for instance, the sample size is insufficient in relation to the data dimensionality, when the relations are considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
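As a concrete illustration of the classical linear case, here is a minimal NumPy sketch, offered under stated assumptions rather than as the tutorial's own implementation. It whitens each view with the inverse square root of its covariance matrix and takes a singular value decomposition of the whitened cross-covariance; the singular values are the canonical correlations. The function name `linear_cca` and the `reg` parameter are hypothetical. Setting `reg` above zero adds a ridge term to both covariance matrices, which is one simple form of regularised CCA for the case where the sample size is small relative to the dimensionality.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def linear_cca(X, Y, reg=0.0):
    """Return canonical correlations and weight matrices for views X and Y.

    X has shape (n, p) and Y has shape (n, q), with rows paired.
    Assumption: with reg=0 both covariance matrices must be full rank.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance matrix.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    A = Kx @ U     # columns are canonical weights for X
    B = Ky @ Vt.T  # columns are canonical weights for Y
    return s, A, B


# Two synthetic views that share a single latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=(500, 2))])
Y = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
corrs, A, B = linear_cca(X, Y)
print(corrs)
```

With `reg=0`, the sample correlation between the projections `X @ A[:, k]` and `Y @ B[:, k]` equals `corrs[k]`. The kernel and sparse variants named above change the optimisation problem and are not covered by this sketch.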
Understanding how variations in dimensions of psychometrics, IQ and demographics relate to changes in brain connectivity during the critical developmental period of adolescence and early adulthood is a major challenge. This has particular relevance for mental health disorders where a failure to understand these links might hinder the development of better diagnostic approaches and therapeutics. Here, we investigated this question in 306 adolescents and young adults (14–24 y, 25 clinically depressed) using a multivariate statistical framework, based on canonical correlation analysis (CCA). By linking individual functional brain connectivity profiles to self-report questionnaires, IQ and demographic data we identified two distinct modes of covariation. The first mode mapped onto an externalization/internalization axis and showed a strong association with sex. The second mode mapped onto a well-being/distress axis independent of sex. Interestingly, both modes showed an association with age. Crucially, the changes in functional brain connectivity associated with changes in these phenotypes showed marked developmental effects. The findings point to a role for the default mode, frontoparietal and limbic networks in psychopathology and depression.
In neuroimaging-based diagnostic problems, combining different sources of information, such as MR images and clinical data, is a challenging task. Simply combining them usually does not provide an improvement compared with using the best source alone. In this paper, we deal with the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, tackling the AD versus Control task. We use a recently proposed multiple kernel learning approach, called EasyMKL, to combine a huge number of basic kernels in synergy with a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our new approach, called EasyMKLFS, outperforms baselines (e.g. SVM) and state-of-the-art methods such as recursive feature elimination and SimpleMKL.