Associations between high-dimensional datasets, each comprising many features, can be discovered through multivariate statistical methods such as Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). CCA and PLS are widely used methods that reveal which features carry the association. Despite the longevity and popularity of CCA/PLS approaches, their application to high-dimensional datasets raises critical questions about the reliability of CCA/PLS solutions. In particular, overfitting can produce solutions that are not stable across datasets, which severely hinders their interpretability and generalizability. To study these issues, we developed a generative model to simulate synthetic datasets with multivariate associations, parameterized by feature dimensionality, data variance structure, and assumed latent association strength. We found that the resulting CCA/PLS associations could be highly inaccurate when the number of samples per feature is relatively small. For PLS, the profiles of feature weights exhibit detrimental bias toward leading principal component axes. We confirmed these model trends in state-of-the-art datasets containing neuroimaging and behavioral measurements in large numbers of subjects, namely the Human Connectome Project (n ≈ 1000) and UK Biobank (n = 20000), where we found that only the latter comprised enough samples to obtain stable estimates. Analysis of the neuroimaging literature using CCA to map brain-behavior relationships revealed that commonly employed sample sizes yield unstable CCA solutions. Our generative modeling framework provides a calculator for the dataset properties required for stable estimates. Collectively, our study characterizes the dataset properties needed to limit the potentially detrimental effects of overfitting on the stability of CCA/PLS solutions, and provides practical recommendations for future studies.

Significance Statement
Scientific studies often begin with an observed association between different types of measures. When datasets comprise large numbers of features, multivariate approaches such as canonical correlation analysis (CCA) and partial least squares (PLS) are often used. These methods can reveal the profiles of features that carry the optimal association. We developed a generative model to simulate data and characterized how the obtained feature profiles can be unstable, which hinders interpretability and generalizability, unless a sufficient number of samples is available to estimate them. We determine sufficient sample sizes as a function of dataset properties. We also show that these issues arise in neuroimaging studies of brain-behavior relationships. We provide practical guidelines and computational tools for future CCA and PLS studies.
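As a hedged illustration of the stability issue (not the authors' generative model or code), the sketch below simulates two feature sets sharing a single latent signal and checks how similar the first CCA weight vector is across two independent sample draws; the sample size, dimensionalities, and latent strength are arbitrary placeholders.

```python
# Minimal sketch: simulate two data matrices sharing one latent signal, then
# check how stable the first CCA weight vector is across independent draws
# at a given samples-per-feature ratio. Parameters are illustrative only.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q, r_true = 200, 50, 50, 0.5   # samples, feature counts, latent strength

def simulate(n):
    z = rng.standard_normal(n)                       # shared latent variable
    wx, wy = rng.standard_normal(p), rng.standard_normal(q)
    X = np.outer(z, wx) * r_true + rng.standard_normal((n, p))
    Y = np.outer(z, wy) * r_true + rng.standard_normal((n, q))
    return X, Y

def first_weights(X, Y):
    cca = CCA(n_components=1).fit(X, Y)
    return cca.x_weights_[:, 0]

# Stability proxy: cosine similarity of X-side weights across two draws.
w1 = first_weights(*simulate(n))
w2 = first_weights(*simulate(n))
cos = abs(w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
print(f"weight similarity across draws: {cos:.2f}")  # drops as n/p shrinks
```

Re-running the script with smaller `n` (or larger `p` and `q`) illustrates the abstract's point: the recovered weight profiles diverge across datasets when samples per feature are scarce.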
White matter bundle segmentation using diffusion MRI fiber tractography has become the method of choice to identify white matter fiber pathways in vivo in the human brain. However, like other analyses of complex data, there is considerable variability in segmentation protocols and techniques, which can result in different reconstructions of the same intended white matter pathways and directly affects tractography results, quantification, and interpretation. In this study, we aim to evaluate and quantify the variability that arises from different protocols for bundle segmentation. Through an open call to users of fiber tractography, including anatomists, clinicians, and algorithm developers, 42 independent teams were given processed sets of human whole-brain streamlines and asked to segment 14 white matter fascicles in six subjects. In total, we received 57 different bundle segmentation protocols, which enabled detailed volume-based and streamline-based analyses of agreement and disagreement among protocols for each fiber pathway. Results show that even when given the exact same sets of underlying streamlines, the variability across protocols for bundle segmentation is greater than all other sources of variability in the virtual dissection process, including variability within protocols and variability across subjects. To foster the use of tractography bundle dissection in routine clinical settings, and as a fundamental analytical tool, future endeavors must aim to resolve and reduce this heterogeneity. Although external validation is needed to verify the anatomical accuracy of bundle dissections, reducing heterogeneity is a step towards reproducible research and may be achieved through the use of standard nomenclature and definitions of white matter bundles, along with well-chosen constraints and decisions in the dissection process.
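To make the volume-based agreement analysis concrete, here is a minimal sketch of one common overlap measure, the Dice coefficient between two protocols' binary bundle masks; the masks, grid size, and thresholds below are hypothetical placeholders, not the study's actual data or pipeline.

```python
# Sketch of a volume-based agreement measure: Dice overlap between two
# protocols' boolean voxel masks of the same bundle (1.0 = identical).
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy example: two slightly different dissections of the same bundle.
rng = np.random.default_rng(1)
protocol_a = rng.random((64, 64, 64)) > 0.95
protocol_b = protocol_a ^ (rng.random((64, 64, 64)) > 0.995)  # perturbed copy
print(f"Dice(protocol_a, protocol_b) = {dice(protocol_a, protocol_b):.2f}")
```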
We present a new toolbox and library of standardised tractography protocols devised for the robust, automated extraction of white matter tracts in both the human and the macaque brain. Using in vivo data from the Human Connectome Project (HCP) and the UK Biobank, together with ex vivo macaque brain data, we obtain white matter atlases, as well as atlases for tract endpoints on the white-grey matter boundary, for both species. We illustrate that our protocols are robust to differences in data quality, generalise across the two species, and reflect the known anatomy. We further demonstrate that they capture inter-subject variability by preserving tract lateralisation in humans and tract similarities stemming from twinship in the HCP cohort. Our results demonstrate that the presented toolbox will be useful for generating imaging-derived features in large cohorts and for facilitating comparative neuroanatomy studies. The software, tractography protocols, and atlases are publicly released through FSL, allowing users to define their own tractography protocols in a standardised manner, further contributing to open science.
How temporal modulations in functional interactions are shaped by the underlying anatomical connections remains an open question. Here, we analyse the role of structural eigenmodes in the formation and dissolution of temporally evolving functional brain networks, using resting-state magnetoencephalography and diffusion magnetic resonance imaging data at the individual-subject level. Our results show that even at short timescales, phase and amplitude connectivity can partly be expressed by structural eigenmodes, but hardly by direct structural connections; the relationship was stronger between structural eigenmodes and time-resolved amplitude connectivity. Time-resolved connectivity for both phase and amplitude was mostly characterised by a stationary process, superimposed with very brief periods that deviated from this stationary process. For these brief periods, dynamic network states were extracted that showed different expressions of eigenmodes. Furthermore, eigenmode expression was related to overall cognitive performance and co-occurred with fluctuations in the community structure of functional networks. These results imply that ongoing time-resolved resting-state networks, even at short timescales, can to some extent be understood in terms of the activation and deactivation of structural eigenmodes, and that these eigenmodes play a role in the dynamic integration and segregation of information across the cortex, subserving cognitive function.
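A minimal sketch of the general eigenmode idea, assuming the common recipe of taking eigenvectors of the structural graph Laplacian as basis modes and regressing a windowed functional connectivity matrix onto them; the random matrices stand in for real dMRI and MEG connectomes, and this is not necessarily the paper's exact pipeline.

```python
# Sketch: express a (time-windowed) functional connectivity matrix FC as a
# weighted sum of structural eigenmodes, FC_hat = U diag(c) U', where U holds
# eigenvectors of the structural graph Laplacian. Data here are placeholders.
import numpy as np

rng = np.random.default_rng(2)
n_regions = 90

# Symmetric non-negative structural connectivity (stand-in for dMRI data).
A = rng.random((n_regions, n_regions))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)

L = np.diag(A.sum(axis=1)) - A            # graph Laplacian
eigvals, modes = np.linalg.eigh(L)        # columns of `modes` = eigenmodes

# Placeholder functional connectivity for one short time window.
FC = np.cov(rng.standard_normal((n_regions, 40)))

# Per-mode expression coefficients: c_k = u_k' FC u_k.
coef = np.einsum("ik,ij,jk->k", modes, FC, modes)
FC_hat = modes @ np.diag(coef) @ modes.T  # eigenmode reconstruction

resid = np.linalg.norm(FC - FC_hat) / np.linalg.norm(FC)
print(f"relative residual after eigenmode fit: {resid:.2f}")
```

Tracking the coefficients `coef` over sliding windows is one way to quantify how strongly each structural eigenmode is expressed in time-resolved functional networks.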
Neuroimaging technology has experienced explosive growth and has transformed the study of neural mechanisms across health and disease. However, given the diversity of sophisticated tools for handling neuroimaging data, the field faces challenges around method integration (1-3). Specifically, researchers often have to rely on siloed approaches that limit reproducibility, with idiosyncratic data organization and limited software interoperability. To address these challenges, we developed the Quantitative Neuroimaging Environment & Toolbox (QuNex), a platform for consistent end-to-end processing and analytics. QuNex is engineered for reproducible deployment of custom workflows, from onboarding raw data to generating analytic features, in a single "turnkey" command. The platform enables interoperable integration of multi-modal, community-developed neuroimaging software through an extension framework with a software development kit for seamless integration of community tools. Critically, it supports high-throughput, parallel processing in high-performance compute environments, either locally or in the cloud. Notably, QuNex has successfully processed over 10,000 scans across neuroimaging consortia (4), including multiple clinical datasets. Moreover, QuNex enables integration of non-human primate, rodent, and human workflows via a cohesive translational platform. Collectively, this effort stands to significantly impact neuroimaging method integration across acquisition approaches, pipelines, datasets, computational environments, and species. Building on this platform will enable more rapid, scalable, and reproducible use of neuroimaging technology across health and disease.