Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
The overall impact of proteomics on clinical research and its translation has lagged behind expectations. One recognized caveat is the limited size (subject numbers) of (pre)clinical studies performed at the discovery stage, the findings of which fail to be replicated in larger verification/validation trials. Compromised study designs and insufficient statistical power are consequences of the to-date still limited capacity of mass spectrometry (MS)-based workflows to handle large numbers of samples in a realistic time frame, while delivering comprehensive proteome coverages. We developed a highly automated proteomic biomarker discovery workflow. Herein, we have applied this approach to analyze 1000 plasma samples from the multicentered human dietary intervention study "DiOGenes". Study design, sample randomization, tracking, and logistics were the foundations of our large-scale study. We checked the quality of the MS data and provided descriptive statistics. The data set was interrogated for proteins with most stable expression levels in that set of plasma samples. We evaluated standard clinical variables that typically impact forthcoming results and assessed body mass index-associated and gender-specific proteins at two time points. We demonstrate that analyzing a large number of human plasma samples for biomarker discovery with MS using isobaric tagging is feasible, providing robust and consistent biological results.
Holistic human proteome maps are expected to complement comprehensive profile assessment of health and disease phenotypes. However, methodologies to analyze proteomes in human tissue or body fluid samples at relevant scale and performance are still limited in clinical research. Their deployment and demonstration in large enough human populations are even sparser. In the present study, we have characterized and compared the plasma proteomes of two large independent cohorts of obese and overweight individuals using shotgun mass spectrometry (MS)-based proteomics. Herein, we showed, in both populations from different continents of about 500 individuals each, the concordance of plasma protein MS measurements in terms of variability, gender-specificity, and age-relationship. Additionally, we replicated several known and new associations between proteins, clinical and molecular variables, such as insulin and glucose concentrations. In conclusion, our MS-based analyses of plasma samples from independent human cohorts proved the practical feasibility and efficiency of a large and unified discovery/replication approach in proteomics, which was also recently coined “rectangular” design.
SUMMARY Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.