The advent of high-throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce an R package dedicated to the multivariate analysis of biological data sets, with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest integrative frameworks for the multivariate analyses of ‘omics data available from the package.
Motivation: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and for the identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. Results: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. Availability and implementation: DIABLO is implemented in the mixOmics R Bioconductor package, with functions for parameter selection and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. Supplementary information: Supplementary data are available at Bioinformatics online.
Hepatocellular carcinomas (HCCs) exhibit a diversity of molecular phenotypes, raising major challenges in clinical management. HCCs detected by surveillance programs at an early stage are candidates for potentially curative therapies (local ablation, resection, or transplantation). In the long term, transplantation provides the lowest recurrence rates. Treatment allocation is based on tumor number, size, vascular invasion, performance status, functional liver reserve, and the prediction of early (<2 years) recurrence, which reflects the intrinsic aggressiveness of the tumor. Well‐differentiated, potentially low‐aggressiveness tumors form the heterogeneous molecular class of nonproliferative HCCs, characterized by an approximate 50% β‐catenin mutation rate. To define the clinical, pathological, and molecular features and the outcome of nonproliferative HCCs, we constructed a 1,133‐HCC transcriptomic metadata set and validated findings in a publicly available 210‐HCC RNA sequencing set. We show that nonproliferative HCCs preserve the zonation program that distributes metabolic functions along the portocentral axis in normal liver. More precisely, we identified two well‐differentiated, nonproliferation subclasses, namely periportal‐type (wild‐type β‐catenin) and perivenous‐type (mutant β‐catenin), which expressed negatively correlated gene networks. The new periportal‐type subclass represented 29% of all HCCs; expressed a hepatocyte nuclear factor 4A–driven gene network, which was down‐regulated in mouse hepatocyte nuclear factor 4A knockout mice; were early‐stage tumors by Barcelona Clinic Liver Cancer, Cancer of the Liver Italian Program, and tumor–node–metastasis staging systems; had no macrovascular invasion; and showed the lowest metastasis‐specific gene expression levels and TP53 mutation rates.
Also, we identified an eight‐gene periportal‐type HCC signature, which was independently associated with the highest 2‐year recurrence‐free survival by multivariate analyses in two independent cohorts of 247 and 210 patients. Conclusion: Well‐differentiated HCCs display mutually exclusive periportal or perivenous zonation programs. Among all HCCs, periportal‐type tumors have the lowest intrinsic potential for early recurrence after curative resection. (Hepatology 2017;66:1502–1518).