Analytical errors caused by suboptimal performance of the chosen platform for a number of metabolites and by instrumental drift are a major issue in large-scale metabolomics studies. Especially for MS-based methods, which are gaining ground within metabolomics, it is difficult to control analytical data quality without suitable labeled internal standards and calibration standards, even within one laboratory. In this paper, we suggest a workflow for significant reduction of the analytical error using pooled calibration samples and a multiple internal standard strategy. Between-batch and within-batch calibration techniques are applied, and the analytical error is reduced significantly (an increase of 25% in peaks with RSD lower than 20%) without hampering or interfering with statistical analysis of the final data.
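To make the RSD criterion in the abstract above concrete, the following is a minimal sketch of how the relative standard deviation of pooled QC measurements could be computed, together with a simple between-batch scaling to the overall QC median. The scaling rule, the function names and the numbers are illustrative assumptions, not the workflow described in the paper.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def batch_factors(qc_by_batch):
    """Per-batch scale factor that moves each batch's QC median onto the
    median of all QC measurements (an assumed, simplified correction)."""
    all_qc = [v for vals in qc_by_batch.values() for v in vals]
    target = median(all_qc)
    return {batch: target / median(vals) for batch, vals in qc_by_batch.items()}


def apply_factors(values_by_batch, factors):
    """Multiply every measurement in a batch by that batch's factor."""
    return {
        batch: [v * factors[batch] for v in vals]
        for batch, vals in values_by_batch.items()
    }


# Hypothetical QC intensities for one peak measured in two batches.
qc = {"batch_A": [100.0, 102.0, 98.0], "batch_B": [120.0, 123.0, 117.0]}

raw = [v for vals in qc.values() for v in vals]
corrected = apply_factors(qc, batch_factors(qc))
corrected_flat = [v for vals in corrected.values() for v in vals]

print(f"RSD before correction: {rsd_percent(raw):.1f}%")
print(f"RSD after correction:  {rsd_percent(corrected_flat):.1f}%")
```

In a real study the same factors would also be applied to the study samples of each batch, and the fraction of peaks with QC RSD below 20% would be compared before and after correction.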
Middle-aged offspring of nonagenarians, as compared to their spouses (controls), show a favorable lipid metabolism marked by larger LDL particle size in men and lower total triglyceride levels in women. To investigate which specific lipids associate with familial longevity, we explore the plasma lipidome by measuring 128 lipid species using liquid chromatography coupled to mass spectrometry in 1526 offspring of nonagenarians (59 years ± 6.6) and 675 (59 years ± 7.4) controls from the Leiden Longevity Study. In men, no significant differences were observed between offspring and controls. In women, however, 19 lipid species associated with familial longevity. Female offspring showed higher levels of ether phosphocholine (PC) and sphingomyelin (SM) species (3.5–8.7%) and lower levels of phosphoethanolamine PE (38:6) and long-chain triglycerides (TG) (9.4–12.4%). The association with familial longevity of two ether PC and four SM species was independent of total triglyceride levels. In addition, the longevity-associated lipid profile was characterized by a higher ratio of monounsaturated (MUFA) over polyunsaturated (PUFA) lipid species, suggesting that female offspring have a plasma lipidome less prone to oxidative stress. Ether PC and SM species were identified as novel longevity markers in females, independent of total triglycerides levels. Several longevity-associated lipids correlated with a lower risk of hypertension and diabetes in the Leiden Longevity Study cohort. This sex-specific lipid signature marks familial longevity and may suggest a plasma lipidome with a better antioxidant capacity, lower lipid peroxidation and inflammatory precursors, and an efficient beta-oxidation function.
Diabetic kidney disease (DKD) is a devastating complication that affects an estimated third of patients with type 1 diabetes mellitus (DM). There is no cure once the disease is diagnosed, but early treatment at a sub-clinical stage can prevent or at least halt the progression. DKD is clinically diagnosed as abnormally high urinary albumin excretion rate (AER). We hypothesize that subtle changes in the urine metabolome precede the clinically significant rise in AER. To test this, 52 type 1 diabetic patients with normal AER (normoalbuminuric) were recruited by the FinnDiane study. After an average of 5.5 years of follow-up, half of the subjects (26) had progressed from normal AER to microalbuminuria or DKD (macroalbuminuria); the other half remained normoalbuminuric. The objective of this study is to discover urinary biomarkers that differentiate the progressive form of albuminuria from the non-progressive form in humans. Metabolite profiles of baseline 24 h urine samples were obtained by gas chromatography–mass spectrometry (GC–MS) and liquid chromatography–mass spectrometry (LC–MS) to detect potential early indicators of pathological changes. Multivariate logistic regression modeling of the metabolomics data resulted in a profile of metabolites that separated patients who progressed from normoalbuminuric AER to microalbuminuric AER from those who maintained normoalbuminuric AER, with an accuracy of 75% and a precision of 73%. Because these data and samples come from an actual patient population and were therefore gathered in a less controlled environment, it is striking that a number of metabolites in this profile (identified as early indicators) have already been associated with DKD in the literature, and that new candidate biomarkers were also found. The discriminating metabolites included acyl-carnitines, acyl-glycines and metabolites related to tryptophan metabolism. We found candidate biomarkers that were univariately significantly different. This study demonstrates the potential of multivariate data analysis and metabolomics in the field of diabetic complications, and suggests several metabolic pathways relevant for further biological studies. Electronic supplementary material: The online version of this article (doi:10.1007/s11306-011-0291-6) contains supplementary material, which is available to authorized users.
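As a reminder of what the accuracy and precision figures in the abstract above measure, here is a small sketch that derives both from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all subjects that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of predicted progressors that truly progressed."""
    return tp / (tp + fp)


# Hypothetical counts (NOT from the study): progressors are the positive class.
tp, fp, tn, fn = 8, 2, 7, 3

print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
```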
Due to the complexity of typical metabolomics samples and the many steps required to obtain quantitative data in GC × GC–MS (deconvolution, peak picking, peak merging, and integration), the unbiased non-target quantification of GC × GC–MS data still poses a major challenge in metabolomics analysis. The feasibility of using commercially available software for non-target processing of GC × GC–MS data was assessed. For this purpose, a set of mouse liver samples (24 study samples and five quality control (QC) samples prepared from the study samples) was measured with GC × GC–MS and GC–MS to study the development and progression of insulin resistance, a primary characteristic of type 2 diabetes. A total of 170 and 691 peaks were quantified in the GC–MS and GC × GC–MS data, respectively, for all study and QC samples. The quantitative results for the QC samples were compared to assess the quality of semi-automated GC × GC–MS processing against targeted GC–MS processing, which involved time-consuming manual correction of all wrongly integrated metabolites and was considered the gold standard. The relative standard deviations (RSDs) obtained with GC × GC–MS were somewhat higher than with GC–MS, due to less accurate processing. Still, the biological information in the study samples was preserved and the added value of GC × GC–MS was demonstrated; many additional candidate biomarkers were found with GC × GC–MS compared to GC–MS. Electronic supplementary material: The online version of this article (doi:10.1007/s11306-010-0219-6) contains supplementary material, which is available to authorized users.
Multi-omics approaches use a diversity of high-throughput technologies to profile the different molecular layers of living cells. Ideally, the integration of this information should result in comprehensive systems models of cellular physiology and regulation. However, most multi-omics projects still include a limited number of molecular assays and there have been very few multi-omic studies that evaluate dynamic processes such as cellular growth, development and adaptation. Hence, we lack formal analysis methods and comprehensive multi-omics datasets that can be leveraged to develop true multi-layered models for dynamic cellular systems. Here we present the STATegra multi-omics dataset that combines measurements from up to 10 different omics technologies applied to the same biological system, namely the well-studied mouse pre-B-cell differentiation. STATegra includes high-throughput measurements of chromatin structure, gene expression, proteomics and metabolomics, and it is complemented with single-cell data. To our knowledge, the STATegra collection is the most diverse multi-omics dataset describing a dynamic biological system.
Combination of data sets from different objects (for example, from two groups of healthy volunteers from the same population) that were measured on a common set of variables (for example, metabolites or peptides) is desirable for statistical analysis in "omics" studies because it increases power. However, this type of combination is not directly possible if nonbiological systematic differences exist among the individual data sets, or "blocks". Such differences can, for example, be due to small analytical changes that are likely to accumulate over large time intervals between blocks of measurements. In this article we present a data transformation method, which we refer to as "quantile equating", that corrects, per variable, for linear and nonlinear differences in distribution among blocks of semiquantitative data obtained with the same analytical method. We demonstrate the successful application of the quantile equating method to data obtained on two typical metabolomics platforms, i.e., liquid chromatography-mass spectrometry and nuclear magnetic resonance spectroscopy. We suggest uni- and multivariate methods to evaluate similarities and differences among data blocks before and after quantile equating. In conclusion, we have developed a method to correct for nonbiological systematic differences among semiquantitative data blocks and have demonstrated its successful application to metabolomics data sets.
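The idea of aligning the distribution of one block with another, per variable, can be illustrated with generic empirical quantile mapping. The sketch below is an assumption about how such a transformation might look in its simplest form; it is not the authors' quantile equating procedure, and the function names and data are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def equate_to_reference(block, reference):
    """Replace each value of one variable in `block` by the value found at the
    same rank-based quantile in `reference` (tied values get adjacent ranks)."""
    ref_sorted = sorted(reference)
    n = len(block)
    order = sorted(range(n), key=lambda i: block[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.5
        out[i] = quantile(ref_sorted, q)
    return out


# Hypothetical values of one metabolite in two blocks; the second block is a
# nonlinear (squared) distortion of the scale used in the first.
reference_block = [1.0, 2.0, 3.0, 4.0, 5.0]
distorted_block = [9.0, 1.0, 25.0, 4.0, 16.0]

print(equate_to_reference(distorted_block, reference_block))
```

Because the mapping depends only on ranks within each block, it removes monotone linear and nonlinear distortions alike, but it also assumes that both blocks sample the same underlying population.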
In many areas of science, multiple sets of data are collected from the same samples. Such data sets can be analyzed by multiblock (or data fusion) methods. The aim is usually to get a holistic understanding of the system or better prediction of some response. Lately, several scientific groups have developed methods for separating common and distinct variation between multiple data blocks. Although the objective is the same, the strategies and algorithms are completely different for these methods. In this paper, we investigate the practical properties of the four most popular methods for separating common and distinct variation: JIVE, DISCO, PCA-GCA, and OnPLS. The main barrier complicating the use of any of these methods is model selection and validation, especially when the number of blocks is more than two. By the use of extensive simulations, we have elucidated the three properties that are important for assessing the validity of the results: the ability to identify the correct model, the ability to estimate the true, underlying subspaces, and the robustness towards misspecification of the model. The simulated data sets mimic a range of "real life" data, with different dimensionalities and variance structures. We are thus able to identify which methods work best for different types of data structures, and pinpoint weak spots for each method. The results show that PCA-GCA works best for model selection, while JIVE and DISCO give the best estimates of the subspaces and are most robust towards model misspecification.
Background: Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO) and O2-PLS, a two-block (X-Y) latent variable regression method with an integral OSC filter, can all be used for the integrated analysis of multiple data sets and decompose them in three terms: a low(er)-rank approximation capturing common variation across data sets, low(er)-rank approximations for structured variation distinctive for each data set, and residual noise. In this paper these three methods are compared with respect to their mathematical properties and their respective ways of defining common and distinctive variation. Results: The methods are all applied on simulated data and on mRNA and miRNA data sets from glioblastoma multiforme (GBM) brain tumors to examine their overlap and differences. When the common variation is abundant, all methods are able to find the correct solution. With real data, however, complexities in the data are treated differently by the three methods. Conclusions: All three methods have their own approach to estimating common and distinctive variation, with their specific strengths and weaknesses. Due to their orthogonality properties and the algorithms they use, their view on the data is slightly different. By assuming orthogonality between common and distinctive variation, true natural or biological phenomena that may not be orthogonal at all might be misinterpreted. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1037-2) contains supplementary material, which is available to authorized users.
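The three-term decomposition described in the abstract above can be written compactly as follows. The symbols are assumptions introduced here for illustration ($X_k$ for the $k$-th data block, $C_k$ for its common part, $D_k$ for its distinctive part, $E_k$ for residual noise, and $r_C$, $r_{D,k}$ for the chosen ranks); each method adds its own constraints, such as orthogonality conditions, which are not shown.

```latex
X_k = C_k + D_k + E_k, \qquad k = 1, \dots, K,
\qquad \operatorname{rank}(C_k) \le r_C, \quad \operatorname{rank}(D_k) \le r_{D,k}
```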