Background: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analyses such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. Results: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method-specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high-quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering. Conclusions: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2871-9) contains supplementary material, which is available to authorized users.
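The three filtering criteria named in this abstract (blank samples, proportion of missing values, intra-class correlation) can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names, the data layout, and the fixed cutoffs (`min_blank_ratio`, `max_missing`, `min_icc`) are all hypothetical placeholders. The paper's contribution is choosing such cutoffs adaptively from the data, which this sketch does not attempt, and the one-way ANOVA estimator of the ICC is an assumption about how replicate measurements might be summarized.

```python
from statistics import mean


def blank_ratio(detected_vals, blank_vals):
    """Mean abundance in biological samples divided by mean abundance in blanks."""
    blank_mean = mean(blank_vals)
    return float("inf") if blank_mean == 0 else mean(detected_vals) / blank_mean


def missing_fraction(sample_vals):
    """Fraction of samples in which the feature was not detected (None)."""
    return sum(v is None for v in sample_vals) / len(sample_vals)


def icc_oneway(groups):
    """One-way random-effects ICC for balanced replicate groups.

    `groups` holds one list per subject, each with the same number (k >= 2)
    of replicate measurements; at least two subjects are required.
    """
    n, k = len(groups), len(groups[0])
    grand = mean(v for g in groups for v in g)
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return 0.0 if denom == 0 else (msb - msw) / denom


def keep_feature(sample_vals, blank_vals, replicate_groups,
                 min_blank_ratio=3.0, max_missing=0.5, min_icc=0.5):
    """Apply all three filters with fixed, hypothetical cutoffs."""
    detected = [v for v in sample_vals if v is not None]
    if not detected:
        return False
    return (blank_ratio(detected, blank_vals) >= min_blank_ratio
            and missing_fraction(sample_vals) <= max_missing
            and icc_oneway(replicate_groups) >= min_icc)
```

A feature would be retained only if it is clearly more abundant in samples than in blanks, is detected in enough samples, and varies more between subjects than between replicates of the same subject.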
Introduction: For pediatric diseases like childhood leukemia, a short latency period points to in utero exposures as potentially important risk factors. Untargeted metabolomics of small molecules in archived newborn dried blood spots (DBS) offers an avenue for discovering early-life exposures that contribute to disease risks. Objectives: The purpose of this study was to develop a quantitative method for untargeted analysis of archived newborn DBS for use in an epidemiological study (California Childhood Leukemia Study, CCLS). Methods: Using experimental DBS from the blood of an adult volunteer, we optimized extraction of small molecules and integrated measurement of potassium as a proxy for blood hematocrit. We then applied this extraction method to 4.7-mm punches from 106 control DBS samples from the CCLS. Sample extracts were analyzed with liquid chromatography-high-resolution mass spectrometry (LC-HRMS), and an untargeted workflow was used to screen for metabolites that discriminate population characteristics such as sex, ethnicity, and birth weight. Results: Thousands of small molecules were measured in extracts of archived DBS. Normalizing for potassium levels removed variability related to varying hematocrit across DBS punches. Of the roughly 1,000 prevalent small molecules that were tested, multivariate linear regression detected significant associations with ethnicity (3 metabolites) and birth weight (15 metabolites) after adjusting for multiple testing. Conclusions: This untargeted workflow can be used for analysis of small molecules in archived DBS to discover novel biomarkers, to provide insights into the initiation and progression of diseases, and to provide guidance for disease prevention.
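The abstract above reports associations "after adjusting for multiple testing" but does not say which correction was applied. As one common choice, and purely as an assumption for illustration, the Benjamini-Hochberg adjustment of a list of per-metabolite p-values could be sketched in Python as follows; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With roughly 1,000 tested molecules, a metabolite would be reported only if its adjusted value falls below the chosen false discovery rate.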
Background: Cross-sectional studies reported a novel set of hydroxylated ultra-long-chain fatty acids (ULCFAs) that were present at significantly lower levels in colorectal cancer cases than in controls. Follow-up studies suggested that these molecules were potential biomarkers of protective exposure for colorectal cancer. To test the hypothesis that ULCFAs reflect causal pathways, we measured their levels in prediagnostic serum from incident colorectal cancer cases and controls. Methods: Serum from 95 colorectal cancer patients and 95 matched controls was obtained from the Italian arm of the European Prospective Investigation into Cancer and Nutrition cohort and analyzed by liquid chromatography-high-resolution mass spectrometry. Levels of 8 ULCFAs were compared between cases and controls with paired t tests and a linear model that used time to diagnosis (TTD) to determine whether case-control differences were influenced by disease progression.
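The two analyses described in the Methods above, a paired t test on matched case-control pairs and a linear model relating the pairwise difference to TTD, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the model is a simple least-squares line with TTD as the only predictor, and p-values are omitted because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(case_vals, control_vals):
    """t statistic for matched pairs: mean difference over its standard error."""
    diffs = [c - k for c, k in zip(case_vals, control_vals)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def regress_difference_on_ttd(case_vals, control_vals, ttd):
    """Least-squares fit of the (case - control) difference against TTD.

    Returns (intercept, slope). A slope far from zero would suggest that
    the case-control difference changes as diagnosis approaches.
    """
    diffs = [c - k for c, k in zip(case_vals, control_vals)]
    mx, my = mean(ttd), mean(diffs)
    sxx = sum((x - mx) ** 2 for x in ttd)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ttd, diffs))
    slope = sxy / sxx
    return my - slope * mx, slope
```

Working on the within-pair differences keeps the matching intact in both analyses.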
Early-life exposures are believed to influence the incidence of pediatric acute lymphoblastic leukemia (ALL). Archived neonatal blood spots (NBS), collected within the first days of life, offer a means to investigate small molecules that reflect early-life exposures. Using untargeted metabolomics, we compared abundances of small-molecule features in extracts of NBS punches from 332 children who later developed ALL and 324 healthy controls. Subjects were stratified by early (1-5 y) and late (6-14 y) diagnosis. Mutually exclusive sets of metabolic features (representing putative lipids and fatty acids) were associated with ALL, including 9 and 19 metabolites in the early- and late-diagnosis groups, respectively. In the late-diagnosis group, a prominent cluster of features with apparent 18:2 fatty-acid chains suggested that newborn exposure to the essential nutrient linoleic acid increased ALL risk. Interestingly, abundances of these putative 18:2 lipids were greater in infants who were fed formula rather than breast milk (colostrum) and increased with the mother's pre-pregnancy body mass index. These results suggest possible etiologic roles of newborn nutrition in late-diagnosis ALL.
Background: Epidemiologists are beginning to employ metabolomics and lipidomics with archived blood from incident cases and controls to discover causes of cancer. Although several such studies have focused on colorectal cancer (CRC), they all followed targeted or semi-targeted designs that limited their ability to find discriminating molecules and pathways related to the causes of CRC. Methods: Using an untargeted design, we measured lipophilic metabolites in prediagnostic serum from 66 CRC patients and 66 matched controls from the European Prospective Investigation into Cancer and Nutrition (Turin, Italy). Samples were analyzed by liquid chromatography-high-resolution mass spectrometry (LC-MS), resulting in 8690 features for statistical analysis. Results: Rather than the usual multiple-hypothesis-testing approach, we based variable selection on an ensemble of regression methods, which found nine features to be associated with case-control status. We then regressed each selected feature on time-to-diagnosis to determine whether the feature was likely to be either a potentially causal biomarker or a reactive product of disease progression (reverse causality). Conclusions: Of the nine selected LC-MS features, four appear to be involved in CRC etiology and merit further investigation in prospective studies of CRC. Four other features appear to be related to progression of the disease (reverse causality), and may represent biomarkers of value for early detection of CRC. Electronic supplementary material: The online version of this article (10.1186/s12885-018-4894-4) contains supplementary material, which is available to authorized users.