Motivation: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES). Results: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. The TPOT-generated ML pipelines outperformed the grid search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no-CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes. Availability and implementation: TPOT is freely available via http://epistasislab.github.io/tpot/. Contact: jhmoore@upenn.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
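As an illustration of why balanced accuracy is reported alongside other metrics, the following minimal sketch computes it as the mean of per-class recall and compares it with plain accuracy. The function name, the labels, and the predictions are hypothetical and made up for this example; they are not ANGES data and not the authors' code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which does not reward majority-class guessing."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of the samples whose true label is c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = CAD, 0 = no CAD (illustrative values only)
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]  # a classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.75
print(balanced_accuracy(y_true, y_pred))   # 0.5
```

On an imbalanced cohort, the majority-class classifier looks acceptable by plain accuracy but scores no better than chance on balanced accuracy.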
Background: Although the biochemistry community commonly assumes that the control of metabolic pathways is critical to cellular function, it is unclear whether metabolic pathways generally have evolutionarily stable rate-limiting (flux-controlling) steps. Results: A set of evolutionary simulations using a kinetic model of a metabolic pathway was performed under different conditions to evaluate the evolutionary stability of rate-limiting steps. Simulations used combinations of selection for steady-state flux, selection against the cost of molecular biosynthesis, and selection against the accumulation of high concentrations of a deleterious intermediate. Two mutational regimes were used, one with mutations that on average were neutral to molecular phenotype and a second with a preponderance of activity-destroying mutations. The evolutionary stability of rate-limiting steps was low in all simulations with non-neutral mutational processes. Clustering of parameter co-evolution showed divergent inter-molecular evolutionary patterns under different evolutionary regimes. Conclusions: This study provides a null model for pathway evolution when compensatory processes dominate, with potential applications to predicting pathway functional change. This result also suggests a possible mechanism by which studies in statistical genetics that aim to associate a genotype with a phenotype, assuming independent action of variants, may be mis-specified through a mischaracterization of the link between individual gene function and pathway function.
A better understanding of the genotype-phenotype map has potential applications in differentiating between compensatory changes and directional selection on pathways as well as detecting SNPs and fixed differences that might have phenotypic effects. Reviewers: This article was reviewed by Arne Elofsson, David Ardell, and Shamil Sunyaev. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-016-0133-6) contains supplementary material, which is available to authorized users.
With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-Based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML with default parameters showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT, an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and a corresponding priority for further translational clinical research.
Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency – evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
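One common way to adjust a feature for a confounder is to regress the feature on the covariate and carry the residuals forward into model training. The sketch below assumes this is what "residual training and adjustment" refers to; the abstract does not give the exact procedure. The helper name `residualize` and all numbers are hypothetical.

```python
from statistics import mean


def residualize(feature, covariate):
    """Return residuals from a simple least-squares fit of feature on covariate."""
    x_bar, y_bar = mean(covariate), mean(feature)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, feature))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # What remains after removing the part linearly explained by the covariate
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]


# Made-up values for illustration only (not biobank data)
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
metabolite = [1.0, 1.4, 1.5, 2.1, 2.2]

adjusted = residualize(metabolite, bmi)
print(adjusted)
```

The adjusted values are uncorrelated with BMI by construction, so a downstream model cannot pick up a linear BMI effect through this feature.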
Dental caries is characterized by a dysbiotic shift at the biofilm–tooth surface interface, yet comprehensive biochemical characterizations of the biofilm are scant. We used metabolomics to identify biochemical features of the supragingival biofilm associated with early childhood caries (ECC) prevalence and severity. The study’s analytical sample comprised 289 children ages 3 to 5 (51% with ECC) who attended public preschools in North Carolina and were enrolled in a community-based cross-sectional study of early childhood oral health. Clinical examinations were conducted by calibrated examiners in community locations using International Caries Detection and Assessment System (ICDAS) criteria. Supragingival plaque collected from the facial/buccal surfaces of all primary teeth in the upper-left quadrant was analyzed using ultra-performance liquid chromatography–tandem mass spectrometry. Associations between individual metabolites and 18 clinical traits (based on different ECC definitions and sets of tooth surfaces) were quantified using Brownian distance correlations (dCor) and linear regression modeling of log2-transformed values, applying a false discovery rate multiple testing correction. A tree-based pipeline optimization tool (TPOT) machine learning process was used to identify the best-fitting ECC classification metabolite model. There were 503 named metabolites identified, including microbial, host, and exogenous biochemicals. Most significant ECC-metabolite associations were positive (i.e., upregulations/enrichments). The localized ECC case definition (ICDAS ≥1 caries experience within the surfaces from which plaque was collected) had the strongest correlation with the metabolome (dCor P = 8 × 10⁻³).
Sixteen metabolites were significantly associated with ECC after multiple testing correction, including fucose (P = 3.0 × 10⁻⁶) and N-acetylneuraminate (P = 6.8 × 10⁻⁶) with higher ECC prevalence, as well as catechin (P = 4.7 × 10⁻⁶) and epicatechin (P = 2.9 × 10⁻⁶) with lower prevalence. Catechin, epicatechin, imidazole propionate, fucose, 9,10-DiHOME, and N-acetylneuraminate were among the top 15 metabolites in terms of ECC classification importance in the automated TPOT model. These supragingival biofilm metabolite findings provide novel insights into ECC biology and can serve as the basis for the development of measures of disease activity or risk assessment.
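A false discovery rate correction across many metabolite tests can be sketched as follows. The abstract does not name the procedure used; the Benjamini-Hochberg step-up method is assumed here because it is a widely used choice. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical metabolite tests
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print(q)
```

A metabolite would then be called significant when its adjusted value falls below the chosen FDR level, for example 0.05.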
Chimeric antigen receptor (CAR) T-cells directed against CD19 have drastically altered outcomes for children with relapsed and refractory acute lymphoblastic leukemia (r/r ALL). Pediatric patients with r/r ALL treated with CAR-T are at increased risk of both cytokine release syndrome (CRS) and sepsis. We sought to investigate the biologic differences between CRS and sepsis and to develop predictive models which could accurately differentiate CRS from sepsis at the time of critical illness. We identified 23 different cytokines that were significantly different between patients with sepsis and CRS. Using elastic net prediction modeling and tree classification, we identified cytokines that were able to classify subjects as having CRS or sepsis accurately. A markedly elevated interferon γ (IFNγ) or a mildly elevated IFNγ in combination with a low IL1β were associated with CRS. A normal to mildly elevated IFNγ in combination with an elevated IL1β was associated with sepsis. This combination of IFNγ and IL1β was able to categorize subjects as having CRS or sepsis with 97% accuracy. As CAR-T therapies become more common, these data provide important novel information to better manage potential associated toxicities.
Enhanced risk stratification of patients with aortic stenosis (AS) is necessary to identify patients at high risk for adverse outcomes, and may allow for better management of patient subgroups at high risk of myocardial damage. The objective of this study was to identify plasma biomarkers and multimarker profiles associated with adverse outcomes in AS.
Biochemical thought posits that rate-limiting steps (defined here as points of flux control) are strongly selected as points of pathway regulation and control and are thus expected to be evolutionarily conserved. Conversely, population genetic thought based upon the concepts of mutation-selection-drift balance at the pathway level might suggest variation in flux controlling steps over evolutionary time. Glycolysis, as one of the most conserved and best characterized pathways, was studied to evaluate its evolutionary conservation. The flux controlling step in glycolysis was found to vary over the tree of life. Further, phylogenetic analysis suggested at least 60 events of gene duplication and additional events of putative positive selection that might alter pathway kinetic properties. Together, these results suggest that even with presumed largely negative selection on pathway output on glycolysis, the co-evolutionary process under the hood is dynamic.