Lilikoi V2.0: a deep learning–enabled, personalized pathway-based R package for diagnosis and prognosis predictions using metabolomics data

Fang, Xinying; Liu, Yu; Ren, Zhijie; Du, Yuheng; Huang, Qizhao; Garmire, Lana X.

doi:10.1093/gigascience/giaa162

Cited by 15 publications

(14 citation statements)

References 36 publications

(36 reference statements)


“…In this work we have focused on ORA, but many other PA methods exist [1,34]. While functional class scoring and topology-based methods can overcome certain limitations associated with ORA, such as the need to select compounds of interest, or not taking metabolite-level statistics into account, many of our findings are also relevant to these other methods.…”

Section: Discussion (mentioning)

confidence: 99%

Pathway analysis in metabolomics: pitfalls and best practice for the use of over-representation analysis

Wieder

Frainay

Rodríguez-Mier

et al. 2021

Preprint


Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention in the field. We developed in-silico simulations using five publicly available datasets and illustrated that changes in parameters, such as the background set, differential metabolite selection methods, and pathway database choice, could all lead to profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases: KEGG, Reactome, and BioCyc, led to vastly different results in both the number and function of significantly enriched pathways. Metabolomics data specific factors, such as reliability of compound identification and assay chemical bias also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.


“…In this work we have focused on ORA, but many other PA methods exist [1,38,39]. While functional class scoring and topology-based methods can overcome certain limitations associated with ORA, such as the need to select compounds of interest, or not taking metabolite-level statistics into account, many of our findings are also relevant to these methods.…”

Section: PLOS Computational Biology (mentioning)

confidence: 99%

Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis

et al. 2021


Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.


“…We utilized the Lilikoi package [23] to determine the best machine learning model for classifying preterm and control samples using selected metabolites. Seven algorithms were compared in this step: recursive partitioning and regression trees (RPART), partition around medoids (PAM), gradient boosting (GBM), logistic regression with elastic net regularization (LOG), random forest (RF), support vector machine (SVM), and linear discriminant analysis (LDA).…”

Section: The Model Of Classification (mentioning)

confidence: 99%

“…We used the query lipid as the input to map metabolites to pathways from HMDB, PubChem, and KEGG in Lilikoi [23,24]. These metabolite-pathway interactions were then used for the further pathways analysis.…”

Section: The Mapping Of Metabolite-related Pathway and Phenotype (mentioning)

confidence: 99%


Maternal plasma lipids are involved in the pathogenesis of preterm birth

Chen

Liu

et al. 2021

Preprint

Self Cite


Background: Preterm birth is defined by the onset of labor at a gestational age shorter than 37 weeks and it can lead to premature birth and impose a threat to newborns health. The Puerto Rico PROTECT cohort is a well-characterized prospective birth cohort that was designed to investigate environmental and social contributors to preterm birth in Puerto Rico, where preterm birth rates have been elevated in recent decades. To elucidate possible relationships between metabolites and preterm birth in this cohort, we conducted a nested case-control study to conduct untargeted metabolomic characterization of maternal plasma of 31 preterm birth women and 69 full-term labor controls at 24-28 gestational weeks. Results: A total of 333 metabolites were identified and annotated with liquid chromatography/mass spectrometry. Subsequent weighted gene correlation network analysis shows the fatty acid and carene enriched module has a significant positive association (p-value=8e-04) with preterm birth. After controlling for potential clinical confounders, a total of 38 metabolites demonstrated significant changes uniquely associated with preterm birth, where 17 of them were preterm biomarkers. Among seven machine-learning classifiers, application of random forest achieved the highly accurate and specific prediction (AUC = 0.92) for preterm birth in testing data, demonstrating their strong potential as biomarkers for preterm births. The 17 preterm biomarkers are involved in cell signaling, lipid metabolism, and lipid peroxidation functions. Further causality analysis infers that suberic acid upregulates several fatty acids to promote preterm birth. Conclusions: Altogether, this study demonstrates the involvement of lipids, particularly fatty acids, in the pathogenesis of preterm birth.
