Missing values are known to be problematic for the analysis of gas chromatography-mass spectrometry (GC-MS) metabolomics data. Typically these values account for about 10%–20% of all data and can arise from various sources: analytical, computational and biological. Currently, the best-known substitute for missing values is mean imputation. In fact, some researchers consider this step of their metabolomics pipeline so routine that they do not even mention using it. However, the choice of substitute may have a significant influence on the output(s) of the data analysis and might be highly sensitive to the distribution of samples between different classes. Therefore, in this study we have analysed different substitutes for missing values, namely zero, mean, median, k-nearest neighbours (kNN) and random forest (RF) imputation, in terms of their influence on unsupervised and supervised learning and, thus, their impact on the final output(s) in terms of biological interpretation. These comparisons have been demonstrated both visually and computationally (classification rate) to support our findings. The results show that the choice of replacement method for missing values may have a considerable effect on classification accuracy; if performed incorrectly, this may negatively influence the biomarkers selected for early disease diagnosis or the identification of cancer-related metabolites. For the GC-MS metabolomics data studied here, our findings recommend that RF be favored over the other tested methods for imputing missing values. This approach displayed excellent results in terms of classification rate for both supervised methods, namely principal components-linear discriminant analysis (PC-LDA) (98.02%) and partial least squares-discriminant analysis (PLS-DA) (97.96%), outperforming the other imputation methods.
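To make the simpler substitutes named in this abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a small table with samples as rows, metabolite features as columns and `None` marking a missing value. The function names `impute_column` and `knn_impute` are hypothetical, and the kNN variant shown (distance averaged over the features two samples share) is one simple choice among several. RF imputation is omitted because it needs a machine-learning library.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Fill None entries of one feature column with zero, the mean or the median."""
    observed = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that feature in the k nearest samples.

    Distance is the root mean squared difference over the features that
    both samples have observed (an assumption made for this sketch).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different sample with this feature observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = (sum((a - b) ** 2 for a, b in shared) / len(shared)) ** 0.5
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = mean(nearest)
    return filled


# Illustrative data only: four samples, three features, one missing value.
samples = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
knn_filled = knn_impute(samples, k=2)
mean_filled = impute_column([row[2] for row in samples], strategy="mean")
```

In this toy table the mean substitute is pulled towards the distant fourth sample, while the kNN substitute uses only the two samples that resemble the incomplete one. This illustrates why the choice of substitute can matter when classes differ.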
Missing values can arise for different reasons, and depending on their origin they should be considered differently and dealt with in different ways. In this research, four methods of imputation have been compared with respect to their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, different strategies for controlling familywise error rate or false discovery, and how they work with the different strategies for missing value imputation, have been evaluated. Missing values were found to affect the normality and variance of data, and k-means nearest neighbour imputation was the best method tested for restoring them. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that missing data at levels as low as 40% could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy, k-means nearest neighbour, and the best approximation of positioning real zeros.
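The two families of multiple-testing control mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: Bonferroni stands for familywise error rate control, and the Benjamini-Hochberg step-up procedure is used as a representative false discovery procedure, since the abstract does not name the one the authors tested. The function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns rejections in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its rank-scaled threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values only, one per metabolite feature.
p_values = [0.001, 0.012, 0.025, 0.045, 0.2]
fwer_calls = bonferroni(p_values)
fdr_calls = benjamini_hochberg(p_values)
```

Bonferroni is the stricter of the two on the same p-values, which is consistent with its use here to limit false positives. Which correction suits a given dataset still depends on how the missing values were imputed, as the abstract notes.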
The type and use of quality control (QC) samples is a 'hot topic' in metabolomics. QCs are not novel in analytical chemistry; however, since QCs began to be used to control the quality of data in large-scale metabolomics studies (first described in 2011), the need for detailed knowledge of how to use QCs and of the effects they can have on data treatment has been growing. A controlled experiment has been designed to illustrate the most advantageous uses of QCs in metabolomics experiments. For this, samples were formed from a pool of plasma into which different metabolites were spiked for two groups in order to simulate biological biomarkers. Three different QCs were compared: QCs pooled from all samples, QCs pooled from each experimental group of samples separately and QCs provided by an external source (QC surrogate). On testing different data treatment strategies, it was revealed that QCs collected separately for each group offer the closest matrix to the samples and improve the statistical outcome, especially for biomarkers unique to one group. A novel quality assurance plus procedure has also been proposed that builds on previously published methods and has the ability to improve statistical results for QC pool. For this dataset, the best option when working with QC surrogate was to filter data based only on group presence. Finally, a novel use of recursive analysis is presented that improves statistical analyses with respect to the ratio between true and false positives.
Introduction: Cellular metabolism is altered during cancer initiation and progression, which allows cancer cells to increase anabolic synthesis, avoid apoptosis and adapt to low nutrient and oxygen availability. The metabolic nature of cancer enables patient cancer status to be monitored by metabolomics and lipidomics. Additionally, monitoring the metabolic status of patients or biological models can be used to better understand the action of anticancer therapeutics. Objectives: Discuss how metabolomics and lipidomics can be used to (i) identify metabolic biomarkers of cancer and (ii) understand the mechanism-of-action of anticancer therapies. Discuss considerations that can maximize the clinical value of metabolic cancer biomarkers including case–control, prognostic and longitudinal study designs. Methods: A literature search of the current relevant primary research was performed. Results: Metabolomics and lipidomics can identify metabolic signatures that associate with cancer diagnosis, prognosis and disease progression. Discriminatory metabolites were most commonly linked to lipid or energy metabolism. Case–control studies outnumbered prognostic and longitudinal approaches. Prognostic studies were able to correlate metabolic features with future cancer risk, whereas longitudinal studies were most effective for studying cancer progression. Metabolomics and lipidomics can help to understand the mechanism-of-action of anticancer therapeutics and mechanisms of drug resistance. Conclusion: Metabolomics and lipidomics can be used to identify biomarkers associated with cancer and to better understand anticancer therapies.
Aqueous humor is the transparent fluid found in the anterior chamber of the eye that provides the metabolic requirements to the avascular tissues surrounding it. Although metabolomics could be a powerful tool for characterizing this biofluid and for revealing metabolic signatures of common ocular diseases such as myopia, to our knowledge it has never previously been applied in humans. In this research a novel method for the analysis of aqueous humor is presented to show its application in the characterization of this biofluid using CE-MS. The method was extended to a dual platform method (CE-MS and LC-MS) to compare samples from patients with different severities of myopia and so explore the disease from the metabolic phenotype point of view. With this method, a profound knowledge of the metabolites present in human aqueous humor has been obtained: over 40 metabolites were reproducibly and simultaneously identified from a low volume of sample by CE-MS, including, among others, a vast number of amino acids and derivatives. When this method was extended to study groups of patients with high or low myopia in both CE-MS and LC-MS, it was possible to identify over 20 significantly different metabolite and lipid signatures that distinguish patients based on the severity of myopia. Among these, the most notable metabolites with higher abundance in high myopia were aminooctanoic acid, arginine, citrulline and sphinganine, while features of low myopia were aminoundecanoic acid, dihydro-retinoic acid and cysteinylglycine disulfide. This dual platform approach offered complementarity such that different metabolites were detected in each technique. Together the experiments presented provide a wealth of valuable information about human aqueous humor and myopia, proving the utility of non-targeted metabolomics for the first time in analyzing this type of sample and the metabolic phenotype of this disease.
Since the start of metabolomics as a field of research, the number of studies related to cancer has grown to such an extent that cancer metabolomics now represents its own discipline. In this chapter, the applications of metabolomics in cancer studies are explored. Different approaches and analytical platforms can be employed for the analysis of samples depending on the goal of the study and the aspects of the cancer metabolome being investigated. Analyses have concerned a range of cancers including lung, colorectal, bladder, breast, gastric, oesophageal and thyroid, amongst others. Developments in these strategies and methodologies that have been applied are discussed, in addition to exemplifying the use of cancer metabolomics in the discovery of biomarkers and in the assessment of therapy (both pharmaceutical and nutraceutical). Finally, the application of cancer metabolomics in personalised medicine is presented.