Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high-dimensional data). However, lower-dimensional representations that retain the useful information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a relatively straightforward biological interpretation. As a use case, PASL is applied to two collections of breast cancer and leukemia gene expression datasets. We show that PASL retains the predictive information for disease classification on new, unseen datasets and outperforms PLIER, a recently proposed competing method. We also show that differential pathway activation analysis provides information complementary to standard gene set enrichment analysis. The code is available at https://github.com/mensxmachina/PASL.
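To make concrete what a feature that "directly corresponds to a known pathway" can look like, the following minimal sketch scores each gene set using only that set's genes. It is an illustration under stated assumptions, not PASL's learning procedure: it swaps in a simpler, generic technique (the leading principal component of each gene set's centered sub-matrix, estimated by power iteration), and the function names, gene names and gene sets are hypothetical.

```python
import math
import random

def center_columns(X):
    """Subtract each gene's (column's) mean across samples."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in X]

def leading_component(S, iters=200, seed=0):
    """Top right-singular vector of S (one weight per gene), via power iteration on S^T S."""
    rng = random.Random(seed)
    p = len(S[0])
    v = [rng.random() + 0.1 for _ in range(p)]  # positive start vector
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(p)) for row in S]             # u = S v
        w = [sum(S[i][j] * u[i] for i in range(len(S))) for j in range(p)]  # w = S^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # every gene in the set is constant: nothing to capture
            break
        v = [x / norm for x in w]
    return v

def pathway_scores(X, genes, gene_sets):
    """Map each gene set to one score per sample, using only that set's genes.

    Hypothetical helper for illustration; this is NOT PASL's algorithm.
    """
    Xc = center_columns(X)
    col = {g: j for j, g in enumerate(genes)}
    scores = {}
    for name, members in gene_sets.items():
        idx = [col[g] for g in members if g in col]  # skip genes that were not measured
        if not idx:
            continue
        sub = [[row[j] for j in idx] for row in Xc]
        v = leading_component(sub)
        scores[name] = [sum(r[k] * v[k] for k in range(len(idx))) for r in sub]
    return scores

# Toy data (samples x genes); all names are hypothetical.
genes = ["g1", "g2", "g3", "g4"]
X = [[1, 2, 0, 5],
     [2, 4, 0, 5],
     [3, 6, 1, 5],
     [4, 8, 1, 5]]
gene_sets = {"P1": ["g1", "g2"], "P2": ["g3", "g4", "gX"]}
scores = pathway_scores(X, genes, gene_sets)
print(scores)
```

Because every score is built from one gene set only, each dimension of the reduced representation can be read as that set's activity. The sign of a principal component is arbitrary, so in this sketch only relative differences between samples are meaningful.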
Extra virgin olive oil (EVOO) is a key component of the Mediterranean diet, with several health benefits attributed to its consumption. Moreover, owing to its prominent market position, EVOO has been studied thoroughly over the last several years, both to authenticate it and to reveal the chemical profile underlying its beneficial properties. In the present work, a comparative study was conducted to assess the quality and authenticity of Greek EVOOs using different analytical approaches, both targeted and untargeted. A total of 173 monovarietal EVOOs from three emblematic Greek cultivars (Koroneiki, Kolovi and Adramytiani), obtained during the 2018–2020 harvesting years, were analyzed for their fatty acid methyl ester (FAME) composition using the official method of Regulation (EEC) No 2568/91, and their bioactive content was quantified by liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). In addition to FAME analysis, the EVOO samples were analyzed by untargeted HRMS metabolomics and by optical spectroscopy techniques (visible absorption, fluorescence and Raman). The data from all techniques were analyzed with machine learning methods to authenticate the variety of each EVOO. The models’ predictive performance was evaluated on test samples, and, for further evaluation, 30 commercially available EVOO samples were also classified by variety. To the best of our knowledge, this is the first study in which techniques from standard analysis, mass spectrometry and optical spectroscopy are applied to the same EVOO samples, providing strong insight into the chemical profile of EVOOs and a comparative evaluation of the different platforms.
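The abstract does not name the machine learning methods used, so the following is only a sketch of the evaluation idea: fit a classifier on training profiles and report accuracy on separate test samples. It assumes a nearest-centroid classifier (a swapped-in, deliberately simple choice), and the class labels and feature values are synthetic placeholders, not measured FAME or spectral data.

```python
import math

def fit_centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))

def heldout_accuracy(X_train, y_train, X_test, y_test):
    """Fit on the training samples only, then score the unseen test samples."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)

# Synthetic two-feature profiles; labels are hypothetical placeholders.
X_train = [[1.0, 0.0], [1.2, 0.1],
           [0.0, 1.0], [0.1, 1.2],
           [1.0, 1.0], [1.1, 0.9]]
y_train = ["cultivar_A", "cultivar_A", "cultivar_B",
           "cultivar_B", "cultivar_C", "cultivar_C"]
X_test = [[1.1, 0.05], [0.05, 1.1], [1.05, 0.95]]
y_test = ["cultivar_A", "cultivar_B", "cultivar_C"]
print(heldout_accuracy(X_train, y_train, X_test, y_test))
```

With real measurements, features on very different scales (for example, FAME percentages versus spectral intensities) would normally be standardized first, since a nearest-centroid rule relies on Euclidean distance.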
Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high-dimensional data). However, lower-dimensional representations that retain the useful biological information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways (gene sets in general) and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a fairly straightforward biological interpretation. PASL is shown to outperform the state-of-the-art method PLIER in predictive performance on two collections of breast cancer and leukemia gene expression datasets. PASL is also trained on a large corpus of 50,000 gene expression samples to construct a universal dictionary of features across different tissues and pathologies. The dictionary is validated on 35,643 held-out samples in terms of reconstruction error and is then applied to 165 held-out datasets spanning a diverse range of diseases. The AutoML tool JADBio is used to show that the predictive information in these datasets is retained after transformation into the PASL feature space. The code is available at https://github.com/mensxmachina/PASL.
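The abstract reports validating the dictionary on held-out samples by reconstruction error without stating the exact metric. The following is a minimal sketch, assuming the dictionary is a matrix of atoms (one row per atom, one column per gene), that each held-out sample is fitted to the atoms by ordinary least squares, and that the error is the relative squared residual. The metric, the function names and the toy two-atom dictionary are hypothetical and may differ from what PASL actually computes.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        if abs(M[c][c]) < 1e-12:
            raise ValueError("atoms are linearly dependent")
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def relative_reconstruction_error(D, x):
    """||x - D^T a||^2 / ||x||^2, where a is the least-squares fit of sample x to atoms D."""
    k = len(D)
    # Normal equations: (D D^T) a = D x
    G = [[sum(di * dj for di, dj in zip(D[i], D[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(di * xi for di, xi in zip(D[i], x)) for i in range(k)]
    a = solve(G, rhs)
    x_hat = [sum(a[i] * D[i][g] for i in range(k)) for g in range(len(x))]
    resid = sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    total = sum(xi ** 2 for xi in x)
    return resid / total if total else 0.0

def mean_heldout_error(D, samples):
    """Average relative error over samples never used to build D."""
    return sum(relative_reconstruction_error(D, s) for s in samples) / len(samples)

# Hypothetical dictionary: each atom is nonzero only on its own gene set.
D = [[1, 1, 0, 0],   # atom restricted to genes g1, g2
     [0, 0, 1, 1]]   # atom restricted to genes g3, g4
heldout = [[2, 2, 3, 3], [1, -1, 0, 0]]
print(mean_heldout_error(D, heldout))
```

In this toy case the first held-out sample lies in the span of the atoms and is reconstructed exactly, while the second is orthogonal to both atoms and is not reconstructed at all, so the mean relative error is 0.5.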