"Curse of dimensionality" is the contradiction between the mathematical requirement that optimal gene signatures contain no more than 20-30 genes [5] and the biological reality of 10-100 times more differentially expressed (DE) genes observed in cancer tumors. Such short optimal signatures are the consequence of the small number of patient samples available in a training set for signature calculation compared to the number of correlated DE genes [6,7]. The shortage of cancer samples for calculating large signatures is so significant that even a 10-fold increase in the number of available samples will not yield a substantial improvement in the predictive power of transcriptional signatures.

Biological considerations can provide a solution to the "curse of dimensionality" of microarray data. Indeed, the transcriptional profile observed in a patient is due to the activity of transcription factors and microRNAs. The number of these direct transcriptional regulators is much smaller than the number of genes on the microarray and in the human genome. Thus, transforming the transcriptional profile into the activity of a few upstream expression regulators should significantly reduce the dimensionality of the data space, which in turn should help in calculating more powerful signatures [8]. Two similar algorithms were developed to calculate the activity of upstream expression regulators from microarray data using prior knowledge about expression regulatory events reported in the literature: sub-network enrichment analysis (SNEA) [9] and reverse causal reasoning (RCR) [10]. We used the SNEA algorithm implemented in Pathway Studio software from Elsevier. It relies on a knowledge base of expression regulation events automatically extracted from the biomedical research literature by natural language processing technology. The Pathway Studio knowledge base has the largest number of regulatory events and therefore provides the most comprehensive and up-to-date snapshot of transcriptional activity in cancer samples.
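The idea behind this kind of enrichment can be illustrated with a minimal sketch: for each regulator, the expression changes of its known downstream targets are compared with those of all other genes using a Mann-Whitney test. This is not the Pathway Studio implementation. The gene names, log ratios, and regulator-to-target sets below are hypothetical toy data, the function names are chosen for illustration only, the p-value uses a normal approximation without tie or continuity correction, and the direction of regulation (activation versus repression) is ignored.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(targets, background):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(targets), len(background)
    ranks = average_ranks(list(targets) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


def regulator_activity(log_ratios, regulator_targets):
    """Score each regulator by comparing its targets with all other genes."""
    scores = {}
    for regulator, targets in regulator_targets.items():
        inside = [v for g, v in log_ratios.items() if g in targets]
        outside = [v for g, v in log_ratios.items() if g not in targets]
        if inside and outside:
            scores[regulator] = mann_whitney(inside, outside)
    return scores


# Hypothetical toy data: log2 expression ratios (tumor vs. normal) per gene.
log_ratios = {"g1": 2.1, "g2": 1.8, "g3": 1.5, "g4": 0.1, "g5": -0.2,
              "g6": 0.3, "g7": -0.1, "g8": 0.0, "g9": 0.2, "g10": -0.3}
# Hypothetical regulator -> downstream targets (stand-in for a knowledge base).
regulator_targets = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g4", "g7", "g9"}}

scores = regulator_activity(log_ratios, regulator_targets)
# Print regulators from most to least significant.
for regulator, (u, z, p) in sorted(scores.items(), key=lambda kv: kv[1][2]):
    print(f"{regulator}: U={u:.1f}, z={z:.2f}, p={p:.4f}")
```

In this toy example, thousands of gene-level measurements would collapse into one score per regulator, which is the dimensionality reduction the text describes. With real data, a library routine that handles ties and small samples exactly would be preferable to this hand-written approximation.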
SNEA uses the non-parametric Mann-Whitney enrichment test to evaluate the transcriptional activity of upstream regulators; this test was shown to give superior results for microarray data analysis compared with the hypergeometric overlap test implemented in RCR [11,12].

The activity of upstream expression regulators in turn depends on the activity of pathways altered in the tumor. Therefore, projecting the activity of upstream expression regulators identified by SNEA onto a collection of a relatively small number of biological pathways relevant for cancer progression should allow us to identify the cancer mechanism in an individual patient, reducing the complexity of interpreting the large number of differentially expressed genes in