Background
Gene expression data are often used to classify cancers. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. It is therefore important to accurately select a subset of feature genes that contributes strongly to cancer classification.
Methods
In this article, we propose a new three-stage hybrid gene selection method that combines a variance filter, extremely randomized trees and Harris Hawks optimization (VEH). In the first stage, each gene in the dataset is evaluated with the variance filter, and genes that meet the variance threshold are retained. In the second stage, extremely randomized trees are used to further eliminate irrelevant genes. Finally, the Harris Hawks algorithm searches the genes retained by the first two stages to obtain the optimal feature gene subset.
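The third stage is a wrapper search: candidate gene subsets are scored by a classifier and refined by the Harris Hawks update rules. The following is a minimal sketch, not the authors' implementation. It assumes that `X` is the sample-by-gene matrix already reduced by the first two stages, that continuous hawk positions in [0, 1] are mapped to a gene mask by keeping positions above 0.5, and that fitness is a weighted sum of leave-one-out 1-nearest-neighbour error and the fraction of genes kept (`alpha = 0.99`). The classifier, the weighting, and the helper names `loocv_1nn_accuracy`, `fitness`, `levy_step` and `binary_hho` are all hypothetical choices for illustration. The rapid-dive step is also simplified to keep the best of the two dive candidates and the current position.

```python
import math
import random


def loocv_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on genes where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k != i:
                d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
                if d < best_d:
                    best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, pos, alpha=0.99):
    """Lower is better: weighted LOOCV error plus a small penalty on the fraction of genes kept."""
    mask = [1 if p > 0.5 else 0 for p in pos]
    error = 1.0 - loocv_1nn_accuracy(X, y, mask)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


def levy_step(dim, rng, beta=1.5):
    """Mantegna-style Levy flight step used by the HHO rapid dives."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [0.01 * rng.gauss(0, 1) * sigma / max(abs(rng.gauss(0, 1)), 1e-12) ** (1 / beta)
            for _ in range(dim)]


def binary_hho(X, y, n_hawks=10, n_iter=30, seed=0):
    """Return (gene mask, fitness) of the best 'rabbit' found; positions are clipped to [0, 1]."""
    rng = random.Random(seed)
    dim = len(X[0])
    clip = lambda v: min(1.0, max(0.0, v))
    hawks = [[rng.random() for _ in range(dim)] for _ in range(n_hawks)]
    scores = [fitness(X, y, h) for h in hawks]
    best = min(range(n_hawks), key=lambda i: scores[i])
    rabbit, rabbit_fit = hawks[best][:], scores[best]
    for t in range(n_iter):
        mean = [sum(h[d] for h in hawks) / n_hawks for d in range(dim)]
        for i, x in enumerate(hawks):
            E = 2 * (2 * rng.random() - 1) * (1 - t / n_iter)  # escaping energy of the prey
            if abs(E) >= 1:  # exploration: perch relative to a random hawk or the flock mean
                if rng.random() >= 0.5:
                    xr, r1, r2 = rng.choice(hawks), rng.random(), rng.random()
                    new = [xr[d] - r1 * abs(xr[d] - 2 * r2 * x[d]) for d in range(dim)]
                else:
                    r3, r4 = rng.random(), rng.random()
                    new = [rabbit[d] - mean[d] - r3 * r4 for d in range(dim)]  # LB = 0, UB = 1
            else:  # exploitation: one of four besiege strategies
                r, J = rng.random(), 2 * (1 - rng.random())
                if r >= 0.5 and abs(E) >= 0.5:  # soft besiege
                    new = [rabbit[d] - x[d] - E * abs(J * rabbit[d] - x[d]) for d in range(dim)]
                elif r >= 0.5:  # hard besiege
                    new = [rabbit[d] - E * abs(rabbit[d] - x[d]) for d in range(dim)]
                else:  # soft or hard besiege with progressive rapid dives (Levy flights)
                    ref = x if abs(E) >= 0.5 else mean
                    Y = [clip(rabbit[d] - E * abs(J * rabbit[d] - ref[d])) for d in range(dim)]
                    Z = [clip(Y[d] + rng.random() * s) for d, s in enumerate(levy_step(dim, rng))]
                    new = min((Y, Z, x), key=lambda p: fitness(X, y, p))
            new = [clip(v) for v in new]
            hawks[i], scores[i] = new, fitness(X, y, new)
            if scores[i] < rabbit_fit:
                rabbit, rabbit_fit = new[:], scores[i]
    return [1 if p > 0.5 else 0 for p in rabbit], rabbit_fit
```

Calling `binary_hho(X, y)` returns a 0/1 mask over the columns of `X`; the genes marked 1 would form the final subset. The size penalty is deliberately small so that accuracy dominates and ties are broken in favour of fewer genes.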
Results
We evaluated the proposed method with three different classifiers on eight published microarray gene expression datasets. VEH achieved 100% classification accuracy on the gastric cancer, acute lymphoblastic leukemia and ovarian cancer datasets, and an average classification accuracy of 95.33% across a variety of other cancer datasets. Compared with other advanced feature selection algorithms, VEH showed clear advantages on multiple evaluation criteria.
In biomedical data mining, the number of genes is often much larger than the sample size. To address this problem, a feature selection algorithm is needed to select feature gene subsets that are strongly correlated with the phenotype and so ensure the accuracy of subsequent analysis. This paper presents a new three-stage hybrid feature gene selection method that combines a variance filter, extremely randomized trees, and the whale optimization algorithm. First, a variance filter reduces the dimension of the feature gene space, and extremely randomized trees further reduce the feature gene set. Finally, the whale optimization algorithm selects the optimal feature gene subset. We evaluated the proposed method with three different classifiers on seven published gene expression profile datasets and compared it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages across a variety of evaluation metrics.
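The two filtering stages can be illustrated in plain Python. This is a simplified sketch, not the authors' code: the variance threshold, the number of trees, keeping the top-ranked genes, and the helper names (`variance_filter`, `extra_stump_importance`, `select_top`) are assumptions. To stay short, it uses depth-1 extremely randomized trees: each tree draws about sqrt(n) candidate genes, gives each one a uniformly random split threshold, keeps the split with the largest Gini impurity decrease, and credits that decrease to the chosen gene. Full extremely randomized trees repeat this at every node of a deeper tree.

```python
import random
from statistics import pvariance


def variance_filter(X, threshold):
    """Stage 1: indices of genes whose variance across samples meets the threshold."""
    return [j for j in range(len(X[0])) if pvariance([row[j] for row in X]) >= threshold]


def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def extra_stump_importance(X, y, genes, n_trees=200, k=None, seed=0):
    """Stage 2 (simplified): impurity-decrease importance from depth-1 extremely randomized trees."""
    rng = random.Random(seed)
    k = k or max(1, int(len(genes) ** 0.5))
    importance = {j: 0.0 for j in genes}
    n, base = len(y), gini(y)
    for _ in range(n_trees):
        best_gain, best_gene = 0.0, None
        for j in rng.sample(genes, min(k, len(genes))):
            col = [row[j] for row in X]
            lo, hi = min(col), max(col)
            if lo == hi:
                continue
            cut = rng.uniform(lo, hi)  # random threshold: the "extremely randomized" part
            left = [y[i] for i in range(n) if col[i] < cut]
            right = [y[i] for i in range(n) if col[i] >= cut]
            gain = base - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best_gain, best_gene = gain, j
        if best_gene is not None:
            importance[best_gene] += best_gain / n_trees
    return importance


def select_top(importance, n_keep):
    """Keep the n_keep genes with the highest importance (a hypothetical cut-off rule)."""
    return sorted(importance, key=importance.get, reverse=True)[:n_keep]
```

The genes returned by `select_top` would then be handed to the metaheuristic wrapper stage. Ranking by accumulated impurity decrease is one common way to turn a tree ensemble into a feature filter; a mean-importance threshold would be an equally reasonable cut-off.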
RNA modification is a key regulatory mechanism involved in tumorigenesis, tumor progression, and the immune response. However, the potential role of RNA modification “writer” genes in the immune microenvironment of gliomas, and their effect on the response to immunotherapy, remain unclear. The purpose of this study was to evaluate the role of RNA modification “writer” genes in the prognosis and immunotherapy response of low-grade glioma (LGG). The consensus non-negative matrix factorization (CNMF) method was used to identify distinct RNA modification subtypes. We used a novel eigengene screening method, the variable neighborhood learning Harris Hawks optimizer (VNLHHO), to screen for eigengenes among the RNA modification subtypes. We constructed a prognostic prediction model based on a principal component analysis score (PCA_score) and validated it in an independent cohort. We also analyzed the association between PCA_score and the immune and molecular features of LGG. The results suggested that LGG can be divided into two RNA modification-based subtypes with distinct prognostic and molecular features. A high PCA_score was significantly associated with poor prognosis in LGG and was an independent prognostic factor. A nomogram combining PCA_score and clinical features was constructed and showed significant predictive value. PCA_score was negatively correlated with tumor purity and with the abundance of CD4+ T cells in LGG patients. LGG patients with a high PCA_score had lower Tumor Immune Dysfunction and Exclusion (TIDE) scores and showed a response to immunotherapy. In conclusion, we report a novel RNA modification-based prognostic model for LGG that lays the foundation for evaluating LGG prognosis and developing more effective therapeutic strategies for these tumors.
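The abstract does not give the PCA_score formula. One construction used in similar signature studies sums each sample's scores on the first two principal components of the standardized signature-gene matrix. The sketch below assumes that construction and also assumes a median split into high and low groups. It computes principal components by power iteration with deflation so that it needs only the standard library. The function names are hypothetical, and the eigenvector signs, which are arbitrary, are fixed so that each component's loadings sum to a positive value.

```python
import math
import random


def standardize(X):
    """Z-score each gene (column) across samples."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1)) or 1.0
        for i in range(n):
            out[i][j] = (X[i][j] - mu) / sd
    return out


def covariance(Z):
    """Sample covariance matrix of column-centred data Z."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(p)] for a in range(p)]


def top_eigenvectors(C, k=2, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(C)
    C = [row[:] for row in C]
    vecs = []
    for _ in range(min(k, p)):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        if sum(v) < 0:  # fix the arbitrary sign so the loadings sum to a positive value
            v = [-x for x in v]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        vecs.append(v)
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]  # deflate
    return vecs


def pca_score(X, k=2):
    """Hypothetical PCA_score: sum of each sample's projections on the first k components."""
    Z = standardize(X)
    vecs = top_eigenvectors(covariance(Z), k)
    return [sum(sum(zj * vj for zj, vj in zip(z, v)) for v in vecs) for z in Z]


def split_by_median(scores):
    """Label samples 'high' or 'low' relative to the (upper) median score."""
    cut = sorted(scores)[len(scores) // 2]
    return ["high" if s >= cut else "low" for s in scores]
```

Here each row of `X` would hold one patient's expression values for the screened eigengenes. The resulting high and low groups are what a survival comparison or nomogram would then use.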