Relevant and Non-Redundant Feature Selection for Cancer Classification and Subtype Detection

Rana, Pratip; Thai, Phuc; Dinh, Thang N.; Ghosh, Preetam

doi:10.3390/cancers13174297

Cited by 14 publications (11 citation statements)

References 32 publications (39 reference statements)


“…The expression of hundreds of transcripts is often cataloged by RNA-seq measurements, but most are redundant (i.e., strongly correlated) or noisy. In addition, the number of samples available is less than the number of features owing to the expenses associated with conducting experiments, which makes it simple for conventional machine learning and statistical algorithms to overfit the biological data [71].…”

Section: Discussion (mentioning)

confidence: 99%

Molecular Cluster Mining of Adrenocortical Carcinoma via Multi-Omics Data Analysis Aids Precise Clinical Therapy

Guan, Yue, Chen et al. 2022. Cells


Adrenocortical carcinoma (ACC) is a malignancy of the endocrine system. We collected clinical and pathological features, genomic mutations, DNA methylation profiles, and mRNA, lncRNA, microRNA, and somatic mutations in ACC patients from the TCGA, GSE19750, GSE33371, and GSE49278 cohorts. Based on the MOVICS algorithm, the patients were divided into ACC1-3 subtypes by comprehensive multi-omics data analysis. We found that immune-related pathways were more activated, and drug metabolism pathways were enriched in ACC1 subtype patients. Furthermore, ACC1 patients were sensitive to PD-1 immunotherapy and had the lowest sensitivity to chemotherapeutic drugs. Patients with the ACC2 subtype had the worst survival prognosis and the highest tumor-mutation rate. Meanwhile, cell-cycle-related pathways, amino-acid-synthesis pathways, and immunosuppressive cells were enriched in ACC2 patients. Steroid and cholesterol biosynthetic pathways were enriched in patients with the ACC3 subtype. DNA-repair-related pathways were enriched in subtypes ACC2 and ACC3. The sensitivity of the ACC2 subtype to cisplatin, doxorubicin, gemcitabine, and etoposide was better than that of the other two subtypes. For 5-fluorouracil, there was no significant difference in sensitivity to paclitaxel between the three groups. A comprehensive analysis of multi-omics data will provide new clues for the prognosis and treatment of patients with ACC.


“…Therefore, a technique is needed to obtain the least overlapping features. The best features are ideally free from redundancy with each other (Rana et al, 2021).…”

Section: Feature Selection (mentioning)

confidence: 99%

“…In contrast with the previous two researchers in this paper, the author conducted multi features selection for multi objects applied to the digital image identification of beef and pork. The key feature is considered relevant when having minimum overlap (Rana et al, 2021). The fundamental problem for overlap feature is that this set of features do not comprehensively represent the value of a feature as a target as it still contains the values of other features.…”

Section: Introduction (mentioning)

confidence: 99%

“…The fundamental problem for overlap feature is that this set of features do not comprehensively represent the value of a feature as a target as it still contains the values of other features. The challenge encountered is how to choose one or several of those overlap features as candidates for the best features (Rana et al, 2021). The basic assumption is the smaller range of overlap value of a feature, the better the feature is chosen as the winner of the selection (key feature).…”

Section: Introduction (mentioning)

confidence: 99%


The Identification of Beef and Pork Using Neural Network Based on Texture Features

Anwar, Setyowibowo 2022. JER


The actual problem that frequently happens related to meat sales at conventional markets is the manipulation of pork and beef. It can happen as both visual textures bear resemblances. Texture is a crucial part of an object. In image processing, textures can be used for classification, recognition or prediction of an image. This paper offers the Minimum Overlap Probability - Neural Network method for the identification of digital image features of pork and beef. Minimum Overlap Probability was employed to select features of the strongest characteristics, whilst Neural Network is used for training and classification. Based on the test results, the strongest features are maximum probability, contrast, sum average, autocorrelation, and energy and entropy sum. Based on MOP-NN Model test result, the digital image identification of beef and pork has performance with an accuracy of 96% on 400 images of sample data.


“…, the most differentially expressed genes between two classes of interest) and the “native” selection procedures of the most important features from some models, such as random forests, regression techniques with L1-regularization and many others (Saeys, Inza & Larranaga, 2007; Chandrashekar & Sahin, 2014; Wang, Wang & Chang, 2016). Besides that, several approaches were designed specifically for classification problems involving cancer transcriptomics data, including gene ranking, filtration and combining the most relevant genes in a single model (Arakelyan, Aslanyan & Boyajyan, 2013; Zhang et al, 2021; Rana et al, 2021).…”

Section: Introduction (mentioning)

confidence: 99%

ExhauFS: exhaustive search-based feature selection for classification and survival regression

Nersisyan, Novosad, Galatenko et al. 2022. PeerJ


Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The most straightforward approach for feature selection is an exhaustive search: one can go over all possible feature combinations and pick up the model with the highest accuracy. This method together with its optimizations were actively used in biomedical research, however, publicly available implementation is missing. We present ExhauFS—the user-friendly command-line implementation of the exhaustive search approach for classification and survival regression. Aside from tool description, we included three application examples in the manuscript to comprehensively review the implemented functionality. First, we executed ExhauFS on a toy cervical cancer dataset to illustrate basic concepts. Then, multi-cohort microarray breast cancer datasets were used to construct gene signatures for 5-year recurrence classification. The vast majority of signatures constructed by ExhauFS passed 0.65 threshold of sensitivity and specificity on all datasets, including the validation one. Moreover, a number of gene signatures demonstrated reliable performance on independent RNA-seq dataset without any coefficient re-tuning, i.e., turned out to be cross-platform. Finally, Cox survival regression models were used to fit isomiR signatures for overall survival prediction for patients with colorectal cancer. Similarly to the previous example, the major part of models passed the pre-defined concordance index threshold 0.65 on all datasets. In both real-world scenarios (breast and colorectal cancer datasets), ExhauFS was benchmarked against state-of-the-art feature selection models, including L1-regularized sparse models. In case of breast cancer, we were unable to construct reliable cross-platform classifiers using alternative feature selection approaches. In case of colorectal cancer not a single model passed the same 0.65 threshold. 
Source codes and documentation of ExhauFS are available on GitHub: https://github.com/s-a-nersisyan/ExhauFS.
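
The exhaustive search idea described in this abstract (score every feature combination and keep the best one) can be illustrated with a minimal sketch. This is not the ExhauFS implementation or its interface: the nearest-centroid classifier, the leave-one-out accuracy score, the function names, and the toy data are all assumptions chosen to keep the example small and dependency-free.

```python
from itertools import combinations


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (assumed toy classifier)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # sorted() makes tie-breaking between labels deterministic
    return min(sorted(centroids), key=lambda lab: dist2(centroids[lab], x))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    sub = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]
    return hits / len(sub)


def exhaustive_search(X, y, k):
    """Score every k-feature subset; return (best_subset, best_accuracy)."""
    n_features = len(X[0])
    best = max(combinations(range(n_features), k),
               key=lambda feats: loo_accuracy(X, y, feats))
    return best, loo_accuracy(X, y, best)


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 3.0], [0.0, 3.0, 2.0],
     [0.9, 4.8, 1.1], [1.0, 1.2, 2.9], [1.1, 3.1, 2.0]]
y = [0, 0, 0, 1, 1, 1]

best_subset, best_acc = exhaustive_search(X, y, k=1)
print(best_subset, best_acc)
```

The number of subsets grows combinatorially with the number of features and with k, so a search like this is only practical after the candidate features have been pre-filtered to a small pool.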
