Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification

Gao, Lingyun; Ye, Mingquan; Lu, Xiaojie; Huang, Daobin

doi:10.1016/j.gpb.2017.08.002

Cited by 106 publications

(65 citation statements)

References 43 publications


“…IG is a filter method that eliminates irrelevant attributes in high‐dimensional data, whereas the SVM wrapper eliminates redundancy to decrease noise in the data. The IG‐SVM method has previously shown success for biomarker selection in high‐dimensional cancer gene data …”

Section: Discussion (mentioning)

confidence: 99%

Development of a Plasma Screening Panel for Pediatric Nonalcoholic Fatty Liver Disease Using Metabolomics

et al. 2019


Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease in children, but diagnosis is challenging due to limited availability of noninvasive biomarkers. Machine learning applied to high‐resolution metabolomics and clinical phenotype data offers a novel framework for developing a NAFLD screening panel in youth. Here, untargeted metabolomics by liquid chromatography–mass spectrometry was performed on plasma samples from a combined cross‐sectional sample of children and adolescents ages 2‐25 years old with NAFLD (n = 222) and without NAFLD (n = 337), confirmed by liver biopsy or magnetic resonance imaging. Anthropometrics, blood lipids, liver enzymes, and glucose and insulin metabolism were also assessed. A machine learning approach was applied to the metabolomics and clinical phenotype data sets, which were split into training and test sets, and included dimension reduction, feature selection, and classification model development. The selected metabolite features were the amino acids serine, leucine/isoleucine, and tryptophan; three putatively annotated compounds (dihydrothymine and two phospholipids); and two unknowns. The selected clinical phenotype variables were waist circumference, whole‐body insulin sensitivity index (WBISI) based on the oral glucose tolerance test, and blood triglycerides. The highest performing classification model was random forest, which had an area under the receiver operating characteristic curve (AUROC) of 0.94, sensitivity of 73%, and specificity of 97% for detecting NAFLD cases. A second classification model was developed using the homeostasis model assessment of insulin resistance substituted for the WBISI. Similarly, the highest performing classification model was random forest, which had an AUROC of 0.92, sensitivity of 73%, and specificity of 94%. Conclusion: The identified screening panel consisting of both metabolomics and clinical features has promising potential for screening for NAFLD in youth. 
Further development of this panel and independent validation testing in other cohorts are warranted.
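The citation statement above describes information gain (IG) as a filter that discards irrelevant attributes before an SVM wrapper is applied. A minimal sketch of the filter half only, assuming features have already been discretized (the gene names, the "low"/"high" bins, and the function names are all hypothetical and not taken from the cited paper; the SVM wrapper stage is not shown):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the label entropy inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(features, labels, top_k):
    """Return the names of the top_k features with the highest IG."""
    scores = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy, made-up data: expression levels already discretized to "low"/"high".
labels = ["tumor", "tumor", "normal", "normal"]
features = {
    "gene_a": ["high", "high", "low", "low"],  # separates the two classes
    "gene_b": ["high", "low", "high", "low"],  # carries no class information
}
selected = rank_by_information_gain(features, labels, top_k=1)
```

In a full pipeline of the kind the statement describes, the features kept by this ranking would then be passed to a wrapper that trains a classifier on candidate subsets; continuous expression values would first need a discretization step, which this sketch leaves out.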


“…Though, Leukemia3 and Colon cancer classification performances are a bit lower compared to that of other three datasets, they are still capable to be classified with only one and two misclassifications respectively. With comparison to the study reported in [4,12], the proposed study has obtained little bit higher accuracy which is 90.47% for colon cancer dataset whereas it is 90.32% with 3 genes in former and 90.09% with 30 genes in later. Further, in the classification of colon cancer, a sparse representation based method is proposed in [3] which provide 91.94% accuracy; nevertheless with a very huge gene subset.…”

Section: B Experimental Results (mentioning)

confidence: 58%

“…The number of informative genes selected by wrapper followed by each filter is given in the parenthesis. That is, IG-EA (12) indicates that the number of elements in the gene subset selected by IG-EA for Lymphoma dataset is 12. The classification performance without gene selection and the performance with baseline classifier (i.e.…”

Section: B Experimental Results (mentioning)

confidence: 99%

“…Consequently, they [11] suggested the signal to noise ratio feature selection method with K Nearest Neighbors classifier for feature selection. Comparatively wrappers and filters are used simply with good performance [2,8,11-17]. Yet, some studies show lack of performance due to direct application of wrappers into the original datasets [2].…”

Section: Related Work (mentioning)

confidence: 99%


The Effect of Evolutionary Algorithm in Gene Subset Selection for Cancer Classification

Fajila, Jahan. 2018. IJMECS


Cancer research outcomes show that there is still room for improvement that future work should investigate, which motivates researchers to stay actively involved in the field. In this study, a hybrid machine learning method is proposed in which two filters are assessed along with a wrapper approach. Typically, filters prioritize the features, while wrappers contribute to subset identification. Though both filters and wrappers can be used independently, they produce excellent results when applied in sequence. The wrapper-filter combination plays a major role in feature selection. Yet incorporating a good strategy for feature space analysis is crucial. Thus, we introduce the Evolutionary Algorithm in the proposed study to search through the feature space for informative gene subset selection. Though there are several gene selection approaches for cancer classification, many of them suffer from low classification accuracy and large gene subsets for prediction. Hence, we propose the Evolutionary Algorithm to overcome this problem. The proposed approach is evaluated on five microarray datasets, three of which yield 100% accuracy. Regardless of the number of genes selected, both filters provide the same performance across the datasets used. Consequently, the Evolutionary Algorithm in feature space search is highlighted for its performance in gene subset selection.
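The abstract above describes an evolutionary search over the feature space for a small, informative gene subset. A generic sketch of that idea, assuming a plain genetic algorithm with elitism, one-point crossover, and bit-flip mutation (the abstract does not state which operators the authors used, and `evolve_subset`, `toy_fitness`, and `INFORMATIVE` are hypothetical names introduced only for illustration):

```python
import random


def evolve_subset(n_genes, fitness, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for a high-fitness gene subset encoded as a tuple of 0/1 flags."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_genes)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), then refill with offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)  # one-point crossover position
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit for bit in child)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical stand-in for "classifier accuracy minus a subset-size penalty".
INFORMATIVE = {1, 4}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


best = evolve_subset(n_genes=8, fitness=toy_fitness)
```

In practice the fitness function would be the cross-validated accuracy of a classifier trained on the genes flagged in `mask`, evaluated only on genes that survived the filter stage; the toy function here merely rewards two made-up informative positions and penalizes larger subsets.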


“…The day by day increment of cancer disease posing a serious threat to human health. The identification of the cancerous cell in the initial stage is still a challenging task, because of that the patients are diagnosed with cancer in advance stage, that increases the difficulty in the treatment of cancer [1]. Microarray is an on-chip technology, which contains the gene expression.…”

Section: Introduction (mentioning)

confidence: 99%

Classification of Microarray Data Using Kernel Based Classifiers

Swati, Kumar, Mishra. 2019. RIA


Microarray datasets enable scientists to genotype thousands of loci at a time, making it easier to determine the association between chromosomal regions and particular diseases. This paper mainly compares the performance of different classifiers on microarray data. Firstly, the expressed genes related to ovarian cancer were identified through a statistical test. Next, various classifiers, namely, Extreme Learning Machine (ELM) and Relevance Vector Machine (RVM), were applied to categorize the datasets and samples into malignant or benign classes. Then, the performance of each classifier was measured by precision, recall, specificity, etc. The results show that the ELM and the RVM are better classifiers in comparison to the support vector machine (SVM). The research results lay the basis for the application of kernel-based classifiers in cancer identification.

