Nonsmall cell lung cancer is a prevalent disease. It is diagnosed and treated with the help of computed tomography (CT) scans. In this paper, we apply radiomics to select 3-D features from CT images of the lung toward providing prognostic information. Focusing on cases of the adenocarcinoma nonsmall cell lung cancer tumor subtype from a larger data set, we show that classifiers can be built to predict survival time. This is the first known result to make such predictions from CT scans of lung cancer. We compare classifiers and feature selection approaches. The best accuracy when predicting survival was 77.5% using a decision tree in a leave-one-out cross validation and was obtained after selecting five features per fold from 219.
INDEX TERMSComputed tomography, CT 3D texture features, support vector machine, Naive Bayes, decision tree.
Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Therefore, dimensionality reduction is critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and the t-test as a feature filter using a linear support vector machine as the base classifier. Analysis of the intersection of gene sets selected by the three methods across the four datasets was done. Additional experiments included an initial pre-selection of the top 200 genes based on their p values. IFP and SVM-RFE were then applied on the reduced feature sets. These results showed up to 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) for both scenarios showed the highest accuracies came from the t-test as a filter on experiments without gene pre-selection. IFP and SVM-RFE had greater classification accuracy after gene pre-selection. Analysis showed the t-test is a good gene selector for microarray data. IFP and SVM-RFE showed performance improvement on a reduced by t-test dataset. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.