Cancer accounted for over 9.6 million deaths in 2018, and roughly one in six deaths worldwide is caused by cancer. Colon cancer is the second leading cause of cancer death, with around 1.8 million cases. This research aims to detect colon cancer from a microarray dataset. It will help experts distinguish cancer cells from normal cells so that the disease can be diagnosed and treated at earlier stages, which increases the survival rate of patients. The high dimensionality of microarray datasets, which have few samples and many attributes, degrades the detection capability of the classifier. Hence, dimensionality reduction techniques are needed to preserve the significant genes that are prominent in disease classification. In this article, the ANOVA method is first used to select the best genes, and then principal component analysis (PCA) and fuzzy C-means clustering (FCM) techniques are employed to choose the relevant genes. The PCA and FCM features are classified using model, discriminant, regression, hybrid, and heuristic-based classifiers. The results show that the heuristic classifier with PCA features achieves an average classification accuracy of 97.92% for both the colon cancer and normal samples. With FCM features, the heuristic classifier achieves average classification accuracies of 99.48% and 97.92% for the colon cancer and normal samples, respectively. The heuristic classifier achieves higher accuracy than all other classifiers in the classification of colon cancer.
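The ANOVA gene-selection step mentioned in this abstract can be illustrated with a minimal sketch. It ranks genes by the one-way ANOVA F statistic and keeps the highest-scoring ones. The function names (`anova_f_score`, `select_top_genes`), the `top_k` parameter, and the toy expression values are assumptions made for illustration; they are not taken from the paper, which does not state its implementation or how many genes it retained.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single gene across labelled samples."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Between-group sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_genes(expression, labels, top_k):
    """Return the indices of the top_k genes ranked by F score.

    expression: list of genes, each a list of per-sample expression values.
    """
    scores = [anova_f_score(gene, labels) for gene in expression]
    ranked = sorted(range(len(expression)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy data (hypothetical, not the colon cancer dataset): 3 genes x 6 samples.
expression = [
    [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],  # gene 0: clearly separates the classes
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # gene 1: same mean in both classes
    [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],  # gene 2: weak separation
]
labels = ["normal"] * 3 + ["tumor"] * 3
top = select_top_genes(expression, labels, top_k=2)
print(top)
```

In a pipeline like the one described, the retained genes would then be passed to PCA or FCM for further reduction before classification.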
Microarray technology is a prominent tool that analyzes many thousands of gene expressions in a single experiment and helps uncover the primary genetic causes of various human diseases. The technology has abundant applications, but its datasets are high-dimensional, which makes it difficult to analyze whole gene sets. In this paper, the SAM technique is applied to the Golub microarray dataset to identify significant genes. The identified genes are then clustered using three clustering techniques, namely hierarchical, k-means, and fuzzy C-means clustering. Clustering forms groups that share similar characteristics, which is useful when an unknown dataset is analyzed. The results show that hierarchical clustering performs well, placing exactly 27 samples in the first cluster (ALL) and 11 samples in the second cluster (AML). The clusters give an idea of the characteristics of the dataset.
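Of the three clustering techniques named in this abstract, k-means is the simplest to sketch. The example below is a minimal version of Lloyd's algorithm, assuming Euclidean distance and caller-supplied initial centroids; the function name `kmeans`, the initialisation choice, and the toy samples are illustrative assumptions, not the paper's setup or the Golub data.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` holds the initial cluster centres."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centre if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy samples (hypothetical): each row is one sample described by two selected genes.
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [2.0, 2.1], [2.2, 1.9]]
assignment, centres = kmeans(samples, centroids=[samples[0], samples[3]])
print(assignment)
```

With two clusters, the resulting assignment can be compared against known class labels (such as ALL and AML) to judge how well the grouping matches them.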