Cancer is one of the deadliest diseases in the world. The International Agency for Research on Cancer (IARC) reported 14.1 million new cancer cases and 8.2 million cancer deaths in 2012. In recent years, DNA microarray technology has been increasingly used to analyze and diagnose cancer. Analyzing microarray gene expression data allows medical experts to determine whether a person has cancer. However, DNA microarray data are high-dimensional, which can adversely affect the classification process and its accuracy, so a classification scheme that includes dimension reduction is needed. In this research, Principal Component Analysis (PCA) was used for dimension reduction, with eigenvectors selected by calculating the proportion of variance each one explains. For classification, the Support Vector Machine (SVM) and Levenberg-Marquardt Backpropagation (LMBP) algorithms were selected. In the tests performed, LMBP classification was more stable than SVM: LMBP achieved an average accuracy of 96.07%, while SVM achieved 94.98%.
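The abstract does not give the exact eigenvector selection rule, so the following Python/NumPy sketch assumes a common variant: keep the smallest number of principal components whose cumulative variance proportion reaches a threshold. The function name `pca_by_variance_proportion` and the 0.95 default are illustrative assumptions, not values from the study. The sketch uses the singular value decomposition of the centered data, which gives the same eigenvectors as the covariance matrix without building a genes × genes matrix.

```python
import numpy as np


def pca_by_variance_proportion(X, threshold=0.95):
    """Project samples onto the fewest principal components whose cumulative
    variance proportion reaches `threshold` (0.95 is an assumed default)."""
    X = np.asarray(X, dtype=float)
    # Center each gene (column) on its mean across samples.
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the eigenvectors of the
    # covariance matrix; squared singular values are proportional to its eigenvalues.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2
    # Proportion of total variance explained by each eigenvector (descending order).
    proportion = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion)
    # Smallest k whose cumulative proportion reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(s))  # guard against round-off when threshold is close to 1
    # Scores of each sample on the k selected components.
    Z = Xc @ Vt[:k].T
    return Z, proportion[:k]
```

For instance, for four samples lying exactly on the line x1 = x2, the first component explains all of the variance and only one column is kept. The reduced matrix `Z` would then take the place of the raw expression values as input to a classifier such as SVM or LMBP.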
News is a source of information disseminated through various types of media. To make it easier for readers to find the news they want, news articles need to be classified, but the large volume of news scattered across sources makes classifying it by topic difficult. Therefore, the author conducted a study to automatically classify 360 Indonesian news articles into 12 classes (culture, economy, entertainment, law, health, life, automotive, education, politics, sports, technology, and tourism). Several test scenarios were conducted to examine the effect of stopword removal and stemming during data preprocessing, the effect of mutual information (MI) for feature selection, and the performance of a Support Vector Machine (SVM) in classifying the news data. The results showed that preprocessing with stemming only (without stopword removal), combined with MI feature selection and SVM classification, produced the best result, 94.24%, compared with the other methods.
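The abstract does not state which MI formulation was used. The sketch below assumes one common form for text classification: the mutual information between a binary variable (does the term occur in a document?) and the class label, keeping the k highest-scoring terms as features. The functions `mutual_information` and `select_top_k_terms` and the choice of k are hypothetical illustrations, not the study's code.

```python
import math
from collections import Counter


def mutual_information(doc_sets, labels, term):
    """MI in bits between 'term occurs in the document' and the class label."""
    n = len(doc_sets)
    present = [term in doc for doc in doc_sets]
    joint = Counter(zip(present, labels))   # N(e, c)
    n_term = Counter(present)               # N(e)
    n_class = Counter(labels)               # N(c)
    mi = 0.0
    # Only observed (e, c) cells are summed; unobserved cells contribute 0 log 0 = 0.
    for (e, c), n_ec in joint.items():
        # P(e,c) * log2(P(e,c) / (P(e) P(c))), expressed with raw counts.
        mi += (n_ec / n) * math.log2(n * n_ec / (n_term[e] * n_class[c]))
    return mi


def select_top_k_terms(docs, labels, k):
    """Keep the k terms with the highest MI score (ties broken alphabetically)."""
    doc_sets = [set(doc) for doc in docs]
    vocabulary = sorted(set().union(*doc_sets))
    scored = [(mutual_information(doc_sets, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]
```

For example, with four toy documents in two balanced classes where "gol" occurs only in the sports articles and "partai" only in the politics articles, both terms score exactly 1 bit because each perfectly separates the classes, so they are selected first. In a full pipeline the documents would already be tokenized (and, per the best scenario above, stemmed) before scoring.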