An Effective Pre-Processing Phase for Gene Expression Classification

Seah, Choon Sen; Kasim, Shahreen; Fudzee, Mohd Farhan Md; Mohamad, Mohd Saberi; Saedudin, Rd. Rohmat; Hassan, Rohayanti; Ismail, Mohd Arfian; Atan, Rodziah

doi:10.11591/ijeecs.v11.i3.pp1223-1227

Cited by 4 publications (8 citation statements)

References 18 publications (19 reference statements)

“…Run module in GenePattern 4. Output GCT files of gene expression datasets for further data pre-processing This raw gene expression data file contains abundant information extracted from the cell [18]. In order to generate a GCT file for data pre-processing, a ZIP package of CEL files downloaded from the database is created for the usage of GenePattern modules in the next step.…”

Section: Pre-analysis (mentioning)

confidence: 99%
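
For readers unfamiliar with the GCT files mentioned in the statement above: GCT is a tab-delimited text format with a version line (`#1.2`), a line giving the row and column counts, a header line whose first two columns are `Name` and `Description`, and then one row per gene. The following is a minimal sketch of a reader under that assumption. The function name `parse_gct` and the sample data are hypothetical and do not come from the cited paper, which produces its GCT files with GenePattern modules.

```python
import io


def parse_gct(handle):
    """Read a GCT v1.2 stream into (sample_names, {gene_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")

    # Second line: number of data rows, then number of sample columns.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])

    # Third line: Name, Description, then one column per sample.
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the dimension line")

    data = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # fields[0] is the gene name, fields[1] its description.
        data[fields[0]] = [float(v) for v in fields[2:]]

    if len(data) != n_rows:
        raise ValueError("row count does not match the dimension line")
    return samples, data


# Hypothetical two-gene, three-sample file, used only for illustration.
EXAMPLE = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "geneA\tna\t1.0\t2.5\t3.0\n"
    "geneB\tna\t0.5\t0.0\t4.0\n"
)
samples, data = parse_gct(io.StringIO(EXAMPLE))
```

The sketch validates the declared dimensions against the parsed content, since a mismatch usually signals a truncated or hand-edited file.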

A Microarray Data Pre-processing Method for Cancer Classification

Hui, Kasim, Fudzee et al. 2022

JOIV : Int. J. Inform. Visualization

The development of microarray technology has led to significant improvements and research in various fields. With the help of machine learning techniques and statistical methods, it is now possible to organize, analyze, and interpret large amounts of biological data to uncover significant patterns of interest. Exploiting microarray data remains a great challenge for many researchers. Raw gene expression data are prone to missing values, noise, incompleteness, and inconsistency. Hence, the data must be processed before being used for cancer classification. To extract the biological significance of microarray gene expression data, data pre-processing is a necessary step for obtaining valuable information for further analysis and for addressing important hypotheses. This study presents a detailed description of a data pre-processing method for cancer classification. The proposed method consists of three phases: data cleaning, transformation, and filtering. A combination of the GenePattern software tool and RStudio was used to implement the proposed method. To demonstrate its feasibility for cancer classification, the method was applied to six gene expression datasets: lung, stomach, liver, kidney, thyroid, and breast cancer. A comparison illustrates the differences between the datasets before and after data pre-processing.
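
The three phases named in the abstract (data cleaning, transformation, filtering) can be illustrated with a small sketch. The abstract does not say which operations each phase uses, so every concrete choice here is an assumption made for illustration: mean imputation for cleaning, a `log2(x + 1)` transform on non-negative intensities, and a variance threshold for filtering. The function names and the sample values are hypothetical, and the original work uses GenePattern and RStudio rather than Python.

```python
import math
from statistics import mean, pvariance


def clean(rows):
    """Phase 1 (assumed): fill missing values (None) with the gene's mean.

    Genes with no observed values at all are dropped.
    """
    cleaned = {}
    for gene, values in rows.items():
        present = [v for v in values if v is not None]
        if not present:
            continue
        fill = mean(present)
        cleaned[gene] = [fill if v is None else v for v in values]
    return cleaned


def transform(rows):
    """Phase 2 (assumed): log2(x + 1); expects non-negative intensities."""
    return {g: [math.log2(v + 1.0) for v in vals] for g, vals in rows.items()}


def filter_low_variance(rows, min_variance):
    """Phase 3 (assumed): keep genes whose variance across samples is large enough."""
    return {g: vals for g, vals in rows.items() if pvariance(vals) >= min_variance}


def preprocess(rows, min_variance=0.1):
    """Run cleaning, transformation, and filtering in sequence."""
    return filter_low_variance(transform(clean(rows)), min_variance)


# Hypothetical expression values: one gene with a gap, one flat gene,
# and one gene with no measurements.
raw = {
    "geneA": [1.0, None, 7.0],
    "geneB": [3.0, 3.0, 3.0],
    "geneC": [None, None, None],
}
result = preprocess(raw)
```

In this toy input only `geneA` survives: `geneC` is dropped during cleaning because it has no observed values, and `geneB` is dropped during filtering because it does not vary across samples.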

“…The removal of unrelated bone data in the dataset enhanced accuracy of prediction. Choon Sen Seah et.al (2018) [4] developed a pre-processing model called Significant Directed Random Walk (SDRW) in three stages. During the first stage, unwanted attributes were removed along with missing values and arrangement of data.…”

Section: Literature Reviews On Pre-processing Techniques For Lung Cancer (mentioning)

confidence: 99%

“…The application of pre-processing techniques in numerical analysis were found to be missing among the models. • A pre-processing framework with sequence of stages were found earlier in Significant Directed Random Walk (SDRW) Choon Sen Seah et.al (2018) [4]. However, the stages were generally made with no specific algorithm generated in novel form.…”

Section: Research Gaps Of the Study (mentioning)

confidence: 99%

Augmentation of Predictive Competence of Non-Small Cell Lung Cancer Datasets through Feature Pre-Processing Techniques

Sumalatha, Parthiban 2022

EAI Endorsed Trans Perv Health Tech

The major objective of the study is to augment the predictive analytics of Non-Small Cell Lung Cancer (NSCLC) datasets with a Feature Pre-Processing (FPP) technique in three stages: removing base errors through common checks for empty, non-numerical, or missing values in the dataset; removing repeated features through regression analysis; and eliminating irrelevant features through clustering methods. The FPP model is validated using classifiers such as simple and complex trees, linear and Gaussian SVM, weighted KNN, and boosted trees in terms of accuracy, sensitivity, specificity, kappa, and positive and negative likelihood. The results showed that the NSCLC dataset formed after FPP outperformed the raw NSCLC dataset on all performance measures and showed good augmentation in predictive analytics of NSCLC datasets. The research concluded that pre-processing is essential for better prediction on complex medical datasets.
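
The second FPP stage, removing repeated features through regression analysis, can be sketched as follows. The abstract gives no algorithmic detail, so this is an assumed stand-in rather than the authors' method: a feature is treated as repeated when a simple linear regression on an already kept feature explains almost all of its variance. For simple linear regression with an intercept, the coefficient of determination equals the squared Pearson correlation, so the sketch computes that directly. The names `pearson_r` and `drop_redundant`, the 0.95 threshold, and the sample values are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features, r2_threshold=0.95):
    """Keep a feature only if no previously kept feature predicts it with R^2 >= threshold."""
    kept = {}
    for name, values in features.items():
        redundant = any(
            pearson_r(values, other) ** 2 >= r2_threshold for other in kept.values()
        )
        if not redundant:
            kept[name] = values
    return kept


# Hypothetical feature columns: f2 is nearly 2 * f1, f3 is unrelated to f1.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.1, 4.0, 6.2, 7.9],
    "f3": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_redundant(features)
```

The result depends on feature order, because each feature is compared only against those already kept. Constant features are never flagged as redundant here and would need to be handled by a separate stage.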

“…This step is concern about to remove redundant attributes [7]. Attribute selection is very important in data mining task and producing a smaller set of attributes is also a challenging task for research to produce good classification result [7]. There are many attribute selection methods in WEKA tools, but for this research, we only used four methods, which are CfsSubsetEval [8], WrapperSubsetEval [9], GainRatioSubsetEval [10], and CorrelationAttributeEval [11].…”

Section: B Attribute Selection (mentioning)

confidence: 99%

Analysis of Attribute Selection and Classification Algorithm Applied to Hepatitis Patients

Samsuddin, Shah, Saedudin et al. 2019

International Journal on Advanced Science, Engineering and Information Technology

Self Cite

Data mining techniques are widely used for classification, attribute selection, and prediction in bioinformatics because they help to discover meaningful new correlations, patterns, and trends by sifting through large volumes of data using pattern recognition technologies as well as statistical and mathematical techniques. Hepatitis is one of the most important health problems in the world. Many studies have addressed the diagnosis of hepatitis, but medical diagnosis is a difficult and largely visual task that is mostly done by doctors. Therefore, this research analyses attribute selection and classification algorithms applied to hepatitis patients. The WEKA tool is used to conduct experiments with different attribute selectors and classification algorithms. The hepatitis dataset is taken from the UC Irvine repository. The attribute selectors considered are CfsSubsetEval, WrapperSubsetEval, GainRatioSubsetEval, and CorrelationAttributeEval. The classification algorithms used are NaiveBayesUpdatable, SMO, KStar, RandomTree, and SimpleLogistic. The classification models are evaluated on time and accuracy. The study concludes that the best attribute selector is CfsSubsetEval and the best classifier is SMO, because SMO performs better than the other classification techniques for hepatitis patients.
