Random forest for gene selection and microarray data classification

Moorthy, Kohbalan; Mohamad, Mohd Saberi

doi:10.6026/97320630007142

Cited by 47 publications

(16 citation statements)

References 16 publications


“…However, using RainDance Microdroplet PCR as previously described, this limitation has been overcome [ 14 , 15 ]. In addition, the UroMark assay uses a random forest classifier which analyses the methylation status for each of 150 loci [ 22 ]. The classifier does not rely on single or low number of positive markers, or, a predefined pattern of methylation across a set of markers and a dichotomous output is derived from a cut off generated from the known outcomes of prior samples.…”

Section: Discussion (mentioning)

confidence: 99%

DETECT I & DETECT II: a study protocol for a prospective multicentre observational study to validate the UroMark assay for the detection of bladder cancer from urinary cells

et al. 2017

Background: Haematuria is a common finding in general practice which requires visual inspection of the bladder by cystoscopy as well as upper tract imaging. In addition, patients with non-muscle invasive bladder cancer (NMIBC) often require surveillance cystoscopy as often as every three months, depending on disease risk. However, cystoscopy is an invasive procedure which is uncomfortable, requires hospital attendance and is associated with a risk of urinary tract infection. We have developed the UroMark assay, which can detect 150 methylation-specific alterations specific to bladder cancer using DNA from urinary sediment cells.

Methods: DETECT I and DETECT II are two multi-centre prospective observational studies designed to conduct a robust validation of the UroMark assay. DETECT I will recruit patients having diagnostic investigations for haematuria to determine the negative predictive value of the UroMark to rule out the presence of bladder cancer. DETECT II will recruit patients with new or recurrent bladder cancer to determine the sensitivity of the UroMark in detecting low, intermediate and high grade bladder cancer. NMIBC patients in DETECT II will be followed up with urine sample collection every three months for 24 months while having surveillance cystoscopy. DETECT II will include a qualitative analysis of semi-structured interviews to explore patients’ experience of being diagnosed with bladder cancer and having cystoscopy and a urinary test for bladder cancer surveillance. Results of the UroMark will be compared to cystoscopy findings and histopathological results in patients with bladder cancer.

Discussion: A sensitive and specific urinary biomarker will revolutionise the haematuria diagnostic pathway and surveillance strategies for NMIBC patients. None of the six urinary tests approved by the US Food and Drug Administration is recommended as a standalone test. The UroMark assay is based on next generation sequencing technology which interrogates 150 loci and represents a step change compared to other biomarker panels. This enhances the sensitivity of the test, and the random forest classifier approach, in which the UroMark results are derived from a cut-off generated from known outcomes of previous samples, addresses many shortcomings of previous assays.

Trial registration: Both trials are registered on clinicaltrials.gov. DETECT I: NCT02676180 (18th December 2015). DETECT II: NCT02781428 (11th May 2016).
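
The abstract above says the test result is derived from a cut-off generated from the known outcomes of previous samples, but it does not say how that cut-off is chosen. The sketch below shows one common possibility, purely as an illustration: pick the classifier score threshold that maximises Youden's J (sensitivity + specificity - 1) on prior samples. The function names, the toy scores and the use of Youden's J are all assumptions and are not taken from the UroMark protocol.

```python
def choose_cutoff(scores, outcomes):
    """Return the score threshold that maximises Youden's J on prior samples.

    scores   -- classifier scores in [0, 1] (hypothetical, e.g. fraction of trees voting "cancer")
    outcomes -- known outcomes for the same samples, 1 = disease, 0 = no disease
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # A sample is called positive when its score reaches the candidate cut-off.
        true_pos = sum(s >= cut and o == 1 for s, o in zip(scores, outcomes))
        true_neg = sum(s < cut and o == 0 for s, o in zip(scores, outcomes))
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def classify(score, cutoff):
    """Turn a continuous score into the dichotomous test result."""
    return "positive" if score >= cutoff else "negative"


# Toy prior samples (invented for illustration only).
prior_scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.90]
prior_outcomes = [0, 0, 0, 0, 1, 1, 1]
cutoff = choose_cutoff(prior_scores, prior_outcomes)
```

A new sample would then be reported with `classify(new_score, cutoff)`. In practice the cut-off could equally be tuned for a target negative predictive value rather than Youden's J.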

“…Due to its implementation simplicity and classification effectiveness, KNN has been widely used in pattern recognition. It is also used as a different feature selection algorithm [ 50 , 51 ] and is integrated into the feature selection framework to evaluate the quality of a candidate feature subset [ 52 – 54 ].…”

Section: Methods and Techniques (mentioning)

confidence: 99%

Feature Extraction and Classification on Esophageal X-Ray Images of Xinjiang Kazak Nationality

Yang, Hamit, Yan, et al. 2017

Journal of Healthcare Engineering

Esophageal cancer is one of the fastest rising types of cancers in China. The Kazak nationality is the highest-risk group in Xinjiang. In this work, an effective computer-aided diagnostic system is developed to assist physicians in interpreting digital X-ray image features and improving the quality of diagnosis. The modules of the proposed system include image preprocessing, feature extraction, feature selection, image classification, and performance evaluation. 300 original esophageal X-ray images were resized to a region of interest and then enhanced by the median filter and histogram equalization method. 37 features from textural, frequency, and complexity domains were extracted. Both sequential forward selection and principal component analysis methods were employed to select the discriminative features for classification. Then, support vector machine and K-nearest neighbors were applied to classify the esophageal cancer images with respect to their specific types. The classification performance was evaluated in terms of the area under the receiver operating characteristic curve, accuracy, precision, and recall, respectively. Experimental results show that the classification performance of the proposed system outperforms the conventional visual inspection approaches in terms of diagnostic quality and processing time. Therefore, the proposed computer-aided diagnostic system is promising for the diagnostics of esophageal cancer.
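
The abstract above combines sequential forward selection with a K-nearest neighbors classifier. A minimal sketch of how the two can fit together is given below, assuming a wrapper setup in which each candidate feature subset is scored by leave-one-out KNN accuracy. The function names and the toy data are hypothetical, and the cited paper's actual selection criterion may differ.

```python
import math


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training rows, using only the chosen feature indices."""
    dists = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Ties are broken in favour of the smallest label so the result is deterministic.
    return max(sorted(votes), key=votes.get)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to the given feature subset."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], features, k) == y[i]:
            hits += 1
    return hits / len(X)


def forward_select(X, y, max_features, k=3):
    """Greedily add the feature that most improves leave-one-out accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Score every candidate extension; on ties prefer the lowest feature index.
        score, neg_f = max((loo_accuracy(X, y, selected + [f], k), -f) for f in remaining)
        if score <= best_score:
            break
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best_score = score
    return selected, best_score


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 9.0], [0.15, 3.0],
     [2.1, 4.0], [2.2, 8.0], [2.3, 2.0], [2.0, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, score = forward_select(X, y, max_features=2)
```

On this toy set the search keeps only the informative feature, because adding the noise feature cannot raise the accuracy any further.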

“…Random Forest (RF) is a type of machine-learning method, which has been experimentally proven to be the best classifier (10). RF has a number of advantages and has already been successfully applied to microarray data classification (11,12) and numerous other disease classifications (13,14). Among the different variable selection methods, variable selection using RF (VSURF) has demonstrated the best predictive performance thus far (15).…”

Section: Introduction (mentioning)

confidence: 99%

Predicting prognosis of endometrioid endometrial adenocarcinoma on the basis of gene expression and clinical features using Random Forest

et al. 2019

Traditional clinical features are not sufficient to accurately judge the prognosis of endometrioid endometrial adenocarcinoma (EEA). Molecular biological characteristics and traditional clinical features are particularly important in the prognosis of EEA. The aim of the present study was to establish a predictive model that considers genes and clinical features for the prognosis of EEA. The clinical and RNA sequencing expression data of EEA were derived from samples from The Cancer Genome Atlas (TCGA) and Peking University People's Hospital (PKUPH; Beijing, China). Samples from TCGA were used as the training set, and samples from the PKUPH were used as the testing set. Variable selection using Random Forests (VSURF) was used to select the genes and clinical features on the basis of TCGA samples. The RF classification method was used to establish the prediction model. Kaplan-Meier curves were tested with the log-rank test. The results from this study demonstrated that on the basis of TCGA samples, 11 genes and the grade were selected as the input features. In the training set, the out-of-bag (OOB) error of RF model-1, which was established using the ‘11 genes’, was 0.15; the OOB error of RF model-2, which was established using the ‘grade’, was 0.39; and the OOB error of RF model-3, established using the ‘11 genes and grade’, was 0.15. In the testing set, the classification accuracy of RF model-1, model-2 and model-3 was 71.43, 66.67 and 80.95%, respectively. In conclusion, to the best of our knowledge, the VSURF was used to select features relevant to EEA prognosis, and an EEA predictive model combining genes and traditional features was established for the first time in the present study. The prediction accuracy of the RF model on the basis of the 11 genes and grade was markedly higher than that of the RF models established by either the 11 genes or grade alone.
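
The out-of-bag (OOB) error reported in the abstract above is estimated by predicting each training sample only with the trees whose bootstrap sample did not contain it. The sketch below illustrates that idea with a deliberately simplified ensemble: bagged one-split decision stumps on binary labels, without the per-split feature subsampling of a real random forest. It is not the VSURF or RF implementation used in the study, and all names and data are hypothetical.

```python
import random


def fit_stump(X, y):
    """Find the (feature, threshold, left_label) split with the fewest training errors (labels 0/1)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for left_label in (0, 1):
                errors = sum(
                    (left_label if row[f] <= t else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def stump_predict(stump, row):
    f, t, left_label = stump
    return left_label if row[f] <= t else 1 - left_label


def oob_error(X, y, n_trees=50, seed=0):
    """Fraction of samples misclassified by the majority vote of stumps that never saw them."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        sample_y = [y[i] for i in idx]
        if len(set(sample_y)) < 2:
            continue  # simplification: skip degenerate single-class bootstrap samples
        stump = fit_stump([X[i] for i in idx], sample_y)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # only out-of-bag samples receive a vote from this stump
                votes[i][stump_predict(stump, X[i])] += 1
    scored = [(v, label) for v, label in zip(votes, y) if sum(v) > 0]
    if not scored:
        raise ValueError("no sample was ever out of bag")
    wrong = sum((0 if v[0] >= v[1] else 1) != label for v, label in scored)
    return wrong / len(scored)


# Toy data (invented): two well-separated classes on feature 0.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 9.0], [0.15, 3.0],
     [2.1, 4.0], [2.2, 8.0], [2.3, 2.0], [2.0, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
err = oob_error(X, y)
```

Because every sample is scored only by models that did not train on it, the OOB error acts as a built-in validation estimate, which is why the study can report it for the training set without a separate hold-out split.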
