2013 12th International Conference on Machine Learning and Applications
DOI: 10.1109/icmla.2013.34

Random Forest with 200 Selected Features: An Optimal Model for Bioinformatics Research

Abstract: Many problems in bioinformatics involve high-dimensional, difficult-to-process collections of data. For example, gene microarrays can record the expression levels of thousands of genes, many of which have no relevance to the underlying medical or biological question. Building classification models on such datasets can thus take excessive computational time and still give poor results. Many strategies exist to combat these problems, including feature selection (which chooses only the most relevant genes for buil…
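
Filter-style feature selection, as described in the abstract, scores each gene independently, keeps only the highest-scoring ones, and then trains the classifier on the reduced matrix. The abstract is truncated and does not say which scoring rule the authors used, so the sketch below assumes a two-class Welch t statistic; the helper names `t_scores` and `select_top_k` are hypothetical. Setting k = 200 would match the feature count in the title, and the reduced matrix would then be passed to a random forest (not shown here).

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """Assumed scoring rule: |Welch t statistic| per feature, classes 0 vs 1.

    X is a list of samples (rows), each a list of feature values (e.g. genes);
    y holds the class label (0 or 1) of each row. Each class needs >= 2 rows.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        # Standard error of the difference in means, using sample variances.
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        diff = abs(mean(a) - mean(b))
        if se > 0:
            scores.append(diff / se)
        else:
            # Constant within both classes: perfectly separating if means differ.
            scores.append(math.inf if diff > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring features; return their indices and reduced X."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # restore original column order
    X_reduced = [[row[j] for j in keep] for row in X]
    return keep, X_reduced


# Toy data (hypothetical): 6 samples x 4 features; only feature 0 separates classes.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.1, 5.1, 0.1, 2.0],
    [0.9, 4.9, 0.3, 4.0],
    [3.0, 5.0, 0.2, 3.1],
    [3.1, 5.2, 0.1, 2.1],
    [2.9, 4.8, 0.3, 3.9],
]
y = [0, 0, 0, 1, 1, 1]

keep, X_reduced = select_top_k(X, y, k=2)
print(keep)  # [0, 3]
```

Scoring each feature independently is cheap even for thousands of genes, which is the motivation the abstract gives; the trade-off is that a univariate filter ignores interactions between genes that a random forest could otherwise exploit.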

Cited by 9 publications (1 citation statement). References: 14 publications.
“…We note that tree-based and lasso approaches have been demonstrated to have appealing features that allow handling the analytical challenges of the chemical mixture data as presented in the workshop, and these methods are also increasingly employed in the field of high dimensional data analysis [20][21][22][23][24][25]. Motivated by a recent report which used combined CART and variable selection methods to analyze multiple pollutants and their interactions [3], we propose to use an improved two-step procedure of combining the random forest (RF) [25,26] with adaptive lasso [27] approaches.…”
Section: Review of Existing Statistical Approaches and Their Limitations (mentioning)