2010
DOI: 10.1007/978-3-642-00580-0_3

Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

Abstract: The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies: in this case one can expect many combinations between the variables to exist, and the usual random forests method does not effectively exploit this situation. Here we investigate a new approach for supervised classification with a huge number of numerical attributes. We propose a ra…
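
The abstract is truncated, but its central idea, replacing axis-aligned single-feature splits with oblique multivariate ones, can be sketched. The snippet below fits one oblique node split with a linear SVM over a random feature subspace; the helper name `oblique_split` and the choice of scikit-learn's LinearSVC are illustrative assumptions, not necessarily the authors' exact procedure.

```python
# Minimal sketch of an oblique (multivariate) node split: instead of
# thresholding one feature, fit a linear separator over a random subset
# of features and send samples left/right depending on which side of
# the hyperplane they fall. Illustrative only; not the paper's algorithm.
import numpy as np
from sklearn.svm import LinearSVC

def oblique_split(X, y, n_features, rng):
    """Fit one oblique split; return (feature indices, model, left-child mask)."""
    idx = rng.choice(X.shape[1], size=n_features, replace=False)  # random subspace
    clf = LinearSVC(C=1.0).fit(X[:, idx], y)                      # oblique hyperplane
    left = clf.decision_function(X[:, idx]) <= 0.0
    return idx, clf, left

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))            # very-high-dimensional toy data
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # class depends on a combination of features
idx, clf, left = oblique_split(X, y, n_features=32, rng=rng)
print(left.sum(), (~left).sum())            # sizes of the two child nodes
```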

Cited by 43 publications (41 citation statements: 3 supporting, 38 mentioning, 0 contrasting)
References 27 publications
“…Random forest reduces both bias (the systematic error term, independent of the training sample) and variance (the error due to variability associated with the training sample): it keeps bias low by growing unpruned trees, and it uses randomization to control the diversity between trees in the ensemble [14]. Randomization is introduced into the ensemble by building each tree on a bootstrap sample drawn with replacement, and by randomly selecting the variables considered at each node split [15].…”
Section: Introduction (mentioning)
confidence: 99%
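
To make the two randomization sources named in this statement concrete (bootstrap sampling with replacement, and a random feature subset at each node split), here is a minimal sketch built on scikit-learn's DecisionTreeClassifier; the function names and the majority-vote scheme for binary 0/1 labels are assumptions for illustration, not code from either paper.

```python
# Sketch of the standard random-forest recipe: each tree is grown
# unpruned on a bootstrap sample drawn with replacement, and each
# node split considers only a random subset of the features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        boot = rng.integers(0, len(X), size=len(X))    # sampling WITH replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",                       # random feature subset per split
            random_state=int(rng.integers(2**31 - 1)),
        )                                              # unpruned by default: bias stays low
        trees.append(tree.fit(X[boot], y[boot]))
    return trees

def forest_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])    # one row of votes per tree
    return (votes.mean(axis=0) > 0.5).astype(int)      # majority vote over 0/1 labels
```
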
“…First, tree construction is based on a single feature being selected for node splitting. Such trees may be inefficient in dealing with the feature dependencies likely inherent in high-dimensional spectral data [14]. Second, the majority of current implementations of the RF algorithm utilize orthogonal splits based on univariate decision trees (DT).…”
Section: Introduction (mentioning)
confidence: 99%
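
The weakness of orthogonal (single-feature) splits that this statement points to shows up even in two dimensions: when the class depends on a sum of features, no axis-aligned threshold separates it, while a single oblique hyperplane does. A purely illustrative toy comparison, with the accuracies one would expect for standard normal data noted in comments:

```python
# Toy contrast between an orthogonal split (threshold on one feature)
# and an oblique split (threshold on a linear combination): when the
# class depends on x0 + x1, no single-feature threshold separates it.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 2))
y = X[:, 0] + X[:, 1] > 0                        # label depends on BOTH features

orthogonal = X[:, 0] > 0                         # best axis-aligned split
oblique = X[:, 0] + X[:, 1] > 0                  # hyperplane split

print("orthogonal:", (orthogonal == y).mean())   # about 0.75
print("oblique:   ", (oblique == y).mean())      # 1.0
```
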
“…Some algorithms (e.g. SVM; Distributed Hierarchical Decision Tree) can handle high dimensionality better than others (Bar-Or, Wolff, Schuster, & Keren, 2005; Do, Lenca, Lallich, & Pham, 2010). As stated previously, in manufacturing it is mostly those ML algorithms capable of handling high-dimensional data that are applicable.…”
Section: Advantages of Machine Learning Application in Manufacturing (mentioning)
confidence: 87%
“…Given that our problem deals with high-dimensional data, tree classifiers are not very suitable [47]. AdaBoost might overfit the training data in the presence of noise.…”
Section: Data Mining Techniques (mentioning)
confidence: 99%