2010
DOI: 10.1007/978-3-642-00580-0_3

Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

Abstract: The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies: in this case one can expect many combinations between the variables to exist, and the usual random forests method does not effectively exploit this situation. Here we investigate a new approach for supervised classification with a huge number of numerical attributes. We propose a ra…
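
The abstract is truncated, but its central idea, replacing axis-aligned single-feature splits with oblique multivariate ones, can be sketched. The snippet below fits one oblique node split with a linear SVM over a random feature subspace; the helper name `oblique_split` and the choice of scikit-learn's LinearSVC are illustrative assumptions, not necessarily the authors' exact procedure.

```python
# Minimal sketch of an oblique (multivariate) node split: instead of
# thresholding one feature, fit a linear separator over a random subset
# of features and send samples left/right depending on which side of
# the hyperplane they fall. Illustrative only; not the paper's algorithm.
import numpy as np
from sklearn.svm import LinearSVC

def oblique_split(X, y, n_features, rng):
    """Fit one oblique split; return (feature indices, model, left-child mask)."""
    idx = rng.choice(X.shape[1], size=n_features, replace=False)  # random subspace
    clf = LinearSVC(C=1.0).fit(X[:, idx], y)                      # oblique hyperplane
    left = clf.decision_function(X[:, idx]) <= 0.0
    return idx, clf, left

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))            # very-high-dimensional toy data
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # class depends on a combination of features
idx, clf, left = oblique_split(X, y, n_features=32, rng=rng)
print(left.sum(), (~left).sum())            # sizes of the two child nodes
```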

Cited by 43 publications (41 citation statements: 3 supporting, 38 mentioning, 0 contrasting)
References 27 publications
“…Random forest reduces both bias (the systematic error term, independent of the training sample) and variance (the error due to variability associated with the training sample): it keeps bias low by growing unpruned trees, and it uses randomization to control the diversity between trees in the ensemble [14]. Randomization is introduced into the ensemble by building each tree on a bootstrap sample drawn with replacement, and by randomly selecting the variables considered at each node split [15].…”
Section: Introduction (mentioning)
confidence: 99%
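
To make the two randomization sources named in this statement concrete (bootstrap sampling with replacement, and a random feature subset at each node split), here is a minimal sketch built on scikit-learn's DecisionTreeClassifier; the function names and the majority-vote scheme for binary 0/1 labels are assumptions for illustration, not code from either paper.

```python
# Sketch of the standard random-forest recipe: each tree is grown
# unpruned on a bootstrap sample drawn with replacement, and each
# node split considers only a random subset of the features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        boot = rng.integers(0, len(X), size=len(X))    # sampling WITH replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",                       # random feature subset per split
            random_state=int(rng.integers(2**31 - 1)),
        )                                              # unpruned by default: bias stays low
        trees.append(tree.fit(X[boot], y[boot]))
    return trees

def forest_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])    # one row of votes per tree
    return (votes.mean(axis=0) > 0.5).astype(int)      # majority vote over 0/1 labels
```
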
“…First, tree construction is based on a single feature being selected for node splitting. Such trees may be inefficient in dealing with the feature dependencies likely inherent in high-dimensional spectral data [14]. Second, the majority of current implementations of the RF algorithm utilize orthogonal splits based on univariate decision trees (DT).…”
Section: Introduction (mentioning)
confidence: 99%
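
The weakness of orthogonal (single-feature) splits that this statement points to shows up even in two dimensions: when the class depends on a sum of features, no axis-aligned threshold separates it, while a single oblique hyperplane does. A purely illustrative toy comparison, with the accuracies one would expect for standard normal data noted in comments:

```python
# Toy contrast between an orthogonal split (threshold on one feature)
# and an oblique split (threshold on a linear combination): when the
# class depends on x0 + x1, no single-feature threshold separates it.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 2))
y = X[:, 0] + X[:, 1] > 0                        # label depends on BOTH features

orthogonal = X[:, 0] > 0                         # best axis-aligned split
oblique = X[:, 0] + X[:, 1] > 0                  # hyperplane split

print("orthogonal:", (orthogonal == y).mean())   # about 0.75
print("oblique:   ", (oblique == y).mean())      # 1.0
```
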
“…Some algorithms (e.g. SVM; Distributed Hierarchical Decision Tree) can handle high dimensionality better than others (Bar-Or, Wolff, Schuster, & Keren, 2005; Do, Lenca, Lallich, & Pham, 2010). As stated previously, in manufacturing it is mostly those ML algorithms capable of handling high-dimensional data that are applicable.…”
Section: Advantages of Machine Learning Application in Manufacturing (mentioning)
confidence: 87%
“…Given that our problem deals with high-dimensional data, tree classifiers are not very suitable [47]. AdaBoost might overfit the training data in the presence of noise.…”
Section: Data Mining Techniques (mentioning)
confidence: 99%