Random Forests for Big Data

Genuer, Robin; Poggi, Jean‐Michel; Tuleau-Malot, Christine; Villa-Vialaneix, Nathalie

doi:10.1016/j.bdr.2017.07.003

Cited by 269 publications

(137 citation statements)

References 35 publications

(61 reference statements)

Citation types: Supporting, Mentioning (116), Contrasting, Unclassified


“…Addressing global sparsity is a challenge in decision trees and, to the best of our knowledge, this has not been tackled appropriately in the literature. Standard CARTs or Random Forests (RFs) [5,7,13,16] cannot manage it due to the greedy construction of the trees. Nonetheless, some attempts have been made, see [11,12].…”

Section: Introduction (mentioning)

confidence: 99%


Sparsity in optimal randomized classification trees

Blanquero, Carrizosa, Molero-Río, et al. 2020

European Journal of Operational Research


Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown a better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in the cuts as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.


Section: Introduction (mentioning)

confidence: 99%

“…The S-ORCT smooth formulation (9)-(16) has been implemented using Pyomo optimization modeling language [19,20] in Python 3.5 [31]. As solver, we have used IPOPT 3.11.1 [39], and have followed a multistart approach, where the process is repeated 20 times starting from different random initial solutions.…”

Section: Introduction (mentioning)

confidence: 99%
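
The multistart approach mentioned in the citation statement above (repeat a local solve from 20 random initial solutions and keep the best result) can be illustrated with a minimal sketch. This is not the cited authors' Pyomo/IPOPT model: the toy objective, the plain gradient-descent local solver, the search interval, and every function name below are assumptions chosen only to show the pattern.

```python
import random


def objective(x: float) -> float:
    """Toy nonconvex objective with two local minima (hypothetical stand-in)."""
    return x**4 - 3 * x**2 + x


def gradient(x: float) -> float:
    """Analytic derivative of the toy objective."""
    return 4 * x**3 - 6 * x + 1


def local_descent(x0: float, step: float = 0.01, iters: int = 2000) -> float:
    """Plain gradient descent from x0; it stands in for a local solver."""
    x = x0
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def multistart(n_starts: int = 20, low: float = -2.0, high: float = 2.0,
               seed: int = 0) -> tuple[float, float]:
    """Run the local solver from n_starts random points and keep the best."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x = local_descent(rng.uniform(low, high))
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


best_x, best_f = multistart()
print(f"best x = {best_x:.4f}, f(x) = {best_f:.4f}")
```

A single local solve can stop in the shallower of the two minima, depending on where it starts. Repeating it from several random points and keeping the lowest objective value makes that outcome less likely, at the cost of proportionally more solver runs.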

Sparsity in optimal randomized classification trees

Blanquero, Carrizosa, Molero-Río, et al. 2020

European Journal of Operational Research


“…Addressing the leading challenges of statistical science, it serves to broaden not only the algorithmic but also the theoretical perspective [29,30,40]. It always involves massive data, including data streams and data heterogeneity.…”

Section: Big Data For M2M Network (mentioning)

confidence: 99%

“…In addition, since its major key features are multiple sources, huge volume, and fast-changing nature, it is difficult for commonly used traditional computing methods such as machine learning, information retrieval and data mining to efficiently support the processing, analysis and computation of Big Data [40]. Therefore, in recent years, statistical methods such as clustering methods, linear regression models and bootstrapping schemes have been adapted to process Big Data [29]. Generally, Big Data can be classified based on the five main aspects of data and its content: source, store, format, staging, and processing [11,39].…”

Section: Big Data For M2M Network (mentioning)

confidence: 99%

Big Data Analysis for M2M Networks: Research Challenges and Open Research Issues

Tuna, Daş, Ramakrishnan, et al. 2017

IJCNA


In recent years, solutions based on machine-to-machine (M2M) communications have started to support us in many areas of our life and work. However, the amount of data collected by M2M has increased tremendously and surpassed our expectations. This makes it necessary to investigate data mining methodologies and machine learning techniques in order to efficiently utilize large amounts of data gathered by M2M devices. In this paper, we first review existing data mining and machine learning techniques specifically designed and proposed for M2M networks. Then, we discuss the Big Data concept, investigate Big Data analysis techniques, and the importance of Big Data for M2M networks. Finally, we investigate research challenges and open research issues in M2M to provide an insight into future research opportunities.


“…Random forest classification algorithm [10] is proposed by Leo Breiman and Adele Cutler et al in the 1980s. Its main idea is integration thought, which is based on the development of decision trees, and includes multiple decision trees.…”

Section: Random Forest (mentioning)

confidence: 99%

A Survey on Incremental Learning

Zhong, Liu, Zeng, et al. 2017

2017 5th International Conference on Computer, Automation and Power Electronics (CAPE 2017)


Abstract: Incremental learning is one of the research hotspots in machine learning. In this paper, we view the complex changes of data as three changes that are the change of sample, the change of class and the change of feature, and analyze the popular machine learning classification algorithms which support incremental learning. And then we focus on reviewing the research of three types of incremental learning: Sample Incremental Learning, Class Incremental Learning and Feature Incremental Learning. Finally, we make a prospect on the focus and difficulty of future research of incremental learning.

