Proceedings of the Genetic and Evolutionary Computation Conference 2016
DOI: 10.1145/2908812.2908918

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Abstract: As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that …

Cited by 434 publications (389 citation statements)
References 16 publications
“…For a more nuanced approach, the similarity of the dataset on which ML is to be applied to datasets in PMLB could be quantified, and the set of algorithms that performed best on those similar datasets could be used. In lieu of detailed problem information, one could also use automated ML tools 16,17 and AI-driven ML platforms 18 to perform model selection and parameter tuning automatically.…”
Section: Discussion (mentioning)
confidence: 99%
“…Hence, we must turn our attention to the design of complete pipelines of algorithms to solve problems. Some systems, such as Auto-WEKA (Kotthoff et al 2017), auto-sklearn (Feurer et al 2015), and TPOT (Olson and Moore 2016), go some way towards this goal, as they are capable of returning certain types of pipelines. Several relevant workshops have been organized at various ML conferences, including ICML 2017, ECML/PKDD 2017, and ICDM 2017 (e.g., see Brazdil et al (2017)).…”
Section: Results (mentioning)
confidence: 99%
“…We used two automated machine learning (AutoML) methods, TPOT (Olson et al, 2016; Olson & Moore, 2019) and AutoSklearn (Feurer et al, 2019) that are based on the popular Python machine learning toolbox, scikit-learn (Pedregosa et al 2011) to select optimal classification models. While other AutoML tools exist that may outperform the ones we chose (Truong et al, 2019), TPOT and AutoSklearn are both free open-source, and easy to use, making them accessible for labs to incorporate into their existing analysis pipelines.…”
Section: Model Optimization and Selection (mentioning)
confidence: 99%
“…The key advantage of AutoML tools such as TPOT and AutoSklearn is that they do the extensive work of finding the best type(s) of data transformation and models to build a pipeline for classifying the input data, as well as the hyperparameters associated with these steps. TPOT is a tree-based optimization tool that builds and optimizes machine learning pipelines using genetic programming (Olson et al, 2016; Olson & Moore, 2019). TPOT generates pipelines of pre-processing steps and classification models in order to maximize classification performance while prioritizing simpler pipelines.…”
Section: Model Optimization and Selection (mentioning)
confidence: 99%
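
The trade-off described in the statement above (maximize classification performance while preferring simpler pipelines) can be read as a two-objective selection problem. The sketch below is a minimal illustration of that idea using a Pareto front over accuracy and pipeline length. It is not TPOT's code or API: the `Candidate` type, the pipeline names, and the scores are all hypothetical and invented for the example.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str        # hypothetical pipeline label
    accuracy: float  # higher is better
    n_steps: int     # fewer steps means a simpler pipeline


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_steps <= b.n_steps
    strictly_better = a.accuracy > b.accuracy or a.n_steps < b.n_steps
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores, for illustration only (not results from any paper).
population = [
    Candidate("logistic_regression", 0.86, 1),
    Candidate("scaler+logistic_regression", 0.86, 2),
    Candidate("pca+random_forest", 0.91, 2),
    Candidate("scaler+pca+random_forest", 0.90, 3),
]
front = pareto_front(population)
print([c.name for c in front])
```

In this toy population, a longer pipeline survives only if it buys additional accuracy; a pipeline that is both longer and no more accurate than another is discarded.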