2019
DOI: 10.1007/978-3-030-05318-5_8

TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning

Abstract: As data science becomes increasingly mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this chapter we present TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models …
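The abstract describes optimizing "a series of feature preprocessors and machine learning models" with genetic programming. As a minimal sketch, assuming a simplified linear pipeline (zero or more preprocessors followed by exactly one classifier) rather than TPOT's actual tree-shaped representation, the pieces a genetic search needs look like this. The operator names are plain strings chosen for illustration, and `random_pipeline`, `mutate`, `crossover`, and `is_valid` are hypothetical helpers, not TPOT's API.

```python
import random
from typing import List

# Illustrative operator pools (assumption): plain strings naming scikit-learn
# estimators, not TPOT's actual operator configuration.
PREPROCESSORS = ["StandardScaler", "PCA", "SelectKBest", "PolynomialFeatures"]
CLASSIFIERS = ["LogisticRegression", "RandomForestClassifier", "ExtraTreesClassifier"]

Pipeline = List[str]  # zero or more preprocessors, then exactly one classifier


def random_pipeline(rng: random.Random, max_preprocessors: int = 3) -> Pipeline:
    """Draw a random number of preprocessors, then append one classifier."""
    steps = [rng.choice(PREPROCESSORS) for _ in range(rng.randint(0, max_preprocessors))]
    return steps + [rng.choice(CLASSIFIERS)]


def is_valid(p: Pipeline) -> bool:
    """A pipeline must end in a classifier and contain only preprocessors before it."""
    return (len(p) >= 1 and p[-1] in CLASSIFIERS
            and all(step in PREPROCESSORS for step in p[:-1]))


def mutate(p: Pipeline, rng: random.Random) -> Pipeline:
    """Insert or remove one preprocessor, or swap the classifier.
    Swapping is also the fallback when there is no preprocessor to remove."""
    pre, clf = list(p[:-1]), p[-1]
    op = rng.choice(["insert", "remove", "swap"])
    if op == "insert":
        pre.insert(rng.randint(0, len(pre)), rng.choice(PREPROCESSORS))
    elif op == "remove" and pre:
        pre.pop(rng.randrange(len(pre)))
    else:
        clf = rng.choice(CLASSIFIERS)
    return pre + [clf]


def crossover(a: Pipeline, b: Pipeline, rng: random.Random) -> Pipeline:
    """One-point crossover on the preprocessor prefixes, keeping a's classifier."""
    cut_a = rng.randint(0, len(a) - 1)
    cut_b = rng.randint(0, len(b) - 1)
    return a[:cut_a] + b[cut_b:-1] + [a[-1]]


rng = random.Random(0)
parent_a, parent_b = random_pipeline(rng), random_pipeline(rng)
child = mutate(crossover(parent_a, parent_b, rng), rng)
print(parent_a, parent_b, child, sep="\n")
```

Keeping the classifier pinned to the last position is what makes every offspring a usable pipeline; TPOT's tree representation allows richer shapes (for example, combined feature branches), which this linear sketch deliberately omits.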

Cited by 355 publications (380 citation statements)
References 16 publications
“…We used two automated machine learning (AutoML) methods, TPOT (Olson et al, 2016; Olson & Moore, 2019) and AutoSklearn (Feurer et al, 2019) that are based on the popular Python machine learning toolbox, scikit-learn (Pedregosa et al 2011) to select optimal classification models. While other AutoML tools exist that may outperform the ones we chose (Truong et al, 2019), TPOT and AutoSklearn are both free open-source, and easy to use, making them accessible for labs to incorporate into their existing analysis pipelines.…”
Section: Model Optimization and Selection (mentioning)
confidence: 99%
“…The key advantage of AutoML tools such as TPOT and AutoSklearn is that they do the extensive work of finding the best type(s) of data transformation and models to build a pipeline for classifying the input data, as well as the hyperparameters associated with these steps. TPOT is a tree-based optimization tool that builds and optimizes machine learning pipelines using genetic programming (Olson et al, 2016; Olson & Moore, 2019). TPOT generates pipelines of pre-processing steps and classification models in order to maximize classification performance while prioritizing simpler pipelines.…”
Section: Model Optimization and Selection (mentioning)
confidence: 99%
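"Maximize classification performance while prioritizing simpler pipelines" is a two-objective trade-off. As we understand TPOT's design, it handles this with Pareto-based (NSGA-II) selection over score and pipeline size; the sketch below shows only the non-dominated filter at the core of that idea, not full NSGA-II (no rank sorting or crowding distance). `Candidate`, `dominates`, and `pareto_front` are hypothetical names, and the scores are invented for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str         # hypothetical pipeline label
    score: float      # cross-validated score, higher is better
    n_operators: int  # pipeline size, lower is simpler


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.score >= b.score and a.n_operators <= b.n_operators
    strictly_better = a.score > b.score or a.n_operators < b.n_operators
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates.
    A candidate never dominates itself, so no self-exclusion is needed."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


# Invented scores, for illustration only.
population = [
    Candidate("A", 0.91, 4),
    Candidate("B", 0.90, 1),
    Candidate("C", 0.89, 2),
    Candidate("D", 0.91, 3),
]
print([c.name for c in pareto_front(population)])  # → ['B', 'D']
```

A is dropped because D matches its score with one fewer operator, and C is dropped because B is both better and smaller. Keeping B alongside D is how a preference for simpler pipelines survives selection without discarding the top scorer.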
“…After this procedure, the sets used for training consisted of 3383/965192 positive/negative pairs, and the test set contained 2227/426398 positive/negative pairs. We used TPOT (Olson and Moore, 2016), an automated machine learning algorithm, to guide the machine learning process. TPOT is a genetic algorithm that searches over the space of scikit-learn classifiers, hyperparameters thereof, and pre-processors using cross-validation.…”
Section: Machine Learning (mentioning)
confidence: 99%
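The search in this statement scores candidates "using cross-validation", on a training set that is only about 0.35% positive (3383 of 968575 pairs). With that imbalance, an unstratified split can leave a fold with very few positives, so the stdlib-only sketch below deals each class round-robin into folds. The statement does not say whether stratification was used; `stratified_kfold`, `cv_score`, and the caller-supplied `fit_and_score` are hypothetical names (scikit-learn's `StratifiedKFold` is the usual library equivalent).

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def stratified_kfold(labels: Sequence[int], k: int, seed: int = 0) -> List[Split]:
    """Shuffle each class's indices, then deal them round-robin into k folds so
    every fold keeps roughly the overall positive/negative ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    splits: List[Split] = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(fold)))
    return splits


def cv_score(fit_and_score: Callable[[List[int], List[int]], float],
             labels: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Mean held-out score across the k folds; a GA would use this as fitness.
    `fit_and_score(train_idx, test_idx)` is a hypothetical caller-supplied function
    that fits one candidate pipeline on train_idx and scores it on test_idx."""
    return mean(fit_and_score(tr, te) for tr, te in stratified_kfold(labels, k, seed))


# Usage on toy labels: 10 positives among 1000 samples.
labels = [1] * 10 + [0] * 990
for train_idx, test_idx in stratified_kfold(labels, k=5):
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))  # → 800 200 2
```

Every toy fold gets exactly 2 of the 10 positives; a plain unshuffled, unstratified split of the same list would put all 10 positives in the first fold.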
“…This combination of feature extraction methods created 52 starting features for the classifier. We then used automated machine learning methods (auto-ML), as implemented in the Python package TPOT (Olson and Moore, 2016), to calculate protein-protein interaction scores from these features by training it on PPIs from gold-standard databases. Briefly, TPOT uses a genetic algorithm to choose among classifier models and pre-processors by testing each in a cross-validation framework on the training set, and allowing the population of pipelines to "evolve" based on the "fitness" derived from the cross-validation scores.…”
Section: Machine Learning Helps Determine a High-quality Protein Inte… (mentioning)
confidence: 99%
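The "evolve based on fitness derived from cross-validation scores" loop can be made concrete with a generic generational genetic algorithm. This is a minimal sketch under stated assumptions: the search space is a hypothetical (model name, one integer hyperparameter) pair, selection is a 3-way tournament with two elites, only mutation is used, and `toy_fitness` is an invented stand-in for a mean CV score. It is not TPOT's actual set of genetic operators.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: a model name plus one integer hyperparameter in [1, 20].
MODELS = ["LogisticRegression", "RandomForestClassifier", "ExtraTreesClassifier"]
Individual = Tuple[str, int]


def random_individual(rng: random.Random) -> Individual:
    return (rng.choice(MODELS), rng.randint(1, 20))


def mutate(ind: Individual, rng: random.Random) -> Individual:
    """Sometimes swap the model; always nudge the hyperparameter within [1, 20]."""
    model, param = ind
    if rng.random() < 0.3:
        model = rng.choice(MODELS)
    param = min(20, max(1, param + rng.choice([-2, -1, 1, 2])))
    return (model, param)


def evolve(fitness: Callable[[Individual], float], generations: int = 20,
           pop_size: int = 20, seed: int = 0) -> Tuple[Individual, List[float]]:
    """Generational GA with elitism and tournament selection; returns the best
    individual and the best fitness in each generation. Needs pop_size >= 3."""
    rng = random.Random(seed)
    cache: Dict[Individual, float] = {}

    def fit(ind: Individual) -> float:
        # Real fitness would be a full cross-validation run, so evaluate each individual once.
        if ind not in cache:
            cache[ind] = fitness(ind)
        return cache[ind]

    pop = [random_individual(rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        history.append(fit(ranked[0]))
        elite = ranked[:2]  # the two fittest survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            parent = max(rng.sample(pop, 3), key=fit)  # 3-way tournament
            children.append(mutate(parent, rng))
        pop = elite + children
    return max(pop, key=fit), history


def toy_fitness(ind: Individual) -> float:
    """Invented stand-in for a cross-validated score; peaks at param == 12."""
    model, param = ind
    bonus = {"LogisticRegression": 0.0, "RandomForestClassifier": 0.05,
             "ExtraTreesClassifier": 0.08}[model]
    return 0.8 + bonus - 0.01 * abs(param - 12)


best, history = evolve(toy_fitness, generations=15, pop_size=12, seed=3)
print(best, history[-1])
```

Because the elites carry the current best forward, the best fitness per generation never decreases. The cache matters in practice because each fitness call stands for training and scoring a pipeline on every fold.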
“…We first used the scikit-learn ExtraTreesClassifier feature selection to reduce the dimensionality of the feature matrix to the top 100 features based on declining feature importance (Figure S3). We used the TPOT (Olson and Moore, 2016) AutoML wrapper of scikit-learn machine learning functions for all subsequent training steps. We discovered optimal hyperparameters for an ExtraTreesClassifier with 5-fold cross-validation, with an area under the precision-recall curve of 0.64.…”
Section: Identification Of Interacting Proteins By Supervised Machine… (mentioning)
confidence: 99%
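Two steps in this statement are easy to make concrete: keeping the top features by declining importance, and reporting area under the precision-recall curve. The stdlib-only sketch below assumes the importances were already computed (for example, from a fitted ExtraTreesClassifier's `feature_importances_`) and computes AUPRC as average precision, the step-wise form used by scikit-learn's `average_precision_score`. The statement does not say which estimator produced its 0.64, and `top_k_features` and `average_precision` are hypothetical names.

```python
from typing import List, Sequence


def top_k_features(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest importances, in declining order of importance."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return order[:k]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve:
    AP = sum_n (R_n - R_{n-1}) * P_n, with one threshold per distinct score.
    Assumes y_true contains at least one positive (label 1)."""
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


print(top_k_features([0.1, 0.5, 0.3, 0.05], 2))                 # → [1, 2]
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # about 0.833
```

On heavily imbalanced data, a random ranking's average precision is close to the positive rate rather than 0.5, so an AUPRC should be read against that baseline rather than against ROC-AUC intuitions.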