Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Olson, Randal S.; Bartley, Nathan; Urbanowicz, Ryan J.; Moore, Jason H.

doi:10.1145/2908812.2908918

Cited by 434 publications (389 citation statements)

References 16 publications

Supporting

Mentioning

335

Contrasting

Unclassified

“…For a more nuanced approach, the similarity of the dataset on which ML is to be applied to datasets in PMLB could be quantified, and the set of algorithms that performed best on those similar datasets could be used. In lieu of detailed problem information, one could also use automated ML tools 16,17 and AI-driven ML platforms 18 to perform model selection and parameter tuning automatically.…”

Section: Discussion (mentioning)

confidence: 99%

Data-driven advice for applying machine learning to bioinformatics problems

et al. 2017

Self Cite

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
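The citation statement quoted above suggests quantifying how similar a new dataset is to benchmark datasets and reusing the algorithms that performed best on the closest ones. A minimal sketch of that idea follows, assuming three simple meta-features and a nearest-neighbour vote. The benchmark records, the names `algo_A`, `algo_B`, `algo_C`, and the functions `to_vector` and `recommend` are all hypothetical placeholders; none of the values are PMLB results.

```python
import math
from collections import Counter

# Hypothetical benchmark records: (dataset meta-features, best algorithm).
# Every value here is an invented placeholder, not a PMLB result.
BENCHMARK = [
    ({"n_samples": 100, "n_features": 5, "n_classes": 2}, "algo_A"),
    ({"n_samples": 200, "n_features": 8, "n_classes": 2}, "algo_A"),
    ({"n_samples": 150, "n_features": 4, "n_classes": 3}, "algo_B"),
    ({"n_samples": 50000, "n_features": 300, "n_classes": 10}, "algo_B"),
    ({"n_samples": 80000, "n_features": 500, "n_classes": 10}, "algo_C"),
    ({"n_samples": 60000, "n_features": 400, "n_classes": 8}, "algo_C"),
]

META_KEYS = ("n_samples", "n_features", "n_classes")


def to_vector(meta):
    """Log-scale the meta-features so large counts do not dominate."""
    return [math.log10(meta[key]) for key in META_KEYS]


def recommend(new_meta, benchmark=BENCHMARK, k=3):
    """Return the algorithm that wins most often on the k most similar datasets."""
    target = to_vector(new_meta)
    nearest = sorted(
        benchmark, key=lambda rec: math.dist(target, to_vector(rec[0]))
    )[:k]
    votes = Counter(algo for _, algo in nearest)
    return votes.most_common(1)[0][0]


small = {"n_samples": 120, "n_features": 6, "n_classes": 2}
large = {"n_samples": 70000, "n_features": 450, "n_classes": 10}
print(recommend(small), recommend(large))
```

Euclidean distance on log-scaled counts is only one possible similarity measure; the quoted statement does not specify which one to use.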

“…Hence, we must turn our attention to the design of complete pipelines of algorithms to solve problems. Some systems, such as Auto-WEKA (Kotthoff et al 2017), auto-sklearn (Feurer et al 2015), and TPOT (Olson and Moore 2016), go some way towards this goal, as they are capable of returning certain types of pipelines. Several relevant workshops have been organized at various ML conferences, including ICML 2017, ECML/PKDD 2017, and ICDM 2017 (e.g., see Brazdil et al (2017)).…”

Section: Results (mentioning)

confidence: 99%

Metalearning and Algorithm Selection: progress, state of the art and introduction to the 2018 Special Issue

Brazdil

Giraud‐Carrier

2017

Mach Learn

This article serves as an introduction to the Special Issue on Metalearning and Algorithm Selection. The introduction is divided into two parts. In the first section, we give an overview of how the field of metalearning has evolved in the last 1-2 decades and mention how some of the papers in this special issue fit in. In the second section, we discuss the contents of this special issue. We divide the papers into thematic subgroups, provide information about each subgroup, as well as about the individual papers. Our main aim is to highlight how the papers selected for this special issue contribute to the field of metalearning.

“…We used two automated machine learning (AutoML) methods, TPOT (Olson et al, 2016; Olson & Moore, 2019) and AutoSklearn (Feurer et al, 2019) that are based on the popular Python machine learning toolbox, scikit-learn (Pedregosa et al 2011) to select optimal classification models. While other AutoML tools exist that may outperform the ones we chose (Truong et al, 2019), TPOT and AutoSklearn are both free open-source, and easy to use, making them accessible for labs to incorporate into their existing analysis pipelines.…”

Section: Model Optimization and Selection (mentioning)

confidence: 99%

“…The key advantage of AutoML tools such as TPOT and AutoSklearn is that they do the extensive work of finding the best type(s) of data transformation and models to build a pipeline for classifying the input data, as well as the hyperparameters associated with these steps. TPOT is a tree-based optimization tool that builds and optimizes machine learning pipelines using genetic programming (Olson et al, 2016; Olson & Moore, 2019). TPOT generates pipelines of pre-processing steps and classification models in order to maximize classification performance while prioritizing simpler pipelines.…”

Section: Model Optimization and Selection (mentioning)

confidence: 99%

Automated curation of CNMF-E-extracted ROI spatial footprints and calcium traces using open-source AutoML tools

Tran

Mocle

Ramsaran

et al. 2020

Preprint

In vivo 1-photon calcium imaging is an increasingly prevalent method in behavioural neuroscience. Numerous analysis pipelines have been developed to improve the reliability and scalability of preprocessing and ROI extraction for these large calcium imaging datasets. Despite these advancements in pre-processing methods, manual curation of the extracted spatial footprints and calcium traces of neurons remains important for quality control. Here, we propose an additional semi-automated curation step for sorting spatial footprints and calcium traces from putative neurons extracted using the popular CNMF-E algorithm. We used the automated machine learning tools TPOT and AutoSklearn to generate classifiers to curate the extracted ROIs trained on a subset of human-labeled data. AutoSklearn produced the best performing classifier, achieving an F1 score > 92% on the ground truth test dataset. This automated approach is a useful strategy for filtering ROIs with relatively few labeled data points, and can be easily added to pre-existing pipelines currently using CNMF-E for ROI extraction.
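The citation statements above describe TPOT as evolving pipelines of pre-processing steps and a classifier so that classification performance is maximized while simpler pipelines are preferred. The toy sketch below illustrates that general idea only; it is not TPOT's implementation or API. It assumes a mutation-only evolutionary loop, a one-dimensional synthetic dataset, a threshold classifier, and a lexicographic (accuracy, then fewer steps) fitness in place of a full multi-objective search. All names (`STEPS`, `mutate`, `evolve`, and so on) are hypothetical.

```python
import random

# Synthetic toy data: the label is 1 when |x| > 2, so an "abs" step
# should help a simple threshold classifier.
DATA = [(x / 2.0, int(abs(x / 2.0) > 2.0)) for x in range(-10, 11)]

# Candidate pre-processing steps (hypothetical, for illustration only).
STEPS = {
    "abs": abs,
    "square": lambda v: v * v,
    "negate": lambda v: -v,
}
MAX_STEPS = 3


def predict(pipeline, x):
    """Apply each pre-processing step, then classify with a threshold."""
    steps, threshold = pipeline
    for name in steps:
        x = STEPS[name](x)
    return int(x > threshold)


def accuracy(pipeline, data=DATA):
    return sum(predict(pipeline, x) == y for x, y in data) / len(data)


def fitness(pipeline):
    """Higher accuracy first; among equals, fewer steps wins."""
    return (accuracy(pipeline), -len(pipeline[0]))


def mutate(pipeline, rng):
    """Add a step, remove a step, or nudge the threshold."""
    steps, threshold = list(pipeline[0]), pipeline[1]
    choice = rng.choice(["add", "remove", "threshold"])
    if choice == "add" and len(steps) < MAX_STEPS:
        steps.insert(rng.randrange(len(steps) + 1), rng.choice(sorted(STEPS)))
    elif choice == "remove" and steps:
        steps.pop(rng.randrange(len(steps)))
    else:
        threshold += rng.uniform(-1.0, 1.0)
    return (tuple(steps), threshold)


def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    # Start from the empty pipeline with a zero threshold.
    population = [((), 0.0)] * population_size
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population]
        # Elitist selection: keep the fittest of parents and offspring.
        population = sorted(
            population + offspring, key=fitness, reverse=True
        )[:population_size]
    return population[0]


best = evolve()
print(best, accuracy(best))
```

Because parents compete with their offspring, the best fitness never decreases across generations, so the returned pipeline is at least as good as the empty starting pipeline.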
