TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning

Olson, Randal S.; Moore, Jason H.

doi:10.1007/978-3-030-05318-5_8

Cited by 355 publications

(380 citation statements)

References 16 publications


“…We used two automated machine learning (AutoML) methods, TPOT (Olson et al, 2016; Olson & Moore, 2019) and AutoSklearn (Feurer et al, 2019) that are based on the popular Python machine learning toolbox, scikit-learn (Pedregosa et al 2011) to select optimal classification models. While other AutoML tools exist that may outperform the ones we chose (Truong et al, 2019), TPOT and AutoSklearn are both free open-source, and easy to use, making them accessible for labs to incorporate into their existing analysis pipelines.…”

Section: Model Optimization and Selection (mentioning)

confidence: 99%

“…The key advantage of AutoML tools such as TPOT and AutoSklearn is that they do the extensive work of finding the best type(s) of data transformation and models to build a pipeline for classifying the input data, as well as the hyperparameters associated with these steps. TPOT is a tree-based optimization tool that builds and optimizes machine learning pipelines using genetic programming (Olson et al, 2016; Olson & Moore, 2019). TPOT generates pipelines of pre-processing steps and classification models in order to maximize classification performance while prioritizing simpler pipelines.…”

Section: Model Optimization and Selection (mentioning)

confidence: 99%


Automated curation of CNMF-E-extracted ROI spatial footprints and calcium traces using open-source AutoML tools

Tran, Mocle, Ramsaran, et al. 2020 (Preprint)


In vivo 1-photon calcium imaging is an increasingly prevalent method in behavioural neuroscience. Numerous analysis pipelines have been developed to improve the reliability and scalability of preprocessing and ROI extraction for these large calcium imaging datasets. Despite these advancements in pre-processing methods, manual curation of the extracted spatial footprints and calcium traces of neurons remains important for quality control. Here, we propose an additional semi-automated curation step for sorting spatial footprints and calcium traces from putative neurons extracted using the popular CNMF-E algorithm. We used the automated machine learning tools TPOT and AutoSklearn to generate classifiers to curate the extracted ROIs trained on a subset of human-labeled data. AutoSklearn produced the best performing classifier, achieving an F1 score > 92% on the ground truth test dataset. This automated approach is a useful strategy for filtering ROIs with relatively few labeled data points, and can be easily added to pre-existing pipelines currently using CNMF-E for ROI extraction.


“…After this procedure, the sets used for training consisted of 3383/965192 positive/negative pairs, and the test set contained 2227/426398 positive/negative pairs. We used TPOT (Olson and Moore, 2016), an automated machine learning algorithm, to guide the machine learning process. TPOT is a genetic algorithm that searches over the space of scikit-learn classifiers, hyperparameters thereof, and pre-processors using cross-validation.…”

Section: Machine Learning (mentioning)

confidence: 99%

“…This combination of feature extraction methods created 52 starting features for the classifier. We then used automated machine learning methods (auto-ML), as implemented in the Python package TPOT (Olson and Moore, 2016), to calculate protein-protein interaction scores from these features by training it on PPIs from gold-standard databases. Briefly, TPOT uses a genetic algorithm to choose among classifier models and pre-processors by testing each in a cross-validation framework on the training set, and allowing the population of pipelines to "evolve" based on the "fitness" derived from the cross-validation scores.…”

Section: Machine Learning Helps Determine a High-quality Protein Inte… (mentioning)

confidence: 99%

Mapping Functional Protein Neighborhoods in the Mouse Brain

Liebeskind, Young, Halling, et al. 2020 (Preprint)


New proteomics methods make it possible to determine protein interaction maps at the proteome scale without the need for genetically encoded tags, opening up new organisms and tissue types to investigation. Current molecular and computational methods are oriented towards protein complexes that are soluble, stable, and discrete. However, the mammalian brain, among the most complicated and most heavily studied tissue types, derives many of its unique functions from protein interactions that are neither discrete nor soluble. Proteomics investigations into the global protein interaction landscape of the brain have therefore leveraged non-proteomics datasets to supplement their experiments. Here, we develop a novel, integrative proteomics pipeline and apply it to infer a global map of functional protein neighborhoods in the mouse brain without the aid of external datasets. By leveraging synaptosome enrichment and interactomics methods that target both soluble and insoluble protein fractions, we resolved protein interactions for key neural pathways, including those from refractory subcellular fractions such as the membrane and cytoskeleton. In comparison to external datasets, our observed interactions perform similarly to hand-curated synaptic protein interactions while also suggesting thousands of novel connections. We additionally employed cleavable chemical cross-linkers to detect direct binding partners and provide structural context. Our combined map suggests new protein pathways and novel mechanisms for proteins that underlie neurological diseases, including autism and epilepsy. Our results show that proteomics methods alone are sufficient to determine global interaction maps for proteins that are of broad interest to neuroscience. We anticipate that our map will be used to prioritize new research avenues and will pave the way towards future proteomics techniques that resolve protein interactions at ever greater resolution.
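The second citation statement from this paper describes a population of pipelines that "evolves" according to a fitness taken from cross-validation scores. The following is a minimal sketch of that idea only, not TPOT's implementation: TPOT uses tree-based genetic programming over scikit-learn operators, whereas this toy uses a flat two-gene encoding. The search space (`PREPROCESSORS`, `K_VALUES`), the 1-D k-nearest-neighbour classifier, the synthetic data, and every function name are assumptions made for illustration.

```python
import random

# Hypothetical toy search space: a "pipeline" is (preprocessor name, k for k-NN).
PREPROCESSORS = {
    "identity": lambda x: x,
    "abs": abs,
    "square": lambda x: x * x,
}
K_VALUES = [1, 3, 5, 7, 9]


def make_data(rng, n=90):
    """Synthetic 1-D data (assumption): the label is 1 when |x| > 1."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    return [(x, int(abs(x) > 1.0)) for x in xs]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > len(nearest))


def cv_fitness(pipeline, data, folds=3):
    """Fitness = accuracy pooled over the held-out folds of a k-fold split."""
    prep_name, k = pipeline
    prep = PREPROCESSORS[prep_name]
    transformed = [(prep(x), y) for x, y in data]
    correct = 0
    for f in range(folds):
        test = transformed[f::folds]
        train = [p for i, p in enumerate(transformed) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(transformed)


def mutate(pipeline, rng):
    """Randomly resample one of the two genes."""
    prep_name, k = pipeline
    if rng.random() < 0.5:
        prep_name = rng.choice(list(PREPROCESSORS))
    else:
        k = rng.choice(K_VALUES)
    return (prep_name, k)


def crossover(a, b):
    """Take the preprocessor from one parent and k from the other."""
    return (a[0], b[1])


def evolve(data, rng, population_size=8, generations=5):
    """Evolve pipelines; the best individual is always carried over (elitism)."""
    population = [
        (rng.choice(list(PREPROCESSORS)), rng.choice(K_VALUES))
        for _ in range(population_size)
    ]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: cv_fitness(p, data), reverse=True)
        history.append(cv_fitness(ranked[0], data))
        parents = ranked[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children
    best = max(population, key=lambda p: cv_fitness(p, data))
    return best, cv_fitness(best, data), history


if __name__ == "__main__":
    rng = random.Random(0)
    best, score, history = evolve(make_data(rng), rng)
    print(best, score, history)
```

Because the best pipeline survives each generation unchanged and the folds are fixed, the best fitness in this sketch can never decrease from one generation to the next. The sketch does not model TPOT's preference for simpler pipelines.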


“…We first used the scikit-learn ExtraTreesClassifier feature selection to reduce the dimensionality of the feature matrix to the top 100 features based on declining feature importance (Figure S3). We used the TPOT (Olson and Moore, 2016) AutoML wrapper of scikit-learn machine learning functions for all subsequent training steps. We discovered optimal hyperparameters for an ExtraTreesClassifier with 5-fold cross-validation, with an area under the precision-recall curve of 0.64.…”

Section: Identification Of Interacting Proteins By Supervised Machine… (mentioning)

confidence: 99%

A pan-plant protein complex map reveals deep conservation and novel assemblies

McWhite, Papoulas, Drew, et al. 2019 (Preprint)


Plants are foundational to global ecological and economic systems, yet most plant proteins remain uncharacterized. Protein interaction networks often suggest protein functions and open new avenues to characterize genes and proteins. We therefore systematically determined protein complexes from 13 plant species of scientific and agricultural importance, greatly expanding the known repertoire of stable protein complexes in plants. Using co-fractionation mass spectrometry, we recovered known complexes, confirmed complexes predicted to occur in plants, and identified novel interactions conserved over 1.1 billion years of green plant evolution. Several novel complexes are involved in vernalization and pathogen defense, traits critical to agriculture. We also uncovered plant analogs of animal complexes with distinct molecular assemblies, including a megadalton-scale tRNA multi-synthetase complex. The resulting map offers the first cross-species view of conserved, stable protein assemblies shared across plant cells and provides a mechanistic, biochemical framework for interpreting plant genetics and mutant phenotypes.
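The citation statement from this paper mentions reducing a feature matrix to the top 100 features by ExtraTreesClassifier importance, then evaluating an ExtraTreesClassifier with 5-fold cross-validation and the area under the precision-recall curve. A rough scikit-learn sketch of that kind of workflow follows. It is not the authors' code: the synthetic data, the tree counts, the helper names, and the choice to put feature selection inside the cross-validated pipeline are all assumptions, and `average_precision` is used as scikit-learn's summary of the precision-recall curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def top_k_extra_trees_pipeline(k=100, random_state=0):
    """Keep the k most important features, then classify (hypothetical helper)."""
    selector = SelectFromModel(
        ExtraTreesClassifier(n_estimators=50, random_state=random_state),
        threshold=-np.inf,  # no importance cutoff, so max_features alone decides
        max_features=k,
    )
    classifier = ExtraTreesClassifier(n_estimators=50, random_state=random_state)
    return make_pipeline(selector, classifier)


def mean_average_precision(X, y, k=100, folds=5):
    """Mean average precision across the cross-validation folds."""
    scores = cross_val_score(
        top_k_extra_trees_pipeline(k), X, y, cv=folds, scoring="average_precision"
    )
    return float(scores.mean())


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 150 features, 10 of them informative.
    X, y = make_classification(
        n_samples=200, n_features=150, n_informative=10, random_state=0
    )
    print(mean_average_precision(X, y))
```

Placing the selector inside the pipeline means the top features are re-chosen on each training fold, which keeps the held-out fold out of the selection step. The quoted statement does not say whether the original analysis was arranged this way.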
