2016
DOI: 10.1007/978-3-319-31204-0_9

Automating Biomedical Data Science Through Tree-Based Pipeline Optimization

Abstract: Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we …

Citations: cited by 257 publications (236 citation statements)
References: 16 publications
“…We then treated all of the materials-based features listed in table 2 as target values to be predicted by surface texture fingerprints, to examine how this single descriptor can be used to model different bulk or surface features of Ag nanoparticles. All the regression models were built, tested and selected using genetic programming (TPOT [27,28]) without human interference, and the result is shown in figure 8. Here we can see that surface texture fingerprints predicts VA with a coefficient of determination R² of 94% on testing data set, but performs poorly on bulk features including FCC population, HCP population, ICO population.…”
Section: Classification and Regression
Citation type: mentioning
confidence: 99%
“…Also, the maximum of feature selected was set to be 0.24 of all possible features. VC, DTC and GBC were all provided in the Python scikit-learn library [43]. Hyperparameters mentioned above were specified and the rest were left as default.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…In the future, we will assess the true performance of the optimized system over the SemEval-2017 test set, via more thorough automated optimization. We will also compare the performance of our simple system with other similar automated systems (e.g., the TPOT system Olson et al (2016)) in terms of speed and performance.…”
Section: Results Comments and Conclusion
Citation type: mentioning
confidence: 99%