2017
DOI: 10.1186/s13040-017-0155-3

Ten quick tips for machine learning in computational biology

Abstract: Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and can therefore follow incorrect practices that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors t…

Cited by 695 publications (519 citation statements)
References 52 publications
“…We measured the performance of the proposed methods with the area under the Precision-Recall (PR) curves (Davis & Goadrich, 2006; Chicco, 2017) and the logistic loss (also known as cross-entropy loss) function.…”
Section: Results (mentioning)
confidence: 99%
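
The logistic (cross-entropy) loss named in the statement above can be sketched in a few lines. This is a minimal illustration for binary labels, not the citing authors' implementation; the function name `log_loss` and the clipping constant `eps` are assumptions made for this example.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so log() stays finite (assumed convention).
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# An uninformative predictor (always 0.5) versus a confident, correct one.
uninformative = log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
confident = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.9, 0.1])
print(uninformative, confident)
```

Lower values are better: the uninformative predictor scores ln 2 (about 0.693), and the confident, correct predictor scores lower.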
“…High dimensional data can lead to several problems: in addition to high computational costs (in memory and time), it often leads to overfitting (Van Der Maaten, Postma & Van den Herik, 2009; Chicco, 2017; Moore, 2004). Dimensionality reduction can limit these problems and, additionally, can improve the visualization and interpretation of the dataset, because it allows researchers to focus on a reduced number of features.…”
Section: Methods (mentioning)
confidence: 99%
“…MCC is a correlation coefficient that describes the quality of a binary correlation between the observed and predicted classification, and is unaffected by large differences in population size. It can take a value between −1 and 1, where 0 means a completely random prediction, whereas −1 and 1 means a perfectly wrong and perfect prediction, respectively.…”
Section: Methods (mentioning)
confidence: 99%
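
The range described in the statement above (−1, 0, and 1) can be checked with a small worked example. This is a sketch using the standard confusion-matrix formula for the Matthews correlation coefficient (MCC); the function name `mcc` and the choice to return 0 when the denominator vanishes are assumptions made for this illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an empty row or column of the matrix scores 0.
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(tp=50, tn=50, fp=0, fn=0))    # every prediction correct
print(mcc(tp=0, tn=0, fp=50, fn=50))    # every prediction wrong
print(mcc(tp=25, tn=25, fp=25, fn=25))  # no better than chance

# Imbalanced data: labelling everything positive reaches 95% accuracy,
# yet MCC stays at 0 under the convention above.
print(mcc(tp=95, tn=0, fp=5, fn=0))
```

The last call illustrates why the metric is attractive when class sizes differ greatly: accuracy alone would look high, while MCC does not reward a classifier that ignores the minority class.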
“…Grid search [57,58], Random search [59,60], Bayesian optimization [61][62][63][64], and Gradient-based optimization [65] are four existing methods of tuning parameters. In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search due to the ability to reason with respect to the quality of experiments before they are run [61][62][63][64].…”
Section: Parameter Tuning (mentioning)
confidence: 99%
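
Two of the four tuning methods listed in the statement above, grid search and random search, can be sketched with the standard library alone. Everything here is hypothetical: `toy_score` stands in for a real validation score, and the parameter names `lr` and `depth` and their ranges are invented for illustration. Bayesian and gradient-based optimization are not shown.

```python
import itertools
import random

def toy_score(lr, depth):
    """Hypothetical validation score, highest at lr=0.1 and depth=5."""
    return -100 * (lr - 0.1) ** 2 - 0.01 * (depth - 5) ** 2

def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and return the best one."""
    return max(itertools.product(lrs, depths), key=lambda p: toy_score(*p))

def random_search(n_trials, seed=0):
    """Sample n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.001, 1.0), rng.randint(1, 10))
                  for _ in range(n_trials)]
    return max(candidates, key=lambda p: toy_score(*p))

print(grid_search([0.01, 0.1, 1.0], [1, 5, 10]))
print(random_search(n_trials=20))
```

Grid search costs one evaluation per grid point (nine here), whereas random search fixes the budget up front and samples the continuous range directly.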