The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics, some representative examples of RF applications in this context, and possible directions for future research.
Norman Matloff's book "Statistical Regression and Classification" aims to present the path from linear regression models to machine learning methods. At the beginning of the book, the reader receives an overview of the basics of linear models as well as their limitations and assumptions. I found the book very helpful for refreshing all these basics before moving on to linear models. Matloff does not stop at parametric models, however; he also shows how parametric and nonparametric models differ. Each chapter ends with a section of data, code, and math problems. Before I go into detail on the individual chapters, I will answer the two questions most commonly raised in a book review.