2018
DOI: 10.1111/1755-0998.12773

A practical introduction to Random Forest for genetic association studies in ecology and evolution

Abstract: Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine-learning algorithm, in genomic studies to discern loci underlying both discrete and quantitativ…

Cited by 102 publications (102 citation statements)
References 69 publications (113 reference statements)
“…We assessed the extent to which functional traits predict the tolerance of species to urbanisation with a Random Forests (RF) approach using the package randomForest (Liaw & Wiener 2002). RF is a machine‐learning algorithm that can efficiently analyse many predictors simultaneously and account for interactions (Brieuc et al 2018). In addition, we also modelled presence/absence of species in the intensively urbanised environment and, if present, their relative abundance in the assemblage.…”
Section: Methods (mentioning)
confidence: 99%
“…We assessed the predictive power of this model by estimating the misclassification of out‐of‐bag samples (error rate) when using the model (OOB‐ER). Following the randomForest protocol suggested by Brieuc et al (2018), we first optimized the mtry parameter (number of predictors to be randomly sampled at each node in a tree). We then used the optima of each metric to run 2000 trees twice, and compared the stability of the results (correlation > 0.97 in all cases).…”
Section: Methods (mentioning)
confidence: 99%
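
The stability check described in the statement above (two runs compared by correlation, with a 0.97 cut-off) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `pearson` and `importances_are_stable` are hypothetical, the importance values are made-up placeholders rather than results from any study, and the citing authors' actual code is not shown in the source.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def importances_are_stable(run_a, run_b, threshold=0.97):
    """True if importance scores from two forests correlate above threshold.

    The 0.97 default mirrors the value quoted in the citation statement;
    it is not a universal rule.
    """
    return pearson(run_a, run_b) > threshold


# Hypothetical per-predictor importance scores from two independent runs.
run_1 = [0.90, 0.10, 0.40, 0.05]
run_2 = [0.88, 0.12, 0.41, 0.04]
print(importances_are_stable(run_1, run_2))
```

If the two runs disagree, the usual remedy is to grow more trees and repeat the comparison.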
“…(2) number of variables to be randomly sampled at each node in a tree (Mtry), used to search for the variable that best partitions samples in the training data set and the default number is 1/3 of input variables [73]; and, (3) the minimum number of terminal nodes (Nodesize) where the default value is 5 in regression analysis [8,49].…”
Section: Random Forest and Agc Estimation (mentioning)
confidence: 99%
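
The defaults mentioned in the statement above can be written out as a small helper. This sketch assumes the documented defaults of the R randomForest package: for regression, mtry is one third of the predictors (rounded down, at least 1) and nodesize is 5; for classification, mtry is the rounded-down square root of the number of predictors and nodesize is 1. In that package's documentation, nodesize is the minimum size of terminal nodes. The function names are hypothetical and exist only for illustration.

```python
import math


def default_mtry(n_predictors, task):
    """Default number of predictors sampled at each split (assumed defaults)."""
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    if task == "regression":
        # One third of the predictors, rounded down, but never below 1.
        return max(n_predictors // 3, 1)
    if task == "classification":
        # Rounded-down square root of the number of predictors.
        return math.isqrt(n_predictors)
    raise ValueError("task must be 'regression' or 'classification'")


def default_nodesize(task):
    """Default minimum size of terminal nodes (assumed defaults)."""
    if task == "regression":
        return 5
    if task == "classification":
        return 1
    raise ValueError("task must be 'regression' or 'classification'")


print(default_mtry(30, "regression"), default_nodesize("regression"))
```

These values are starting points only; the statements quoted here describe tuning mtry rather than relying on the default.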
“…GenABEL was used to perform association tests to identify SNPs potentially associated with dolphin susceptibility and resistance to Random Forest is a tree-based ensemble machine-learning tool, which is highly data adaptive, making it very useful for analysing genomic data (Chen & Ishwaran, 2012). This algorithm is particularly suited for detecting (with a high prediction accuracy) contigs that best explain variation in a response variable (Brieuc, Waters, Drinan, & Naish, 2018), and therefore loci under selection, for data sets with many thousands of SNPs and a relatively small number of samples (Chen & Ishwaran, 2012 to calculate the out-of-bag (OOB) error rate. In each RF, 125,000 trees were generated; with between two and six randomly chosen SNPs considered in each tree split (mtry; Supporting Information Table S3).…”
Section: Genome-wide Association Analyses and Random Forests (mentioning)
confidence: 99%