2017
DOI: 10.7717/peerj.2849

Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

Abstract: Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studi…

Citations: cited by 206 publications (122 citation statements)
References: 49 publications
“…Machine-learning methods, such as random forest (RF), are increasingly being applied to address ecological classification problems (Cutler, Edwards, Beard, Cutler, & Hess, 2007; Mi, Huttmann, Guo, Han, & Wen, 2017). RF is an ensemble method where a large number of individual decision tree models are induced by taking bootstrap samples of the data.…”
Section: Introduction
confidence: 99%
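
The bootstrap-and-vote idea described in the statement above can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: it shows bagging only (a full random forest also draws a random subset of predictors at each split), it uses one-split "stumps" in place of full decision trees, and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-split tree on (x, label) rows by minimizing training errors."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        # Try both label orientations for the two sides of the split.
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def fit_bagged(rows, n_trees, seed=0):
    """Induce n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(ensemble, x):
    """Majority vote over the individual trees."""
    votes = Counter(tree(x) for tree in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one predictor, absence (0) below 10, presence (1) from 10 up.
rows = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]
ensemble = fit_bagged(rows, n_trees=25)
low_vote = predict(ensemble, 2)
high_vote = predict(ensemble, 17)
```

Because every tree sees a slightly different resample, individual trees disagree near the class boundary, and the vote averages that variation out.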
“…Further, it features great tolerance to noise [24], a strong immunity to overfitting [24] and high efficiency in processing a large number of predictors and their interactions [14, 25]. Owing to “recursive partitioning” and bagging, SPM algorithm optimization can properly handle interactions, stopping rules, weighting and complexities in predictor combinations [24–27]. Two modifications were applied in the model settings: we used balanced class weights, a powerful and sophisticated weighting function in SPM, to defend against inequivalent prevalence (183 presence versus 18,300 absence points) [27]; we set the number of trees to 1,000 to find the best possible model [27, 28].…”
Section: Methods
confidence: 99%
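
To make the class-imbalance point above concrete, the sketch below computes balanced class weights for the quoted counts (183 presence versus 18,300 absence points). It assumes the common inverse-frequency heuristic, weight = n_samples / (n_classes × class count), as used for example by scikit-learn's `class_weight="balanced"` option; the exact weighting function inside SPM is not given in the statement, so this is an illustration rather than a reproduction of it. The function name is hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is the generic "balanced" heuristic, not SPM's own formula.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Counts taken from the quoted statement.
counts = {"presence": 183, "absence": 18300}
weights = balanced_class_weights(counts)

# Each presence point is weighted 100 times more than each absence point,
# so both classes contribute the same total weight to the model.
ratio = weights["presence"] / weights["absence"]
total_presence = counts["presence"] * weights["presence"]
total_absence = counts["absence"] * weights["absence"]
```

Under this heuristic the weight ratio equals the inverse of the prevalence ratio, which is why the rare class is not swamped by the background points.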
“…To answer the third research question, species distribution models (SDMs) were developed using MaxEnt (Phillips, Anderson, & Schapire, 2006) and Random Forest (Breiman, 2001a; Liaw & Wiener, 2002) algorithms, which are among the most commonly used machine learning methods (Aguirre-Gutiérrez et al., 2013; Mi, Huettmann, Guo, Han, & Wen, 2017). The models were fitted with binary response data (occurrence data with background data) in the R package sdm (Naimi & Araújo, 2016).…”
Section: Analysis Distribution Model Preparation and Validation
confidence: 99%
“…These problems are often tackled by making an ensemble of multiple models (Araújo & New, 2007; Regmi et al., 2018), but we did not do this here directly. Instead, we used one of the best algorithms in SDM (Aguirre-Gutiérrez et al., 2013; Craig & Huettmann, 2009; Mi et al., 2017), and, due to bagging, Random Forest being an ensemble model (Breiman, 2001a).…”
Section: Potential Distribution Of the Two Subspecies Under Project
confidence: 99%