Abstract: Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine-learning algorithm, in genomic studies to discern loci underlying both discrete and quantitativ…
“…We assessed the extent to which functional traits predict the tolerance of species to urbanisation with a Random Forests (RF) approach using the package randomForest (Liaw & Wiener 2002). RF is a machine‐learning algorithm that can efficiently analyse many predictors simultaneously and account for interactions (Brieuc et al 2018). In addition, we also modelled presence/absence of species in the intensively urbanised environment and, if present, their relative abundance in the assemblage.…”
Section: Methods (mentioning)
confidence: 99%
“…We assessed the predictive power of this model by estimating the misclassification of out‐of‐bag samples (error rate) when using the model (OOB‐ER). Following the randomForest protocol suggested by Brieuc et al (2018), we first optimized the mtry parameter (number of predictors to be randomly sampled at each node in a tree). We then used the optima of each metric to run 2000 trees twice, and compared the stability of the results (correlation > 0.97 in all cases).…”
Urbanisation is driving rapid declines in species richness and abundance worldwide, but the general implications for ecosystem function and services remain poorly understood. Here, we integrate global data on bird communities with comprehensive information on traits associated with ecological processes to show that assemblages in highly urbanised environments have substantially different functional composition and 20% less functional diversity on average than surrounding natural habitats. These changes occur without significant decreases in functional dissimilarity between species; instead, they are caused by a decrease in species richness and abundance evenness, leading to declines in functional redundancy. The reconfiguration and decline of native functional diversity in cities are not compensated by the presence of exotic species but are less severe under moderate levels of urbanisation. Thus, urbanisation has substantial negative impacts on functional diversity, potentially resulting in impaired provision of ecosystem services, but these impacts can be reduced by less intensive urbanisation practices. Ecology Letters (2020) 23: 962-972
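The citation statement above relies on two quantities: the out-of-bag error rate (OOB-ER) and the correlation between two independent runs as a stability check. The following is a minimal sketch of how both can be computed, not the randomForest package's implementation. The function names, the class labels ("tolerant", "avoider") and the toy votes are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def oob_error_rate(tree_predictions, in_bag, labels):
    """Misclassification rate using only out-of-bag votes.

    tree_predictions[t][i]: class predicted by tree t for sample i
    in_bag[t][i]: True if sample i was in tree t's bootstrap sample
    labels[i]: observed class of sample i
    """
    errors = 0
    scored = 0
    for i, truth in enumerate(labels):
        # Collect votes only from trees that did not train on sample i.
        votes = Counter(
            preds[i]
            for preds, bag in zip(tree_predictions, in_bag)
            if not bag[i]
        )
        if not votes:
            continue  # sample was in-bag for every tree, so it cannot be scored
        scored += 1
        majority = votes.most_common(1)[0][0]
        if majority != truth:
            errors += 1
    return errors / scored if scored else float("nan")


def pearson(xs, ys):
    """Pearson correlation, e.g. between importance scores from two runs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: 3 trees, 4 samples.
labels = ["tolerant", "avoider", "tolerant", "avoider"]
tree_predictions = [
    ["tolerant", "avoider", "avoider", "avoider"],
    ["tolerant", "tolerant", "avoider", "avoider"],
    ["avoider", "avoider", "tolerant", "avoider"],
]
in_bag = [
    [True, False, False, False],
    [False, True, False, True],
    [True, False, True, False],
]

example_oob = oob_error_rate(tree_predictions, in_bag, labels)
print(example_oob)  # 0.25: only the third sample is misclassified by its OOB votes
```

In practice the OOB-ER would be computed for each candidate mtry value and the value with the lowest error kept; `pearson` would then be applied to the variable-importance vectors of two forests grown with that setting.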
“…(2) number of variables to be randomly sampled at each node in a tree (Mtry), used to search for the variable that best partitions samples in the training data set and the default number is 1/3 of input variables [73]; and, (3) the minimum number of terminal nodes (Nodesize) where the default value is 5 in regression analysis [8,49].…”
Section: Random Forest and AGC Estimation (mentioning)
Dynamic monitoring of carbon storage in forest resources is important for tracking ecosystem functionalities and climate change impacts. In this study, we used multi-year Landsat data combined with a Random Forest (RF) algorithm to estimate the forest aboveground carbon (AGC) in a forest area in China (Hang-Jia-Hu) and analyzed its spatiotemporal changes during the past two decades. Maximum likelihood classification was applied to make land-use maps. Remote sensing variables, such as the spectral band, vegetation indices, and derived texture features, were extracted from 20 Landsat TM and OLI images over five different years (2000, 2004, 2010, 2015, and 2018). These variables were selected according to their importance and subsequently used in the RF algorithm to build an estimation model of forest AGC. The results showed the following: (1) Verification of classification results showed maximum likelihood can extract land information effectively. Our land cover classification yielded overall accuracies between 86.86% and 89.47%. (2) Additionally, our RF models showed good performance in predicting forest AGC, with R² from 0.65 to 0.73 in the training and testing phase and an RMSE range between 3.18 and 6.66 Mg/ha. RMSEr in the testing phase ranged from 20.27 to 22.27 with a low model error. (3) The estimation results indicated that forest AGC in the past two decades increased with density at 10.14 Mg/ha, 21.63 Mg/ha, 26.39 Mg/ha, 29.25 Mg/ha, and 44.59 Mg/ha in 2000, 2004, 2010, 2015, and 2018. The total forest AGC storage had a growth rate of 285%. (4) Our study showed that, although forest area decreased in the study area during the time period under study, the total forest AGC increased due to an increment in forest AGC density. However, such an effect is overridden in the vicinity of cities by intense urbanization and the loss of forest cover. Our study demonstrated that the combined use of remote sensing data and machine learning techniques can improve our ability to track the forest changes in support of regional natural resource management practices.
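The citation statement for this entry quotes default values for Mtry and Nodesize in regression. As a hedged illustration, the sketch below mirrors the defaults documented for the R randomForest package (where nodesize is the minimum size of terminal nodes): mtry is max(floor(p/3), 1) for regression and floor(sqrt(p)) for classification, and nodesize is 5 for regression and 1 for classification. The Python function names are hypothetical and are not part of any library.

```python
import math


def default_mtry(n_predictors, regression):
    """Default number of predictors tried at each split (hypothetical helper)."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    if regression:
        # One third of the predictors, but never fewer than one.
        return max(n_predictors // 3, 1)
    # Classification: floor of the square root of the predictor count.
    return math.isqrt(n_predictors)


def default_nodesize(regression):
    """Default minimum size of terminal nodes (hypothetical helper)."""
    return 5 if regression else 1


# Example: a regression model with 9 predictors tries 3 of them per split.
print(default_mtry(9, regression=True), default_nodesize(regression=True))
```

These defaults are only starting points; the first entry above describes tuning mtry against the out-of-bag error instead of accepting the default.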
“…GenABEL was used to perform association tests to identify SNPs potentially associated with dolphin susceptibility and resistance to […] Random Forest is a tree-based ensemble machine-learning tool, which is highly data adaptive, making it very useful for analysing genomic data (Chen & Ishwaran, 2012). This algorithm is particularly suited for detecting (with a high prediction accuracy) contigs that best explain variation in a response variable (Brieuc, Waters, Drinan, & Naish, 2018), and therefore loci under selection, for data sets with many thousands of SNPs and a relatively small number of samples (Chen & Ishwaran, 2012) […] to calculate the out-of-bag (OOB) error rate. In each RF, 125,000 trees were generated, with between two and six randomly chosen SNPs considered in each tree split (mtry; Supporting Information Table S3).…”
Section: Genome-wide Association Analyses and Random Forests (mentioning)
Infectious diseases are significant demographic and evolutionary drivers of populations, but studies about the genetic basis of disease resistance and susceptibility are scarce in wildlife populations. Cetacean morbillivirus (CeMV) is a highly contagious disease that is increasing in both geographic distribution and incidence, causing unusual mortality events (UME) and killing tens of thousands of individuals across multiple cetacean species worldwide since the late 1980s. The largest CeMV outbreak in the Southern Hemisphere reported to date occurred in Australia in 2013, where it was a major factor in a UME, killing mainly young Indo‐Pacific bottlenose dolphins (Tursiops aduncus). Using cases (nonsurvivors) and controls (putative survivors) from the most affected population, we carried out a genome‐wide association study to identify candidate genes for resistance and susceptibility to CeMV. The genomic data set consisted of 278,147,988 sequence reads and 35,493 high‐quality SNPs genotyped across 38 individuals. Association analyses found highly significant differences in allele and genotype frequencies among cases and controls at 65 SNPs, and Random Forests conservatively identified eight as candidates. Annotation of these SNPs identified five candidate genes (MAPK8, FBXW11, INADL, ANK3 and ACOX3) with functions associated with stress, pain and immune responses. Our findings provide the first insights into the genetic basis of host defence to this highly contagious disease, enabling the development of an applied evolutionary framework to monitor CeMV resistance across cetacean species. Biomarkers could now be established to assess potential risk factors associated with these genes in other CeMV‐affected cetacean populations and species. These results could also possibly aid in the advancement of vaccines against morbilliviruses.