Summary: Recent studies have demonstrated a need for increased rigour in building and evaluating ecological niche models (ENMs) based on presence‐only occurrence data. Two major goals are to balance goodness‐of‐fit with model complexity (e.g. by ‘tuning’ model settings) and to evaluate models with spatially independent data. These issues are especially critical for data sets suffering from sampling bias, and for studies that require transferring models across space or time (e.g. responses to climate change or spread of invasive species). Efficient implementation of procedures to accomplish these goals, however, requires automation. We developed ENMeval, an R package that: (i) creates data sets for k‐fold cross‐validation using one of several methods for partitioning occurrence data (including options for spatially independent partitions), (ii) builds a series of candidate models using Maxent with a variety of user‐defined settings and (iii) provides multiple evaluation metrics to aid in selecting optimal model settings. The six methods for partitioning data are n−1 jackknife, random k‐folds (= bins), user‐specified folds and three methods of masked geographically structured folds. ENMeval quantifies six evaluation metrics: the area under the curve of the receiver‐operating characteristic plot for test localities (AUC_TEST), the difference between training and testing AUC (AUC_DIFF), two different threshold‐based omission rates for test localities and the Akaike information criterion corrected for small sample sizes (AICc). We demonstrate ENMeval by tuning model settings for eight tree species of the genus Coccoloba in Puerto Rico based on AICc. Evaluation metrics varied substantially across model settings, and models selected with AICc differed from default ones. In summary, ENMeval facilitates the production of better ENMs and should promote future methodological research on many outstanding issues.
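As a rough illustration of two of the metrics named above, the following Python sketch computes the training-minus-testing AUC difference and the standard small-sample AICc, then picks the candidate with the lowest AICc. It is not ENMeval's code (ENMeval is an R package), and the function names, candidate labels, log-likelihoods and parameter counts are all hypothetical values chosen for the example.

```python
def auc_diff(auc_train, auc_test):
    """Difference between training and testing AUC (larger suggests more overfitting)."""
    return auc_train - auc_test


def aicc(log_likelihood, k, n):
    """Standard AICc: AIC plus a small-sample correction.

    log_likelihood: log-likelihood of the fitted model
    k: number of model parameters
    n: number of occurrence records
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def select_by_aicc(candidates, n):
    """Return the label of the candidate with the lowest AICc.

    candidates maps a label to a (log_likelihood, k) pair.
    """
    return min(candidates, key=lambda label: aicc(*candidates[label], n))


# Hypothetical candidate settings: label -> (log-likelihood, parameter count).
candidates = {
    "L_1": (-250.0, 4),
    "LQ_2": (-240.0, 9),
    "LQH_0.5": (-236.0, 20),
}
best = select_by_aicc(candidates, n=60)
print(best)
```

The most complex candidate has the highest log-likelihood here, yet the correction term penalises its 20 parameters enough that a simpler setting is selected. This is the trade-off between goodness-of-fit and complexity that the abstract describes.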
1. Quantitative evaluations to optimize complexity have become standard for avoiding overfitting of ecological niche models (ENMs) that estimate species' potential geographic distributions. ENMeval was the first R package to make such evaluations (often termed model tuning) widely accessible for the Maxent algorithm.
Abstract: 1. Scientific research increasingly calls for open-source software that is flexible, interactive, and expandable, while providing methodological guidance and reproducibility. Currently, many analyses in ecology are implemented with "black box" graphical user interfaces (GUIs) that lack flexibility or command-line interfaces that are infrequently used by non-specialists. 2. To help remedy this situation in the context of species distribution modeling, we created Wallace, an open and modular application with a richly documented GUI with underlying R scripts that is flexible and highly interactive. 3. Wallace guides users from acquiring and processing data to building models and examining predictions. Additionally, it is designed to grow via community contributions of new modules to expand functionality. All results are downloadable, along with code to reproduce the analysis. 4. Wallace provides an example of an innovative platform to increase access to cutting-edge methods and encourage plurality in science and collaboration in software development. Keywords: biogeography, range, reproducibility, software, spatial analysis, species distribution model
There is an urgent need for more ecologically realistic models for better predicting the effects of climate change on species' potential geographic distributions. Here we build ecological niche models using MAXENT and test whether selecting predictor variables based on biological knowledge and selecting ecologically realistic response curves can improve cross-time distributional predictions. We also evaluate how the method chosen for extrapolation into nonanalog conditions affects the prediction. We do so by estimating the potential distribution of a montane shrew (Mammalia, Soricidae, Cryptotis mexicanus) at present and the Last Glacial Maximum (LGM). Because it is tightly associated with cloud forests (with climatically determined upper and lower limits) whose distributional shifts are well characterized, this species provides clear expectations of plausible vs. implausible results. Response curves for the MAXENT model made using variables selected via biological justification were ecologically more realistic compared with those of the model made using many potential predictors. This strategy also led to much more plausible geographic predictions for upper and lower elevational limits of the species both for the present and during the LGM. By inspecting the modeled response curves, we also determined the most appropriate way to extrapolate into nonanalog environments, a previously overlooked factor in studies involving model transfer. This study provides intuitive context for recommendations that should promote more realistic ecological niche models for transfer across space and time.
Although long‐standing theory suggests that biotic variables are only relevant at local scales for explaining the patterns of species' distributions, recent studies have demonstrated improvements to species distribution models (SDMs) by incorporating predictor variables informed by biotic interactions. However, some key methodological questions remain, such as which kinds of interactions are appropriate to include in these models, how to incorporate the effects of multiple interacting species, and how to account for interactions that may have a temporal dependence. We addressed these questions in an effort to model the distribution of the monarch butterfly Danaus plexippus during its fall migration (September–November) through Mexico, a region with new monitoring data and uncertain range limits even for this well‐studied insect. We estimated species richness of selected nectar plants (Asclepias spp.) and roosting trees (various highland species) for use as biotic variables in our models. To account for flowering phenology, we additionally estimated nectar plant richness of flowering species per month. We evaluated three types of models: climatic variables only (abiotic), plant richness estimates only (biotic) and combined (abiotic and biotic). We selected models with AICc and additionally determined if they performed better than random on spatially withheld data. We found that the combined models accounting for phenology performed best for all three months, and better than random for discriminatory ability but not omission rate. These combined models also produced the most ecologically realistic spatial patterns, but the modeled response for nectar plant richness matched ecological predictions for November only. These results represent the first model‐based monarch distributional estimates for the Mexican migration route and should provide foundations for future conservation work. More generally, the study demonstrates the potential benefits of using SDM‐derived richness estimates and phenological information for biotic factors affecting species distributions.
Species distribution models (SDMs) are widely used in ecology, biogeography and conservation biology to estimate relationships between environmental variables and species occurrence data and make predictions of how their distributions vary in space and time. During the past two decades, the field has increasingly made use of machine learning approaches for constructing and validating SDMs. Model accuracy has steadily increased as a result, but the interpretability of the fitted models, for example the relative importance of predictor variables or their causal effects on focal species, has not always kept pace. Here we draw attention to an emerging subdiscipline of artificial intelligence, explainable AI (xAI), as a toolbox for better interpreting SDMs. xAI aims at deciphering the behavior of complex statistical or machine learning models (e.g. neural networks, random forests, boosted regression trees), and can produce more transparent and understandable SDM predictions. We describe the rationale behind xAI and provide a list of tools that can be used to help ecological modelers better understand complex model behavior at different scales. As an example, we perform a reproducible SDM analysis in R on the African elephant and showcase some xAI tools such as local interpretable model‐agnostic explanation (LIME) to help interpret local‐scale behavior of the model. We conclude with what we see as the benefits and caveats of these techniques and advocate for their use to improve the interpretability of machine learning SDMs.
Aim: Ecological niche modelling requires robust estimation of model performance and significance, but common evaluation approaches often yield biased estimates. Null models provide a solution but are rarely used in this field. We implemented an important modification to existing null model tests, evaluating null models with the same withheld records that were used to evaluate the real model. We built and evaluated models across a range of modelling scenarios and for various performance measures using the algorithm Maxent and the monk parakeet (Myiopsitta monachus). Location: Native range in Southern America and global invasions predominantly in North/Central America and Europe. Methods: We tested the ability of models built under 15 scenarios (five sets of calibration records and three settings that varied the level of model complexity) to predict spatially independent evaluation data in the invaded range (in effect, testing the models under spatial transfer). We quantified performance with measures of discriminatory ability and overfitting based on area under the receiver operating characteristic curve (AUC) and the omission error rate. We estimated null distributions of these measures and calculated effect size and significance. We determined how these estimates varied across modelling scenarios, comparing with two tests existing in the literature. Results: Performance varied starkly across modelling scenarios. As expected, the measures of overfitting agreed with each other and provided different information than that of discriminatory ability. However, high performance per se did not show strong association with high effect size and significance. Main conclusions: Ecological niche models should be assessed with measures of effect size and significance based on appropriate null distributions, in contrast to several approaches existing in the literature. The proposed approach using independent evaluation data, implemented with our accompanying code and R package, allows such estimates for either the same or a different region/time period, and it merits use and continued development.
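The abstract does not state how effect size and significance were calculated. One common way to derive both from a null distribution is a z-score plus a one-sided empirical p-value, sketched below in Python. The choice of those two statistics, the function name and the numbers are assumptions made for illustration, not the authors' implementation.

```python
import statistics


def null_model_test(observed, null_values, higher_is_better=True):
    """Compare an observed performance value with a null distribution.

    Returns (z, p): z is the observed value's distance from the null mean in
    null standard deviations; p is a one-sided empirical p-value with the
    usual +1 correction so that it is never exactly zero.
    """
    mean = statistics.mean(null_values)
    sd = statistics.stdev(null_values)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    z = (observed - mean) / sd
    if higher_is_better:
        # e.g. AUC: count null models that did at least as well
        extreme = sum(v >= observed for v in null_values)
    else:
        # e.g. omission rate: lower values are better
        extreme = sum(v <= observed for v in null_values)
    p = (extreme + 1) / (len(null_values) + 1)
    return z, p


# Hypothetical AUC values for null models and for the real model, all assumed
# to have been evaluated on the same withheld records.
null_auc = [0.50, 0.52, 0.48, 0.51, 0.49]
z, p = null_model_test(0.70, null_auc)
print(round(z, 2), round(p, 3))
```

In practice the null distribution would hold many more replicates than the five shown. The point of the sketch is that the observed value is judged relative to the spread of the null values rather than by its raw magnitude alone.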
Aim: The geographic range and ecological niche of species are widely used concepts in ecology, evolution and conservation and many modelling approaches have been developed to quantify each. Niche and distribution modelling methods require a litany of design choices; differences among subdisciplines have created communication barriers that increase isolation of scientific advances. As a result, understanding and reproducing the work of others is difficult, if not impossible. It is often challenging to evaluate whether a model has been built appropriately for its intended application or subsequent reuse. Here, we propose a standardized model metadata framework that enables researchers to understand and evaluate modelling decisions while making models fully citable and reproducible. Such reproducibility is critical for both scientific and policy reports, while international standardization enables better comparison between different scenarios and research groups. Innovation: Range modelling metadata (RMMS) address three challenges: they (a) are designed for convenience to encourage use, (b) accommodate a wide variety of applications, and (c) are extensible to allow the research community to steer them as needed. RMMS are based on a metadata dictionary that specifies a hierarchical structure to catalogue different aspects of the range modelling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to modify and extend. To facilitate use, we have developed an R package, rangeModelMetaData, to build templates, automatically fill values from common modelling objects, check for inconsistencies with standards, and suggest values. Main conclusions: Range Modelling Metadata tools foster cross‐disciplinary advances in biogeography, conservation and allied disciplines by improving evaluation, model sharing, model searching, comparisons and reproducibility among studies. Our initially proposed standards here are designed to be modified and extended to evolve with research trends and needs.