Specifying, assessing, and selecting among candidate statistical models is fundamental to ecological research. Commonly used approaches to model selection are based on predictive scores and include information criteria, such as Akaike's information criterion, and cross validation. Based on data splitting, cross validation is particularly versatile because it can be used even when it is not possible to derive a likelihood (e.g., many forms of machine learning) or count parameters precisely (e.g., mixed-effects models). However, much of the literature on cross validation is technical and spread across statistical journals, making it difficult for ecological analysts to assess and choose among the wide range of options. Here we provide a comprehensive, accessible review that explains important, but often overlooked, technical aspects of cross validation for model selection, such as bias correction, estimation uncertainty, choice of scores, and selection rules to mitigate overfitting. We synthesize the relevant statistical advances to make recommendations for the choice of cross-validation technique, and we present two ecological case studies to illustrate their application. In most instances, we recommend using exact or approximate leave-one-out cross validation to minimize bias, or otherwise k-fold with bias correction if k < 10. To mitigate overfitting when using cross validation, we recommend calibrated selection via our recently introduced modified one-standard-error rule. We advocate for the use of predictive scores in model selection across a range of typical modeling goals, such as exploration, hypothesis testing, and prediction, provided that models are specified in accordance with the stated goal. We also emphasize, as others have done, that inference on parameter estimates is biased if preceded by model selection and instead requires a carefully specified single model or further technical adjustments.
Like many other Australian mammals, eastern quolls (Dasyurus viverrinus) were widespread in the south-east of mainland Australia but went extinct there during the 20th century. The species remained abundant in Tasmania until it rapidly declined from 2001 to 2003, coinciding with a period of unsuitable weather. We provide an updated analysis of eastern quoll population trends in Tasmania using a time series of annual spotlight counts (1985–2019) collected across most of the species’ range. Eastern quolls were widespread and abundant in Tasmania until the early 2000s. In addition to the previously documented severe decline in the early 2000s in the east and northeast, we present new evidence of an earlier decline in the north (mid-1990s) and a more recent decline in the south (~2009). Declines have continued unabated during the last decade, resulting in a ~67% decline since the late 1990s in the area with high quoll abundance. Although the major decline in the early 2000s coincided with unfavourable weather, the continuing and more recent declines suggest other undetermined causes are also involved. We can no longer assume the presence of eastern quolls in Tasmania ensures the species’ long-term survival, highlighting the urgent need to conserve the remaining populations in Tasmania.
The growing use of model-selection principles in ecology for statistical inference is underpinned by information criteria (IC) and cross-validation (CV) techniques. Although IC techniques, such as Akaike's Information Criterion, have been historically more popular in ecology, CV is a versatile and increasingly used alternative. CV uses data splitting to estimate model scores based on (out-of-sample) predictive performance, which can be used even when it is not possible to derive a likelihood (e.g., machine learning) or count parameters precisely (e.g., mixed-effects models and penalised regression). Here we provide a primer to understanding and applying CV in ecology. We review commonly applied variants of CV, including approximate methods, and make recommendations for their use based on the statistical context. We explain some important, but often overlooked, technical aspects of CV, such as bias correction, estimation uncertainty, score selection, and parsimonious selection rules. We also address misconceptions (and truths) about impediments to the use of CV, including computational cost and ease of implementation, and clarify the relationship between CV and information-theoretic approaches to model selection. The paper includes two ecological case studies, both from modelling contexts wherein traditional IC are difficult to apply, which illustrate the application of the techniques. The first is a classification task based on either parametric or machine-learning models; the second is a non-linear hierarchical regression problem focused on selecting allometric growth models. We conclude that CV-based model selection should be widely applied to ecological analyses, because of its robust estimation properties and the broad range of situations for which it is applicable. In particular, we recommend using leave-one-out (LOO) or approximate LOO CV to minimise bias, or otherwise K-fold CV using bias correction if K < 10. To mitigate overfitting, we recommend calibrated selection via the modified one-standard-error rule, which accounts for the predominant cause of overfitting: score-estimation uncertainty.
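As a rough illustration of the data-splitting idea described above, the following Python sketch estimates out-of-sample squared-error scores by K-fold and leave-one-out CV for two toy models and then applies a selection rule. Everything in it is an assumption made for illustration: the simulated data, the two candidate models, and the function names (`cv_losses`, `score_with_se`, `one_se_choice`) are hypothetical and do not come from the paper. It uses the ordinary one-standard-error rule with a naive standard error, not the modified rule recommended in the abstract, and it applies no bias correction.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Intercept-only model: predict the training mean everywhere."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_losses(fit, xs, ys, k, seed=0):
    """Return one out-of-sample squared error per observation.

    Setting k equal to the number of observations gives leave-one-out CV.
    """
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
    losses = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            losses[i] = (ys[i] - predict(xs[i])) ** 2
    return losses


def score_with_se(losses):
    """Mean loss and a naive standard error across observations."""
    return statistics.fmean(losses), statistics.stdev(losses) / len(losses) ** 0.5


def one_se_choice(scores):
    """Ordinary one-standard-error rule (assumed here, not the modified rule).

    scores: list of (name, mean_loss, se), ordered from simplest to most complex.
    Returns the simplest model whose loss is within one SE of the best loss.
    """
    best = min(scores, key=lambda s: s[1])
    threshold = best[1] + best[2]
    for name, mean_loss, _ in scores:
        if mean_loss <= threshold:
            return name


# Simulated data (hypothetical): a strong linear trend plus Gaussian noise.
rng = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

models = [("mean", fit_mean), ("line", fit_line)]  # ordered simplest first
kfold = [(name, *score_with_se(cv_losses(fit, xs, ys, k=10))) for name, fit in models]
loo = [(name, *score_with_se(cv_losses(fit, xs, ys, k=len(xs)))) for name, fit in models]

print("10-fold choice:", one_se_choice(kfold))
print("LOO choice:", one_se_choice(loo))
```

The per-observation losses are kept so that a standard error can be attached to each score; it is this score-estimation uncertainty that the selection rule uses to favour the simpler model when the more complex one is not clearly better.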
Like many other Australian mammals, the eastern quoll (Dasyurus viverrinus) was widespread on the Australian mainland but went extinct there during the 20th century. The species remained abundant in Tasmania until a rapid decline occurred from 2001 to 2003, coinciding with a period of unsuitable weather. We provide an updated analysis of eastern quoll population trends in Tasmania by analysing a Tasmania-wide time series of annual spotlight counts (1985-2019). Eastern quolls were widespread and abundant in Tasmania until the early 2000s. A distinct change occurred in the early 2000s in the east and northeast, which led to severe population reductions. However, we present new evidence of an earlier decline in the north (mid-1990s) and a more recent decline around 2009 in the south. Range-wide declines have continued unabated during the last decade, resulting in a ~67% decline (since the late 1990s) in the area with high quoll abundance. Although the timing of the major decline in the early 2000s coincided with unfavourable weather, the continuing decline and more recent change points suggest other causes are also involved. We can no longer assume that the existence of eastern quolls in Tasmania ensures the species' long-term survival, highlighting the urgent need to increase efforts to conserve the remaining populations in Tasmania.