Nicole Augustin scite author profile

Model Selection: An Integral Part of Inference

We argue that model selection uncertainty should be fully incorporated into statistical inference whenever estimation is sensitive to model choice and that choice is made with reference to the data. We consider different philosophies for achieving this goal and suggest strategies for data analysis. We illustrate our methods through three examples. The first is a Poisson regression of bird counts in which a choice is to be made between inclusion of one or both of two covariates. The second is a line transect data set for which different models yield substantially different estimates of abundance. The third is a simulated example in which truth is known.

An Autologistic Model for the Spatial Distribution of Wildlife

Augustin

Mugglestone

Buckland

1996

The Journal of Applied Ecology

573

546

GAMs with integrated model selection using penalized regression splines and applications to environmental modelling

Wood

Augustin

2002

Ecological Modelling

656

513

Generalized Additive Models (GAMs) have been popularized by the work of Hastie and Tibshirani (1990) and the availability of user friendly GAM software in Splus. However, whilst it is flexible and efficient, the GAM framework based on backfitting with linear smoothers presents some difficulties when it comes to model selection and inference. On the other hand, the mathematically elegant work of Wahba (1990) and co-workers on Generalized Spline Smoothing (GSS) provides a rigorous framework for model selection (Gu and Wahba, 1991) and inference with GAMs constructed from smoothing splines: but unfortunately these models are computationally very expensive with operations counts that are of cubic order in the number of data. A "middle way" between these approaches is to construct GAMs using penalized regression splines (see e.g.

Tumor Cell–Derived and Macrophage-Derived Cathepsin B Promotes Progression and Lung Metastasis of Mammary Cancer

Vasiljeva

Papazoglou

Krüger

et al. 2006

334

265

Proteolysis in close vicinity of tumor cells is a hallmark of cancer invasion and metastasis. We show here that mouse mammary tumor virus-polyoma middle T antigen (PyMT) transgenic mice deficient for the cysteine protease cathepsin B (CTSB) exhibited a significantly delayed onset and reduced growth rate of mammary cancers compared with wild-type PyMT mice. with PyMT;ctsb+/+ cells, was used to address the role of stroma-derived CTSB in lung metastasis formation. Notably, ctsb−/− mice showed reduced number and volume of lung colonies, and infiltrating macrophages showed a strongly up-regulated expression of CTSB within metastatic cell populations. These results indicate that both cancer cell-derived and stroma cell-derived (i.e., macrophages) CTSB play an important role in tumor progression and metastasis.

Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data

Wood

Shaddick

et al. 2017

Journal of the American Statistical Association

147

132

We develop scalable methods for fitting penalized regression spline based generalized additive models with of the order of 10^4 coefficients to up to 10^8 data. Computational feasibility rests on: (i) a new iteration scheme for estimation of model coefficients and smoothing parameters, avoiding poorly scaling matrix operations; (ii) parallelization of the iteration's pivoted block Cholesky and basic matrix operations; (iii) the marginal discretization of model covariates to reduce memory footprint, with efficient scalable methods for computing required crossproducts directly from the discrete representation. Marginal discretization enables much finer discretization than joint discretization would permit. We were motivated by the need to model four decades worth of daily particulate data from the U.K. Black Smoke and Sulphur Dioxide Monitoring Network. Although reduced in size recently, over 2000 stations have at some time been part of the network, resulting in some 10 million measurements. Modeling at a daily scale is desirable for accurate trend estimation and mapping, and to provide daily exposure estimates for epidemiological cohort studies. Because of the dataset size, previous work has focused on modeling time or space averaged pollution levels, but this is unsatisfactory from a health perspective, since it is often acute exposure locally and on the time scale of days that is of most importance in driving adverse health outcomes. If computed by conventional means our black smoke model would require a half terabyte of storage just for the model matrix, whereas we are able to compute with it on a desktop workstation. The best previously available reduced memory footprint method would have required three orders of magnitude more computing time than our new method. Supplementary materials for this article are available online.

Modeling Spatiotemporal Forest Health Monitoring Data

Augustin

Monachini

Wilpert

et al. 2009

Journal of the American Statistical Association

Forest health monitoring schemes were set up across Europe in the 1980's in response to concern about air pollution related forest die back (Waldsterben) and have continued since then. Recent threats to forest health are climatic extremes likely to be due to global climate change, increased ground ozone levels and nitrogen deposition. We model yearly data on tree crown defoliation, an indicator of tree health, from a monitoring survey carried out in Baden-Württemberg, Germany since 1983. On a changing irregular grid, defoliation and other site specific variables are recorded. In Baden-Württemberg the temporal trend of defoliation differs between areas because of site characteristics and pollution levels, making it necessary to allow for space-time interaction in the model. For this purpose we propose to use generalized additive mixed

Spatial+: A novel approach to spatial confounding

2022

In spatial regression models, collinearity between covariates and spatial effects can lead to significant bias in effect estimates. This problem, known as spatial confounding, is encountered modeling forestry data to assess the effect of temperature on tree health. Reliable inference is difficult as results depend on whether or not spatial effects are included in the model. We propose a novel approach, spatial+, for dealing with spatial confounding when the covariate of interest is spatially dependent but not fully determined by spatial location. Using a thin plate spline model formulation we see that, in this case, the bias in covariate effect estimates is a direct result of spatial smoothing. Spatial+ reduces the sensitivity of the estimates to smoothing by replacing the covariates by their residuals after spatial dependence has been regressed away. Through asymptotic analysis we show that spatial+ avoids the bias problems of the spatial model. This is also demonstrated in a simulation study. Spatial+ is straightforward to implement using existing software and, as the response variable is the same as that of the spatial model, standard model selection criteria can be used for comparisons. A major advantage of the method is also that it extends to models with non‐Gaussian response distributions. Finally, while our results are derived in a thin plate spline setting, the spatial+ methodology transfers easily to other spatial model formulations.

Exploring spatial vegetation dynamics using logistic regression and a multinomial logit model

Augustin

Cummins

French

2001

Journal of Applied Ecology

Summary

1. This study presents statistical methodology that uses spatial explanatory variables to improve simpler estimates of transition probabilities from categorical data, such as vegetation type, that have been recorded as classified cells (pixels) in a grid or lattice at different times.
2. A specific application is to examine successions in semi-natural vegetation in north-east Scotland. Questions related to these data include: Do transition probabilities of a pixel depend on the size of a patch of vegetation (polygon) and pixel location within the polygon? Do stable areas remain stable? Does the proximity of certain vegetation types influence transitions?
3. We selected spatial variables that were likely to be important in this application, where short-range vegetative spread was thought to be an important factor.
4. The multinomial logit model is used to estimate the transition probabilities as a function of explanatory variables, including location, neighbourhood information and other factors recorded at the start of the transition period. This model allowed the testing of different assumptions about the dynamics of underlying processes leading to transitions.
5. When the number of categories, for example vegetation types, observed is large in comparison to the sample size, estimates of transition probabilities can be unreliable. We show that using change of category within the time period as the response in a logistic regression can still provide insight to the underlying dynamics of change in such a case.
6. The methods are illustrated with some Scottish vegetation classification data with pixels of size 5 × 5 m covering a square of area 0·25 km². Two contrasting squares were investigated: the first was upland moorland grazed by sheep and the second was a lowland area with more varied vegetation and low intensity grazing by cattle.
7. In both squares there are strong spatial trends, and the neighbourhood of a pixel affected its transition. Prediction misclassification rates estimated from different models were compared using K-fold cross-validation. The multinomial model, including position in the square and number of neighbouring pixels in the same category as the pixel modelled, reduced the misclassification rate compared with the model without spatial explanatory variables.
8. The improved estimates of transition probabilities could be incorporated into Markov models used in simulation studies to predict future vegetation changes under different management strategies.
