Andreas Gegg scite author profile

Sample size matters: investigating the effect of sample size on a logistic regression susceptibility model for debris flows

Abstract. Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable and reproducible results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not reduce it further, and they approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial data sets. In this way, an optimised sample size can be derived from an exploratory analysis.
Model uncertainty due to sampling and model selection, and the predictive ability of the resulting models, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In view of these results, we argue that researchers applying model selection should explore the behaviour of the model selection for different sample sizes, and that consensus models created from a number of random samples should be given preference over models relying on a single sample.
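
The idea of measuring "model diversity" across repeated random samples can be illustrated with a small sketch. It is not the authors' code: the abstract does not name the indices used, so the Shannon entropy and the Gini-Simpson index are assumed here as typical examples from information theory and biodiversity research, and the function name `model_diversity` and the geofactor names are hypothetical.

```python
import math
from collections import Counter


def model_diversity(selected_models):
    """Return (distinct model count, Shannon entropy in nats, Gini-Simpson index).

    Each element of `selected_models` is the collection of variables that one
    run of stepwise selection retained (assumed input format).
    """
    # Two runs count as the same model if they retained the same variable set.
    counts = Counter(frozenset(model) for model in selected_models)
    total = sum(counts.values())
    shares = [count / total for count in counts.values()]
    shannon = -sum(p * math.log(p) for p in shares)
    simpson = 1.0 - sum(p * p for p in shares)
    return len(counts), shannon, simpson


# Hypothetical outcomes of stepwise selection on four random samples.
runs = [
    {"slope", "curvature"},
    {"curvature", "slope"},
    {"slope", "geology"},
    {"slope"},
]
richness, shannon, simpson = model_diversity(runs)
print(richness, round(shannon, 4), simpson)
```

Repeating this for each sample size and plotting the indices against n would show where diversity stops decreasing, which is the kind of exploratory analysis the abstract describes.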

The Cameron–Martin Theorem for (p-)Slepian Processes

Bischoff

Gegg

2015

J Theor Probab

We show a Cameron-Martin theorem for Slepian processes W_t, which are defined in terms of a Brownian motion B_s. More exactly, we determine the class of functions F for which a density of F(t) + W_t with respect to W_t exists. Moreover, we prove an explicit formula for this density. p-Slepian processes are closely related to Slepian processes and play a prominent role, among other areas, in scan statistics and in testing for parameter constancy when data are taken from a moving window.

Sample size matters: investigating the effect of sample size on a logistic regression debris flow susceptibility model

Heckmann

Gegg

Gegg

et al.

2013

Preprint

Abstract. Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial datasets. In this way, an optimised sample size can be derived from an exploratory analysis.
Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. In view of these results, we argue that researchers applying model selection should explore the behaviour of the model selection for different sample sizes, and that consensus models created from a number of random samples should be given preference over models relying on a single sample.

Partial sum process to check regression models with multiple correlated response: With an application for testing a change-point in profile data

Bischoff

Gegg

2011

Journal of Multivariate Analysis

We consider regression models with multiple correlated responses for each design point. Under the null hypothesis, a linear regression is assumed. For the least-squares residuals of this linear regression, we establish the limit of the partial sums. This limit is a projection on a certain subspace of the reproducing kernel Hilbert space of a multivariate Brownian motion. Based on this limit, we propose a significance test of Kolmogorov-Smirnov type to test the null hypothesis and show that this result can be used to study a change-point problem in the case of linear profile data (panel data). We compare our proposed method, which does not rely on any distributional assumptions, with the likelihood ratio test in a simulation study.
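
To make the notion of a partial sum process of least-squares residuals concrete, here is a deliberately simplified sketch. It assumes a single response per design point and a straight-line null model, so it is not the multivariate, correlated-response setting of the paper; the function names are hypothetical, and no critical values are given because the limit distribution derived in the paper is not reproduced here.

```python
import math
import random


def residual_partial_sums(x, y):
    """Scaled partial sums of least-squares residuals for the model y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    scale = sigma * math.sqrt(n)
    path, running = [], 0.0
    for r in residuals:
        running += r
        path.append(running / scale)
    return path


def ks_type_statistic(path):
    """Largest absolute value of the partial sum path."""
    return max(abs(value) for value in path)


# Simulated data: a linear null model, and the same data with a mean shift
# after the midpoint of the design (a simple change-point alternative).
rng = random.Random(0)
n = 200
x = [(i + 1) / n for i in range(n)]
y_null = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
y_shift = [yi + (3.0 if xi > 0.5 else 0.0) for xi, yi in zip(x, y_null)]

stat_null = ks_type_statistic(residual_partial_sums(x, y_null))
stat_shift = ks_type_statistic(residual_partial_sums(x, y_shift))
print(stat_null, stat_shift)
```

Under the linear null model the residual partial sums fluctuate around zero, whereas a change-point pushes the path away from zero near the location of the change, which is what a Kolmogorov-Smirnov type statistic picks up.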

Boundary crossing probabilities for (q,d)-Slepian-processes

Bischoff

Gegg

2016

Statistics & Probability Letters

A (q, d)-Slepian process is defined as a centered, stationary Gaussian process with continuous sample paths and a specific covariance structure, constructed from a standard Brownian motion B_t. In this paper we prove an analytical formula for the boundary crossing probability of such a process in the case of an affine boundary function. This formula can be used to approximate the boundary crossing probability for an arbitrary boundary by approximating the boundary function with piecewise affine functions.
