Guenther Walther scite author profile

We propose a method (the 'gap statistic') for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal, and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.
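The gap-statistic recipe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: clustering is a plain K-means (Lloyd's algorithm with k-means++ seeding, written out so no clustering library is needed), the within-cluster dispersion W_k is the pooled within-cluster sum of squares, the reference null is uniform over the bounding box of the data, and k is chosen by the rule "smallest k with Gap(k) >= Gap(k+1) - s_{k+1}", which is our reading of the paper's selection rule. The names `kmeans`, `within_dispersion` and `gap_statistic` and the parameters `k_max`, `n_refs` and `n_init` are illustrative.

```python
import numpy as np


def within_dispersion(X, labels, k):
    """W_k: sum over clusters of squared distances to the cluster mean.

    This equals sum_r D_r / (2 n_r), where D_r is the sum of all pairwise
    squared distances inside cluster r and n_r is its size.
    """
    total = 0.0
    for r in range(k):
        members = X[labels == r]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; returns the best of n_init runs."""
    best_labels, best_w = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to the squared distance to the nearest existing center
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            # assign each point to its nearest center, then recompute the means
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new_centers = np.array([
                X[labels == r].mean(axis=0) if np.any(labels == r) else centers[r]
                for r in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        w = within_dispersion(X, labels, k)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels, best_w


def gap_statistic(X, k_max=8, n_refs=10, seed=0):
    """Return (chosen k, array of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = np.zeros(k_max), np.zeros(k_max)
    for i, k in enumerate(range(1, k_max + 1)):
        log_w = np.log(kmeans(X, k, rng)[1])
        # reference data sets: uniform over the bounding box of X
        ref_log_w = np.array([
            np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, rng)[1])
            for _ in range(n_refs)
        ])
        gaps[i] = ref_log_w.mean() - log_w
        s[i] = ref_log_w.std() * np.sqrt(1.0 + 1.0 / n_refs)
    # smallest k such that Gap(k) >= Gap(k+1) - s_{k+1}
    for i in range(k_max - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1, gaps
    return k_max, gaps
```

Calling `gap_statistic(X)` returns the selected k together with the Gap(k) values. Writing K-means out by hand keeps the sketch dependency-free; any clustering routine that yields labels could be substituted, in keeping with the abstract's remark that the technique works with the output of any clustering algorithm.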


Cluster Validation by Prediction Strength

Tibshirani, Walther

2005

Journal of Computational and Graphical Statistics

511

488


Forward stagewise regression and the monotone lasso

Hastie, Taylor, Tibshirani et al.

2007

Electron. J. Statist.

181

172


We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence is that, while the lasso makes optimal progress in reducing the residual sum of squares per unit increase in the $L_1$-norm of the coefficient vector $\beta$, forward stagewise is optimal per unit of $L_1$ arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, so that the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors. Comment: Published at http://dx.doi.org/10.1214/07-EJS004 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
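Incremental forward stagewise regression is simple enough to state directly as code. The NumPy sketch below assumes the columns of X are on a common scale (e.g. standardized) and y is centered; `forward_stagewise`, `eps` and `n_steps` are illustrative names, not taken from the paper. At each step it picks the predictor with the largest absolute inner product with the current residual and moves that coefficient by a fixed small amount `eps` in the direction of the sign of that inner product. According to the abstract, as the step size shrinks this procedure solves a monotonicity-constrained version of the lasso.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Incremental forward stagewise regression with fixed step size eps.

    Assumes the columns of X are on a common scale and y is centered.
    Returns the final coefficients and the coefficient path
    (row t holds the coefficients after t steps).
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    path = np.zeros((n_steps + 1, p))
    for step in range(1, n_steps + 1):
        corr = X.T @ residual            # inner products with the current residual
        j = np.argmax(np.abs(corr))      # most correlated predictor
        delta = eps * np.sign(corr[j])   # small fixed move in the direction of corr[j]
        beta[j] += delta
        residual -= delta * X[:, j]      # keep residual = y - X @ beta up to date
        path[step] = beta
    return beta, path
```

With orthonormal columns, X.T @ residual equals the least-squares coefficients minus the current beta, so each step moves one coordinate straight toward its least-squares value; once every coordinate is within `eps` of that value it stays there.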


Optimal and fast detection of spatial clusters with scan statistics

Walther

2010

Ann. Statist.

163


We consider the detection of multivariate spatial clusters in the Bernoulli model with $N$ locations, where the design distribution has weakly dependent marginals. The locations are scanned with a rectangular window with sides parallel to the axes and with varying sizes and aspect ratios. Multivariate scan statistics pose a statistical problem because of the multiple testing over many scan windows, as well as a computational problem because the statistic has to be evaluated on many windows. This paper introduces methodology that leads to both statistically optimal inference and computationally efficient algorithms. The main difference from the traditional calibration of scan statistics is the concept of grouping scan windows according to their sizes and then applying different critical values to different groups. It is shown that this calibration of the scan statistic results in optimal inference for spatial clusters on both small and large scales, as well as in the case where the cluster lives on one of the marginals. Methodology is introduced that allows for an efficient approximation of the set of all rectangles while still guaranteeing the statistical optimality results described above. It is shown that the resulting scan statistic has a computational complexity that is almost linear in $N$. Comment: Published at http://dx.doi.org/10.1214/09-AOS732 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
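The idea of grouping windows by size and giving each group its own critical value can be illustrated with a brute-force scan. This is a sketch under loud assumptions, not the paper's method: locations are two-dimensional, candidate rectangles are restricted to a fixed grid of breakpoints (a crude stand-in for the paper's efficient approximation, and nowhere near almost-linear time), the local statistic is a standardized Bernoulli count, size groups are dyadic bins of the number of points a window contains, and `default_critical` is a hypothetical placeholder rather than a calibrated critical value. The names `grouped_scan`, `default_critical` and `breaks` are illustrative.

```python
import itertools

import numpy as np


def default_critical(g):
    """Hypothetical placeholder critical value for size group g.

    Grows slowly as windows shrink (larger g). In practice the critical
    values would be calibrated, e.g. by simulation under the null.
    """
    return np.sqrt(2.0 * np.log(np.e * 2.0 ** g))


def grouped_scan(points, y, breaks, critical=default_critical):
    """Scan axis-parallel rectangles; judge each size group against its own critical value.

    points:   (N, 2) array of locations.
    y:        (N,) array of 0/1 outcomes (Bernoulli model); needs 0 < mean(y) < 1.
    breaks:   sorted grid coordinates; rectangle edges are restricted to this grid,
              and windows are half-open, [x0, x1) x [y0, y1).
    critical: function mapping a size group g to a critical value.
    Returns the best (z, rectangle) per size group and the groups whose best
    window exceeds their critical value.
    """
    N = len(y)
    p0 = y.mean()  # overall rate under the null of no cluster
    best = {}
    for x0, x1 in itertools.combinations(breaks, 2):
        for y0, y1 in itertools.combinations(breaks, 2):
            inside = ((points[:, 0] >= x0) & (points[:, 0] < x1)
                      & (points[:, 1] >= y0) & (points[:, 1] < y1))
            n = int(inside.sum())
            if n == 0:
                continue
            # standardized excess of 1s inside the window
            z = (y[inside].sum() - n * p0) / np.sqrt(n * p0 * (1.0 - p0))
            # windows holding roughly N / 2**g points share size group g
            g = int(np.floor(np.log2(N / n)))
            if g not in best or z > best[g][0]:
                best[g] = (z, (x0, x1, y0, y1))
    flagged = {g: zr for g, zr in best.items() if zr[0] > critical(g)}
    return best, flagged
```

Comparing each group's maximum with its own threshold, instead of one global maximum with a single threshold, is the design choice the abstract highlights; the grid enumeration here is only for clarity.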


Multiscale inference about a density

Duembgen, Walther

2008

Ann. Statist.

154


We introduce a multiscale test statistic based on local order statistics and spacings that provides simultaneous confidence statements for the existence and location of local increases and decreases of a density or a failure rate. The procedure provides guaranteed finite-sample significance levels, is easy to implement, and possesses certain asymptotic optimality and adaptivity properties. Comment: Version 2 is an extended version (Technical report 56, IMSV, Univ. Bern) that is referred to in version 3. Published at http://dx.doi.org/10.1214/07-AOS521 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
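A local statistic of the kind the abstract describes can be sketched from order statistics. If the density is flat between two order statistics X_(j) and X_(k), the interior points rescaled to [0, 1] behave like independent uniforms, so 2u - 1 has mean 0 and variance 1/3, and a standardized sum of these terms is positive under a local increase and negative under a local decrease. The sketch is modelled on that idea and is not a reproduction of the paper's procedure: the coarse grid of (j, k) pairs, the scale penalty sqrt(2 log(e n / (k - j))), and the names `local_trend_stat`, `multiscale_scan`, `min_gap` and `step` are assumptions for illustration, and calibrated critical values (for example from simulated uniform samples) are left out.

```python
import numpy as np


def local_trend_stat(x_sorted, j, k):
    """Standardized local statistic on the points strictly between X_(j) and X_(k).

    Requires k - j >= 2. Under a flat density on [X_(j), X_(k)], the rescaled
    positions u are uniform, so sum(2u - 1) has mean 0 and variance m / 3.
    Positive values suggest an increase, negative values a decrease.
    """
    m = k - j - 1  # number of interior points
    u = (x_sorted[j + 1:k] - x_sorted[j]) / (x_sorted[k] - x_sorted[j])
    return np.sqrt(3.0 / m) * np.sum(2.0 * u - 1.0)


def multiscale_scan(x, min_gap=10, step=5):
    """Evaluate the local statistic on a grid of (j, k) pairs at many scales.

    Each entry is (left end, right end, statistic, |statistic| - penalty); the
    penalty sqrt(2 log(e n / (k - j))) is an assumption that makes small and
    large intervals comparable.
    """
    xs = np.sort(x)
    n = len(xs)
    results = []
    for j in range(0, n - min_gap, step):
        for k in range(j + min_gap, n, step):
            t = local_trend_stat(xs, j, k)
            penalty = np.sqrt(2.0 * np.log(np.e * n / (k - j)))
            results.append((xs[j], xs[k], t, abs(t) - penalty))
    return results
```

Intervals whose penalized value exceeds a calibrated critical value would then be reported as locations of local increases (t > 0) or decreases (t < 0).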

