Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.
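A minimal sketch of what sparsity buys in the linear model, in the simplest (orthonormal-design) special case where the lasso solution reduces to coordinatewise soft-thresholding. The dimensions, signal strength and threshold below are illustrative choices, not taken from the article; the threshold is the familiar "universal" choice sqrt(2 log p) from the wavelet-thresholding literature.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution in the orthonormal-design special case y = theta + noise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
p, k = 1000, 10                      # many parameters, few nonzero
theta = np.zeros(p)
theta[:k] = 8.0                      # the sparse signal
y = theta + rng.normal(size=p)       # one noisy observation per coordinate

lam = np.sqrt(2.0 * np.log(p))       # universal threshold, about 3.72 here
theta_hat = soft_threshold(y, lam)   # almost all null coordinates are zeroed
```

Even though the number of parameters equals the number of observations, thresholding exploits the sparsity of theta to recover essentially the correct support.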
We define and compute asymptotically optimal difference sequences for estimating error variance in homoscedastic nonparametric regression. Our optimal difference sequences do not depend on unknowns, such as the mean function, and provide substantial improvements over the suboptimal sequences commonly used in practice. For example, in the case of normal data the usual variance estimator based on symmetric second-order differences is only 64% efficient relative to the estimator based on optimal second-order differences. The efficiency of an optimal mth-order difference estimator relative to the error sample variance is 2m/(2m + 1). This, too, is for normal data; the relative efficiency increases as the tails of the error distribution become heavier.
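A sketch of a generic difference-based variance estimator, assuming the standard normalization that the sequence sums to zero (so the smooth mean cancels) and its squares sum to one (so the estimator is unbiased for the error variance). The data-generating model and the particular optimal second-order weights shown, ((1+sqrt 5)/4, -1/2, (1-sqrt 5)/4), are illustrative values reported in the difference-sequence literature, not taken from this abstract.

```python
import numpy as np

def difference_variance(y, d):
    """Difference-based estimate of sigma^2 in y_i = f(x_i) + eps_i.

    d must satisfy sum(d) == 0 (removes the smooth mean locally)
    and sum(d**2) == 1 (makes the estimator unbiased for sigma^2)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    m = len(d) - 1                              # order of the sequence
    diffs = np.correlate(y, d, mode="valid")    # sum_j d_j * y_{i+j}
    return np.sum(diffs ** 2) / (len(y) - m)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5000)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 1.0, size=x.size)

# Commonly used symmetric second-order sequence (1, -2, 1)/sqrt(6).
d_sym = np.array([1.0, -2.0, 1.0]) / np.sqrt(6.0)
# An optimal second-order sequence, as given in the literature.
d_opt = np.array([(1.0 + np.sqrt(5.0)) / 4.0, -0.5, (1.0 - np.sqrt(5.0)) / 4.0])

est_sym = difference_variance(y, d_sym)   # both close to the true value 1.0
est_opt = difference_variance(y, d_opt)
```

Both estimators are close to the true variance here; the abstract's point is that the optimal sequence attains a smaller asymptotic variance than the symmetric one.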
Hidden Markov models form an extension of mixture models which provides a flexible class of models exhibiting dependence and a possibly large degree of variability. We show how reversible jump Markov chain Monte Carlo techniques can be used to estimate the parameters as well as the number of components of a hidden Markov model in a Bayesian framework. We employ a mixture of zero-mean normal distributions as our main example and apply this model to three sets of data from finance, meteorology and geomagnetism.
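The model class can be illustrated with a scaled forward recursion for the likelihood of a hidden Markov model with zero-mean normal emissions (the reversible jump sampler itself is beyond a short sketch). The two-regime transition matrix and standard deviations below are made-up illustrative values, not parameters from the paper.

```python
import numpy as np

def hmm_loglik(y, pi0, A, sigmas):
    """Log-likelihood of an HMM whose state-k emission is N(0, sigmas[k]^2),
    computed with the scaled forward recursion."""
    y = np.asarray(y, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Emission densities b[t, k] = phi(y_t; 0, sigmas[k]).
    b = np.exp(-0.5 * (y[:, None] / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    alpha = pi0 * b[0]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            alpha = (alpha @ A) * b[t]          # propagate, then weight by emission
        c = alpha.sum()                          # scaling constant p(y_t | y_1..t-1)
        loglik += np.log(c)
        alpha /= c                               # normalized filtering distribution
    return loglik

# Two persistent zero-mean regimes: low and high volatility.
rng = np.random.default_rng(1)
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
sigmas = np.array([1.0, 3.0])
pi0 = np.array([0.5, 0.5])

# Simulate a path from the model, then score it.
states = [0]
for _ in range(499):
    states.append(rng.choice(2, p=A[states[-1]]))
y = rng.normal(0.0, sigmas[np.array(states)])
ll = hmm_loglik(y, pi0, A, sigmas)
```

When all rows of A are identical, the chain is independent over time and the model collapses to an ordinary normal mixture, which makes a convenient correctness check.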