We propose a new method for the construction and visualization of
boxplot-type displays for functional data. We use a recent functional data
analysis framework, based on a representation of functions called square-root
slope functions, to decompose observed variation in functional data into three
main components: amplitude, phase, and vertical translation. We then construct
separate displays for each component, using the geometry and metric of each
representation space, based on a novel definition of the median, the two
quartiles, and extreme observations. The outlyingness of functional data is a
very complex concept. Thus, we propose to identify outliers based on any of the
three main components after decomposition. We provide a variety of
visualization tools for the proposed boxplot-type displays including surface
plots. We evaluate the proposed method using extensive simulations and then
focus our attention on three real data applications including exploratory data
analysis of sea surface temperature functions, electrocardiogram functions and
growth curves.
Comment: Journal of the American Statistical Association, 201
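As a rough illustration of the representation named in the abstract above, the sketch below computes a square-root slope function under the definition commonly used in the elastic functional data analysis literature, q(t) = sign(f'(t)) sqrt(|f'(t)|). It then recovers f from q together with the starting value f(0), which carries the vertical translation. The function names `srsf` and `srsf_inverse`, the finite-difference derivative, and the test curve are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def srsf(f, t):
    """Square-root slope function q = sign(f') * sqrt(|f'|) on the grid t."""
    df = np.gradient(f, t)  # finite-difference estimate of f'
    return np.sign(df) * np.sqrt(np.abs(df))

def srsf_inverse(q, t, f0):
    """Recover f from q and the starting value f0 via f = f0 + integral of q|q|."""
    df = q * np.abs(q)  # q|q| gives back the derivative f'
    steps = 0.5 * (df[1:] + df[:-1]) * np.diff(t)  # trapezoid rule per interval
    return f0 + np.concatenate(([0.0], np.cumsum(steps)))

# Hypothetical test curve: f(t) = t^2 on [0, 1]
t = np.linspace(0.0, 1.0, 101)
f = t ** 2
q = srsf(f, t)
f_rec = srsf_inverse(q, t, f[0])
```

Because q discards f(0), the starting value has to be stored separately, which is one way to see why vertical translation appears as its own component next to amplitude and phase.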
We propose a geometric framework to assess sensitivity of Bayesian procedures to modeling assumptions based on the nonparametric Fisher-Rao metric. While the framework is general in spirit, the focus of this article is restricted to metric-based diagnosis under two settings: assessing local and global robustness in Bayesian procedures to perturbations of the likelihood and prior, and identification of influential observations. The approach is based on the square-root representation of densities which enables one to compute geodesics and geodesic distances in analytical form, facilitating the definition of naturally calibrated local and global discrepancy measures. An important feature of our approach is the definition of a geometric ε-contamination class of sampling distributions and priors via intrinsic analysis on the space of probability density functions. We showcase the applicability of our framework on several simulated toy datasets as well as in real data settings for generalized mixed effects models, directional data and shape data.
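To make the analytical geodesic distance mentioned above concrete: under the square-root representation, densities map to the unit sphere in L2, where the geodesic distance between p and q is the arc length arccos(∫ sqrt(p(x) q(x)) dx). The sketch below evaluates this on a grid. The helper names, the trapezoid quadrature, and the two normal densities are assumptions chosen for illustration, not the paper's implementation.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fisher_rao_distance(p, q, x):
    """Arc length between sqrt(p) and sqrt(q) on the unit sphere in L2."""
    inner = trapezoid(np.sqrt(p) * np.sqrt(q), x)
    # Clip to guard against round-off pushing the inner product past 1
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical example: two unit-variance normal densities with means 0 and 1
x = np.linspace(-10.0, 11.0, 4001)
p = normal_pdf(x, 0.0, 1.0)
q = normal_pdf(x, 1.0, 1.0)
d = fisher_rao_distance(p, q, x)
```

Since square roots of densities are nonnegative, this distance always lies between 0 and π/2, which gives discrepancy measures built on it a fixed scale.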
We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. A clustering criterion based on a linear combination of order statistics is proposed. The asymptotic behavior of the point at which the observations are split into two clusters is examined. The results obtained can then be utilized to construct an interval estimate of the point which splits the data and develop tests for bimodality and presence of clusters.
We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.
We detail an approach to develop Stein's method for bounding integral metrics on probability measures defined on a Riemannian manifold M. Our approach exploits the relationship between the generator of a diffusion on M with target invariant measure and its characterising Stein operator. We consider a pair of such diffusions with different starting points, and investigate properties of solutions to the Stein equation based on analysis of the distance process between the pair. Several examples elucidating the role of geometry of M in these developments are presented.