With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. These are both examples of functional data, which has become a commonly encountered type of data. Functional data analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA). FPCA is an important dimension reduction tool, and in sparse data situations it can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single- or multiple-index methods, we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models and models that feature time warping, manifold learning, and empirical differential equations. The paper concludes with a brief discussion of future directions.

Annu. Rev. Stat. Appl. 2016. 3:257-295.

Section 2, and several approaches for dimension reduction in functional regression are discussed in Section 3. Clustering and classification of functional data are useful and important tools with wide-ranging applications in FDA. Methods include extensions of classical k-means and hierarchical clustering, Bayesian and model approaches to clustering, and classification via functional regression and functional discriminant analysis.
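The basic notions named in the overview above (mean function, covariance function, FPCA) can be illustrated with a minimal sketch. It assumes curves observed without noise on a common, equally spaced grid, which is the simplest dense setting and not the sparse setting the review also covers. The function name `fpca_dense` and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fpca_dense(X, t, n_components=2):
    """FPCA for curves observed on a common, equally spaced grid.

    X : (n_curves, n_grid) array of curve values; t : (n_grid,) grid.
    """
    dt = t[1] - t[0]                       # grid spacing (Riemann-sum weight)
    mu = X.mean(axis=0)                    # pointwise estimate of the mean function
    Xc = X - mu
    cov = Xc.T @ Xc / (X.shape[0] - 1)     # sample covariance surface G(s, t)
    # Discretised covariance operator: (G phi)(s) ~ sum_t G(s, t) phi(t) dt
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:n_components]
    evals = evals[order]
    phi = evecs[:, order].T / np.sqrt(dt)  # rescale so that sum(phi**2) * dt == 1
    scores = Xc @ phi.T * dt               # xi_ik ~ integral of (X_i - mu) * phi_k
    return mu, evals, phi, scores

# Simulated example (hypothetical data): a mean function plus two random modes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
basis = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                   np.sqrt(2) * np.cos(2 * np.pi * t)])
coef = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
X = t + coef @ basis                       # each row is one noise-free curve

mu, evals, phi, scores = fpca_dense(X, t, n_components=2)
X_hat = mu + scores @ phi                  # truncated Karhunen-Loeve reconstruction
print("leading eigenvalues:", evals)
print("max reconstruction error:", np.abs(X - X_hat).max())
```

Because the simulated curves have only two modes of variation, two components reproduce them up to rounding error. With noisy or sparsely observed curves, the covariance surface would typically need smoothing before the eigendecomposition, which this sketch omits.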
These topics are explored in Section 4. The classical methods for FDA have been predominantly linear, such as functional principal components (FPCs) or the FLM. As more and more functional data are generated, it has emerged that many such data have inherent nonlinear features that make linear methods less effective. Section 5 reviews some nonlinear approaches to FDA, including time warping, nonlinear manifold modeling, and nonlinear differential equations to model the underlying empirical dynamics. A well-known and well-studied nonlinear effect is time warping, where in addition to the common amplitude variation, one also considers time variation. This creates a basic nonidentifiability problem, and Section 5.1 provides a discussion of these foundational issues. A more general approach to modeling nonlinearity in functional data is to assume that the functional data lie on
Summary. A functional clustering (FC) method, k-centres FC, for longitudinal data is proposed. The k-centres FC approach accounts for both the means and the modes of variation differentials between clusters by predicting cluster membership with a reclassification step. The cluster membership predictions are based on a non-parametric random-effect model of the truncated Karhunen-Loève expansion, coupled with a non-parametric iterative mean and covariance updating scheme. We show that, under the identifiability conditions derived, the k-centres FC method proposed can greatly improve cluster quality as compared with conventional clustering algorithms. Moreover, by exploring the mean and covariance functions of each cluster, the k-centres FC method provides an additional insight into cluster structures which facilitates functional cluster analysis. Practical performance of the k-centres FC method is demonstrated through simulation studies and data applications including growth curve and gene expression profile data.
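The reclassification idea in the summary above can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes noise-free curves on a common, equally spaced grid, refits each cluster with all of its current members (including the curve being classified), and assigns each curve to the cluster whose mean and truncated Karhunen-Loève basis reconstruct it with the smallest squared L2 error. All function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_cluster(members, dt, n_components):
    """Mean function and leading eigenfunctions of one cluster."""
    mu = members.mean(axis=0)
    centred = members - mu
    cov = centred.T @ centred / (len(members) - 1)
    _, evecs = np.linalg.eigh(cov * dt)            # eigenvalues in ascending order
    phi = evecs[:, ::-1][:, :n_components].T / np.sqrt(dt)
    return mu, phi

def reconstruction_error(x, mu, phi, dt):
    """Squared L2 distance between x and its truncated expansion in one cluster."""
    scores = (x - mu) @ phi.T * dt
    fitted = mu + scores @ phi
    return np.sum((x - fitted) ** 2) * dt

def reclassify(X, t, init_labels, n_components=1, max_iter=20):
    """Alternate between refitting cluster models and reassigning curves."""
    dt = t[1] - t[0]
    labels = np.asarray(init_labels).copy()
    n_clusters = labels.max() + 1
    for _ in range(max_iter):
        models = [fit_cluster(X[labels == c], dt, n_components)
                  for c in range(n_clusters)]
        errors = np.array([[reconstruction_error(x, mu, phi, dt)
                            for mu, phi in models] for x in X])
        new_labels = errors.argmin(axis=1)
        counts = np.bincount(new_labels, minlength=n_clusters)
        # Stop at a fixed point, or if a cluster would become too small to refit.
        if np.array_equal(new_labels, labels) or counts.min() < 2:
            break
        labels = new_labels
    return labels

# Hypothetical data: two groups differing in mean and in mode of variation.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 51)
f1 = np.sqrt(2) * np.sin(2 * np.pi * t)
f2 = np.sqrt(2) * np.cos(2 * np.pi * t)
group0 = rng.normal(size=(30, 1)) * f1             # mean 0, varies along f1
group1 = 5.0 + rng.normal(size=(30, 1)) * f2       # mean 5, varies along f2
X = np.vstack([group0, group1])
truth = np.repeat([0, 1], 30)

start = truth.copy()
start[[0, 1, 2, 30, 31, 32]] = [1, 1, 1, 0, 0, 0]  # deliberately mislabel six curves
labels = reclassify(X, t, start, n_components=2)
print("agreement with true grouping:", np.mean(labels == truth))
```

In practice the initial labels would come from a conventional clustering method rather than from a perturbed version of the true grouping, and the number of components per cluster would be chosen from the data.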
Antibody response correlates with severity of infection.
We propose an extended version of the classical Karhunen-Loève expansion of a multivariate random process, termed a normalized multivariate functional principal component (mFPCn) representation. This takes variations between the components of the process into account and takes advantage of component dependencies through the pairwise cross-covariance functions. This approach leads to a single set of multivariate functional principal component scores, which serve well as a proxy for multivariate functional data. We derive the consistency properties for the estimates of the mFPCn, and the asymptotic distributions for statistical inferences. We illustrate the finite sample performance of this approach through the analysis of a traffic flow data set, including an application to clustering and a simulation study. The mFPCn approach serves as a basic and useful statistical tool for multivariate functional data analysis.
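A rough sketch of the idea follows, under an assumption the abstract does not spell out: that "normalized" means dividing each centred component by its pointwise standard deviation before a joint eigendecomposition of the stacked auto- and cross-covariance surfaces. It further assumes dense observations on a common, equally spaced grid with positive variance at every grid point. The function name `mfpca_normalized` and the simulated two-component process are hypothetical.

```python
import numpy as np

def mfpca_normalized(components, t, n_components=2):
    """Joint FPCA of several component processes after pointwise standardisation.

    components : list of (n_curves, n_grid) arrays sharing the grid t.
    """
    dt = t[1] - t[0]
    blocks = []
    for X in components:
        mu = X.mean(axis=0)
        sd = X.std(axis=0, ddof=1)       # assumed strictly positive on the grid
        blocks.append((X - mu) / sd)
    Z = np.hstack(blocks)                # (n_curves, p * n_grid) stacked process
    C = Z.T @ Z / (Z.shape[0] - 1)       # blocks hold auto- and cross-correlations
    evals, evecs = np.linalg.eigh(C * dt)
    order = np.argsort(evals)[::-1][:n_components]
    phi = evecs[:, order].T / np.sqrt(dt)
    scores = Z @ phi.T * dt              # one set of scores for all components
    return evals[order], phi, scores

# Hypothetical data: two components on very different scales driven by one score.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 41)
xi = rng.normal(size=(100, 1))
large_scale = 1000.0 * (xi * np.sin(np.pi * t) + 0.3 * rng.normal(size=(100, 41)))
small_scale = xi * np.cos(np.pi * t) + 0.3 * rng.normal(size=(100, 41))

evals, phi, scores = mfpca_normalized([large_scale, small_scale], t)
evals_rescaled, _, _ = mfpca_normalized([large_scale / 1000.0, small_scale], t)
print("leading eigenvalues:", evals)
print("after rescaling one component:", evals_rescaled)
```

The standardisation makes the result insensitive to the measurement units of the individual components, so a component recorded on a large scale does not dominate the joint scores.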
There is evidence indicating that ingestion of arsenic may predispose the development of diabetes mellitus in arsenic-endemic areas in Taiwan. However, the prevalence of diabetes and related vascular diseases in the entire southwestern arseniasis-endemic and nonendemic areas remains to be elucidated. We used the National Health Insurance Database for 1999-2000 to derive the prevalence of non-insulin-dependent diabetes and related vascular diseases by age and sex among residents in southwestern arseniasis-endemic and nonendemic areas in Taiwan. The study included 66,667 residents living in endemic areas and 639,667 in nonendemic areas, all ≥ 25 years of age. The status of diabetes and vascular diseases was ascertained through disease diagnosis and treatment prescription included in the reimbursement claims of clinics and hospitals. The prevalence of non-insulin-dependent diabetes, age- and gender-adjusted to the general population in Taiwan, was 7.5% (95% confidence interval, 7.4-7.7%) in the arseniasis-endemic areas and 3.5% (3.5-3.6%) in the nonendemic areas. Among both diabetics and nondiabetics, higher prevalence of microvascular and macrovascular diseases was observed in arseniasis-endemic than in the nonendemic areas. Age- and gender-adjusted prevalence of microvascular disease in endemic and nonendemic areas was 20.0% and 6.0%, respectively, for diabetics, and 8.6% and 1.0%, respectively, for nondiabetics. The corresponding prevalence of macrovascular disease was 25.3% and 13.7% for diabetics, and 12.3% and 5.5% for nondiabetics. Arsenic has been suggested to increase the risk of non-insulin-dependent diabetes mellitus and its related micro- and macrovascular diseases.