Library of Congress Cataloging-in-Publication Data Simonoff, Jeffrey S. Smoothing methods in statistics / Jeffrey S. Simonoff. p. cm. -(Springer series in statistics) Includes bibliographical references and indexes.
Many different methods have been proposed to construct nonparametric estimates of a smooth regression function, including local polynomial, (convolution) kernel and smoothing spline estimators. Each of these estimators uses a smoothing parameter to control the amount of smoothing performed on a given data set. In this paper an improved version of a criterion based on the Akaike information criterion (AIC), termed AIC C , is derived and examined as a way to choose the smoothing parameter. Unlike plug-in methods, AIC C can be used to choose smoothing parameters for any linear smoother, including local quadratic and smoothing spline estimators. The use of AIC C avoids the large variability and tendency to undersmooth (compared with the actual minimizer of average squared error) seen when other`classical' approaches (such as generalized cross-validation or the AIC) are used to choose the smoothing parameter. Monte Carlo simulations demonstrate that the AIC C -based smoothing parameter is competitive with a plug-in method (assuming that one exists) when the plug-in method works well but also performs well when the plug-in approach fails or is unavailable.
Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved) represent a similar type of situation. Methodologies that take this structure into account allow for the possibilities of systematic differences between objects that are not related to attributes and autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called "random effects" that are estimated from the data (population-level relationships are termed "fixed effects," together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.