A wide variety of tools are available, both parametric and nonparametric, for analyzing spatial data. However, it is not always clear how to translate statistical inferences into decision recommendations. This article explores the possibilities of estimating the effects of decision options using very direct manipulation of data, bypassing formal statistical analysis. We illustrate with the application that motivated this research, a study of arsenic in drinking water in nearly 5000 wells in a small area in rural Bangladesh. We estimate the potential benefits of two possible remedial actions: (1) recommendations that people switch to nearby wells with lower arsenic levels; and (2) drilling new community wells. We use simple nonparametric clustering methods and estimate uncertainties using cross-validation.
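The first remedial action described above can be sketched as a direct, model-free computation on the well data. The function name, the 50 µg/L safety threshold, and the nearest-neighbor switching rule below are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def switching_benefit(coords, arsenic, threshold=50.0):
    """Mean arsenic reduction if users of each unsafe well switch to the
    nearest well whose arsenic level is below the threshold.

    coords  : (n, 2) array of well locations
    arsenic : (n,) array of measured arsenic levels
    """
    coords = np.asarray(coords, dtype=float)
    arsenic = np.asarray(arsenic, dtype=float)
    unsafe = arsenic >= threshold
    safe_idx = np.where(~unsafe)[0]
    reductions = []
    for i in np.where(unsafe)[0]:
        # distance from the unsafe well to every safe well
        d = np.linalg.norm(coords[safe_idx] - coords[i], axis=1)
        nearest = safe_idx[np.argmin(d)]
        reductions.append(arsenic[i] - arsenic[nearest])
    return float(np.mean(reductions)) if reductions else 0.0
```

In the same data-driven spirit, the uncertainty of such an estimate could be assessed by recomputing it over cross-validation folds of the wells, as the abstract indicates.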
A multi-level model allows marginalization across levels in different ways, yielding more than one possible marginal likelihood. Since log-likelihoods are often used in classical model comparison, the question to ask is which likelihood should be chosen for a given model. The authors employ a Bayesian framework to shed some light on qualitative comparison of the likelihoods associated with a given model. They connect these results to related issues of the effective number of parameters, the penalty function, and a consistent definition of a likelihood-based model choice criterion. In particular, with a two-stage model they show that, very generally, regardless of the hyperprior specification, how much data are collected, or what the realized values are, a priori the first-stage likelihood is expected to be smaller than the marginal likelihood. A posteriori, these expectations are reversed, and the disparities worsen with increasing sample size and with an increasing number of model levels.
The words that occur in papers published by the journals of an old and prestigious scientific society like the American Statistical Association portray the most relevant research interests of a discipline, and the recurrence of words over time shows fashions, forgotten topics, and newly emerging subjects: in short, the history of a discipline at a glance. In this study, a set of keywords occurring in the titles of papers published in the period 1888–2012 by the Journal of the American Statistical Association and its predecessors is examined over time, in order to identify those which appeared in the past and those which constitute the research fields covered by Statistics today, from the viewpoints of both methods and application domains. The existence of a latent temporal pattern in keywords' occurrences is explored by means of (lexical) correspondence analysis, and clusters of keywords with similar temporal patterns are identified by functional (textual) data analysis and model-based curve clustering. The analyses reveal a definite time dimension in topics and show that much of the history of Statistics may be gleaned simply by reading the titles of papers through an exploratory correspondence analysis. However, the functional approach and model-based curve clustering turn out to be better at tracing and comparing the individual temporal evolution of keywords, despite some computational and theoretical limitations.
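As an illustration of the correspondence-analysis step described above, here is a minimal sketch of simple correspondence analysis applied to a keyword-by-period contingency table. The function name and the SVD-based formulation are generic textbook choices; the paper's lexical analysis is more elaborate:

```python
import numpy as np

def correspondence_analysis(N, k=2):
    """Principal coordinates of rows (e.g., keywords) and columns
    (e.g., time periods) from simple correspondence analysis of a
    contingency table N, via SVD of the standardized residuals."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # standardized residuals from the independence model
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:k].T * s[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, s[:k] ** 2   # principal inertias
```

Keywords plotted near a given period's column point in this map occur more often in that period than independence would predict, which is how a temporal pattern becomes visible "at a glance."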
Small area estimation (SAE) tackles the problem of providing reliable estimates for small areas, i.e., subsets of the population for which sample information is not sufficient to warrant the use of a direct estimator. The hierarchical Bayesian approach to SAE offers several advantages over traditional SAE models, including the ability to account appropriately for the type of surveyed variable. In this paper, a number of model specifications for estimating small area counts are discussed and their relative merits are illustrated. We conducted a simulation study that reproduced, in simplified form, the Italian Labour Force Survey, taking the Local Labor Markets as target areas. Simulated data were generated by treating the population characteristics of interest and the survey sampling design as known. In one set of experiments, employment/unemployment counts from census data were used; in others, population characteristics were varied. Results show persistent model failures for some standard Fay-Herriot specifications and for generalized linear Poisson models with a (log-)normal sampling stage, whereas either unmatched or nonnormal sampling-stage models achieve the best performance in terms of bias, accuracy, and reliability. However, the study also found that every model improves noticeably when sampling variances are allowed to be stochastically determined rather than assumed known, as is the general practice. Moreover, we address the issue of model determination, pointing out the limits and possible deceptions of commonly used criteria for model selection and checking in the SAE context.
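For reference, the basic area-level Fay-Herriot model mentioned above posits a direct estimate y_i = x_i'β + v_i + e_i for each area, with area effect v_i ~ N(0, A) and sampling error e_i ~ N(0, D_i), where D_i is assumed known. Below is a minimal frequentist (EBLUP) sketch, with A estimated from the classical moment equation solved by bisection; the paper itself studies hierarchical Bayesian variants, so this is only a hedged illustration of the baseline specification:

```python
import numpy as np

def fay_herriot_eblup(y, X, D):
    """EBLUP under the basic Fay-Herriot model
    y_i = x_i' beta + v_i + e_i, v_i ~ N(0, A), e_i ~ N(0, D_i known)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    m, p = X.shape

    def moment_gap(A):
        # WLS fit of beta given A, and the moment-equation residual:
        # sum_i r_i^2 / (A + D_i) should equal m - p at the solution.
        w = 1.0 / (A + D)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        r = y - X @ beta
        return np.sum(w * r**2) - (m - p), beta

    lo, hi = 0.0, 10.0 * np.var(y) + 1.0
    if moment_gap(lo)[0] <= 0:
        A = 0.0                 # no positive solution: truncate at zero
    else:
        for _ in range(60):     # bisection on the decreasing moment gap
            mid = 0.5 * (lo + hi)
            if moment_gap(mid)[0] > 0:
                lo = mid
            else:
                hi = mid
        A = 0.5 * (lo + hi)

    _, beta = moment_gap(A)
    gamma = A / (A + D)         # shrinkage weight per area
    return gamma * y + (1.0 - gamma) * (X @ beta)
```

Each small-area estimate is a compromise between the noisy direct estimate y_i and the synthetic regression prediction x_i'β, with more shrinkage where the sampling variance D_i is large; the abstract's point is that treating D_i itself as stochastic, rather than known, improves on this baseline.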