We present the R package mixsmsn, which implements routines for maximum likelihood estimation, via an expectation-maximization (EM) type algorithm, in finite mixture models with components belonging to the class of scale mixtures of the skew-normal distribution, which we call the FMSMSN models. Both univariate and multivariate responses are considered. The number of mixture components can be fixed by the user, or the choice can be delegated to an automated procedure that compares several model choice criteria. Plotting routines to generate histograms, plug-in densities and contour plots from the fitted model output are also available. The precision of the EM estimates can be evaluated through their estimated standard deviations, which are obtained from an approximation of the associated information matrix for each particular model in the FMSMSN family. A function to generate artificial samples from several elements of the family is also supplied. Finally, two real data sets are analyzed in order to show the usefulness of the package.
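The automated choice of the number of components mentioned in the mixsmsn abstract can be illustrated with a much simpler stand-in. The sketch below is not the package's code and does not fit skew-normal scale mixtures: it assumes plain univariate Gaussian components, a basic EM loop, and BIC as the only criterion. All function names (`fit_gaussian_mixture`, `bic`) and the simulated data are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gaussian_mixture(data, g, n_iter=200, var_floor=1e-3):
    """Plain EM for a g-component Gaussian mixture; returns the final log-likelihood."""
    xs = sorted(data)
    n = len(xs)
    chunk = n // g
    # Deterministic start: means of g consecutive chunks of the sorted data.
    means = [sum(xs[j * chunk:(j + 1) * chunk]) / chunk for j in range(g)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [overall_var] * g
    weights = [1.0 / g] * g

    def mixture_terms(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            dens = mixture_terms(x)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weights, means and variances.
        for j in range(g):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # guard against degenerate components

    return sum(math.log(max(sum(mixture_terms(x)), 1e-300)) for x in xs)

def bic(loglik, g, n):
    """Bayesian information criterion; smaller is better."""
    n_params = 3 * g - 1  # g means, g variances, g - 1 free weights
    return -2.0 * loglik + n_params * math.log(n)

# Simulated data (an assumption for illustration): two well-separated groups.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(150)]

scores = {g: bic(fit_gaussian_mixture(data, g), g, len(data)) for g in (1, 2, 3)}
best_g = min(scores, key=scores.get)
print(scores, best_g)
```

With components this well separated, one would expect the criterion to reject a single component clearly. A real analysis would compare several criteria and use multiple starting values.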
Mixed-effects models are commonly used to fit longitudinal or repeated measures data. A complication arises when the response is censored, for example, due to limits of quantification of the assay used. Although normal distributions are commonly assumed for random effects and residual errors, such assumptions make inferences vulnerable to outliers. The sensitivity to outliers and the need for heavy-tailed distributions for random effects and residual errors motivate us to develop likelihood-based inference for linear and nonlinear mixed-effects models with censored response (NLMEC/LMEC) based on the multivariate Student-t distribution. An ECM algorithm is developed for computing the maximum likelihood estimates for NLMEC/LMEC, with the standard errors of the fixed effects and the exact likelihood value as by-products. The algorithm uses closed-form expressions at the E-step, which rely on formulas for the mean and variance of a truncated multivariate t distribution. The proposed algorithm is implemented in the R package tlmec. It is applied to analyze longitudinal HIV viral load data in two recent AIDS studies. In addition, a simulation study is conducted to examine the performance of the proposed method and to compare it with the approach of Vaida and Liu (2009).
The purely spatial and space-time scan statistics have been successfully used by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power in correctly identifying a cluster, no study has considered the estimates of the relative risk in the detected cluster. In this paper we evaluate whether there is any bias in these estimated relative risks. Intuitively, one may expect the estimated relative risks to have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is an increasing conservative downward bias of the relative risk as the power to detect the cluster increases.
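The cherry-picking intuition in the abstract above can be explored with a toy simulation. This is only a rough sketch, not the scan statistic itself: it assumes independent regions with equal expected counts, a single true cluster region, and a "scan" that simply keeps the region with the largest count. The function names and all parameter values are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def mean_selected_rr(true_rr, n_regions=50, expected=20.0, n_reps=200, seed=1):
    """Average estimated relative risk of the region picked as the 'cluster'."""
    rng = random.Random(seed)
    total_expected = n_regions * expected
    estimates = []
    for _ in range(n_reps):
        # Region 0 is the true cluster; every other region has relative risk 1.
        counts = [poisson(rng, expected * true_rr)]
        counts += [poisson(rng, expected) for _ in range(n_regions - 1)]
        inside = max(counts)            # toy "scan": keep the highest-count region
        outside = sum(counts) - inside
        # Rate inside the selected region divided by the rate outside it.
        rr_hat = (inside / expected) / (outside / (total_expected - expected))
        estimates.append(rr_hat)
    return sum(estimates) / n_reps

low_power = mean_selected_rr(true_rr=1.2)   # weak cluster, hard to detect
high_power = mean_selected_rr(true_rr=3.0)  # strong cluster, easy to detect
print(low_power, high_power)
```

In this simplified setting the weak cluster is often not the region selected, so the reported relative risk reflects the maximum of many noisy counts and overshoots the true value, whereas the strong cluster is selected almost every time and its estimate stays close to the truth.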
Please cite this article as: Prates, M.O., Dey, D.K., Willig, M.R., Yan, J., Transformed Gaussian Markov random fields and spatial modeling of species abundance. Spatial Statistics (2015), http://dx.
Abstract. Gaussian random fields and Gaussian Markov random fields have been widely used to accommodate spatial dependence under the generalized linear mixed models framework. To model spatial count and spatial binary data, we present a class of transformed Gaussian Markov random fields, constructed by transforming the margins of a Gaussian Markov random field to desired marginal distributions that accommodate asymmetry and heavy tails, as needed in many empirical circumstances. The Gaussian copula that characterizes the dependence structure facilitates inferences and applications in modeling spatial dependence. This construction leads to new models, such as gamma or beta Markov fields with Gaussian copulas, which are used to model Poisson intensities or Bernoulli rates in hierarchical spatial analyses. The method is naturally implemented in a Bayesian framework. We illustrate our methodology with abundances of a variety of gastropod species, collected as counts or presence versus absence from a network of spatial locations in the Luquillo Mountains of Puerto Rico. Gastropods are of considerable ecological importance in terrestrial ecosystems because of their species richness, abundances, and critical roles in ecosystem processes such as decomposition and nutrient cycling. The new models outperform the traditional models based on Bayesian model comparison with the conditional predictive ordinate. The validity of Bayesian inferences and model selection was assessed through simulation studies for both spatial Poisson regression and spatial Bernoulli regression.
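The margin transformation described above can be sketched in one dimension. The example below is not the paper's model. It assumes a stationary AR(1) chain with standard normal margins as the simplest Gaussian Markov structure, and it maps each value to an exponential margin (a gamma distribution with shape 1) by applying the normal CDF followed by the exponential quantile function. The names `gaussian_to_exponential` and `simulate_chain` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def gaussian_to_exponential(z, rate=1.0):
    """Map a standard normal value to an exponential(rate) value."""
    u = STANDARD_NORMAL.cdf(z)        # probability integral transform, u in (0, 1)
    u = min(u, 1.0 - 1e-12)           # keep the logarithm finite for extreme z
    return -math.log1p(-u) / rate     # exponential quantile function

def simulate_chain(n, rho, rate=1.0, seed=0):
    """AR(1) Gaussian chain with N(0, 1) margins, transformed to exponential margins."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)           # stationary start
    out = []
    for _ in range(n):
        out.append(gaussian_to_exponential(z, rate))
        # The next value depends only on the current one (Markov property);
        # the innovation variance keeps the margin at N(0, 1).
        z = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return out

values = simulate_chain(n=5000, rho=0.8, rate=2.0)
sample_mean = sum(values) / len(values)
print(sample_mean)  # should be near 1 / rate for a long chain
```

Because the transform is monotone, the dependence between neighbouring values is still governed by the Gaussian copula, while the margins become skewed and positive. In a hierarchical model such values could play the role of Poisson intensities.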
Summary. The Northern Humboldt Current System (NHCS) is the world's most productive ecosystem in terms of fish. In particular, the Peruvian anchovy (Engraulis ringens) is the major prey of the main top predators, such as seabirds, fish, humans, and other mammals. In this context, it is important to understand the dynamics of the anchovy distribution in order to preserve it as well as to exploit its economic capacities. Using the data collected by the "Instituto del Mar del Perú" (IMARPE) during a scientific survey in 2005, we present a statistical analysis whose main goals are: (i) to adapt to the characteristics of the sampled data, such as spatial dependence, high proportions of zeros and large sample sizes; (ii) to provide important insights on the dynamics of the anchovy population; and (iii) to propose a model for estimation and prediction of anchovy biomass in the NHCS offshore from Perú. These data were analyzed in a Bayesian framework using the integrated nested Laplace approximation (INLA) method. Further, to select the best model and to study the predictive power of each model, we performed model comparisons and predictive checks, respectively. Finally, we carried out a Bayesian spatial influence diagnostic for the preferred model.