We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (2007). This sampler allows the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we develop a new nonparametric prior for mixture models by normalizing an infinite sequence of independent positive random variables and show how the slice sampler can be applied to make inference in this model. Two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative "conditional" simulation techniques for mixture models using the standard Dirichlet process prior and our new prior are made. The properties of the new prior are illustrated on a density estimation problem.
In this paper we propose a new framework for Bayesian nonparametric modelling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions which induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values, and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modelling, the modelling of stochastic volatility in time series data and spatial geostatistical modelling.
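As background for the stick-breaking representation mentioned in the abstract above, here is a minimal sketch of the standard, covariate-free construction of Dirichlet process weights. It does not reproduce the paper's covariate-dependent ordering. The function name, the truncation level `n_atoms`, and the value of the mass parameter are illustrative assumptions.

```python
import numpy as np


def stick_breaking_weights(mass, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Assumption: the infinite sequence is truncated at `n_atoms` terms,
    so the weights sum to slightly less than one.
    """
    # Break fractions V_k ~ Beta(1, mass), drawn independently.
    v = rng.beta(1.0, mass, size=n_atoms)
    # Stick length left before the k-th break: prod_{j<k} (1 - V_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    # Weight k is the fraction V_k of the stick that is still left.
    return v * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(mass=1.0, n_atoms=50, rng=rng)
```

A covariate-dependent prior of the kind the abstract describes would, roughly speaking, reorder the break fractions at each covariate value before forming the weights. That step is deliberately left out here, because the abstract does not specify it.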
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
The Lasso has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more-variables-than-observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted. Generalizing this prior provides a family of hyper-Lasso penalty functions, which includes the quasi-Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi-modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.
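The Bayesian reading of the Lasso referred to in the abstract above can be written out briefly. The sketch assumes a Gaussian linear regression with known noise variance; the symbols $y$, $X$, $\beta$, $\sigma^2$ and $\lambda$ are notation chosen here for illustration, not taken from the paper.

```latex
% Assumed model: Gaussian likelihood, independent double exponential (Laplace) priors
y \mid \beta \sim \mathcal{N}(X\beta, \sigma^2 I), \qquad
p(\beta_j) = \frac{\lambda}{2} \exp\left(-\lambda |\beta_j|\right), \quad j = 1, \dots, p.

% Negative log-posterior, up to an additive constant not depending on \beta
-\log p(\beta \mid y) = \frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} |\beta_j| + \text{const}.

% The posterior mode therefore solves the Lasso problem
\hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta}
  \left\{ \frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} |\beta_j| \right\}.
```

Under these assumptions the $\ell_1$ penalty is the negative log of the double exponential prior, so replacing that prior with a more general family changes the penalty term while leaving the likelihood term untouched.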
Markov chain Monte Carlo (MCMC) methods have become a ubiquitous tool in Bayesian analysis. This paper implements MCMC methods for Bayesian analysis of stochastic frontier models using the freely available WinBUGS package. General code for cross-sectional and panel data is presented and various ways of summarizing posterior inference are discussed. Several examples illustrate that analyses with models of genuine practical interest can be performed straightforwardly and model changes are easily implemented. Although WinBUGS may not be that efficient for more complicated models, it does make Bayesian inference with stochastic frontier models easily accessible for applied researchers and its generic structure allows for a lot of flexibility in model specification.
Summary: Environmental DNA is a survey tool with rapidly expanding applications for assessing the presence of a species at surveyed sites. Environmental DNA methodology is known to be prone to false negative and false positive errors at the data collection and laboratory analysis stages. Existing models for environmental DNA data require augmentation with additional sources of information to overcome identifiability issues of the likelihood function and do not account for environmental covariates that predict the probability of species presence or the probabilities of error. We present a novel Bayesian model for analysing environmental DNA data by proposing informative prior distributions for logistic regression coefficients that enable us to overcome parameter identifiability, while performing efficient Bayesian variable selection. Our methodology does not require the use of transdimensional algorithms and provides a general framework for performing Bayesian variable selection under informative prior distributions in logistic regression models.