The construction of decision-theoretic Bayesian designs for realistically complex nonlinear models is computationally challenging, as it requires the optimization of analytically intractable expected utility functions over high-dimensional design spaces. We provide the most general solution to date for this problem through a novel approximate coordinate exchange algorithm. This methodology uses a Gaussian process emulator to approximate the expected utility as a function of a single design coordinate in a series of conditional optimization steps. It has flexibility to address problems for any choice of utility function and for a wide range of statistical models with different numbers of variables, numbers of runs and randomization restrictions. In contrast to existing approaches to Bayesian design, the method can find multi-variable designs in large numbers of runs without resorting to asymptotic approximations to the posterior distribution or expected utility. The methodology is demonstrated on a variety of challenging examples of practical importance, including design for pharmacokinetic models and design for mixed models with discrete data. For many of these models, Bayesian designs are not currently available. Comparisons are made to results from the literature, and to designs obtained from asymptotic approximations.
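The idea of a series of conditional, one-coordinate-at-a-time optimization steps can be illustrated with a minimal sketch. It is not the approximate coordinate exchange algorithm of the abstract: the Gaussian process emulator and the Bayesian expected utility are swapped for an exhaustive search over a small candidate grid and a deterministic D-criterion for a straight-line model. The function names, the grid and the starting design are all assumptions made for illustration.

```python
def d_criterion(design):
    """det(X'X) for the straight-line model y = b0 + b1*x (larger is better).

    Illustrative stand-in for an expected utility; not a Bayesian criterion.
    """
    n = len(design)
    return n * sum(x * x for x in design) - sum(design) ** 2


def coordinate_exchange(design, candidates, objective, max_passes=50):
    """Cycle through the coordinates, optimizing each with the others fixed."""
    design = list(design)
    best = objective(design)
    for _ in range(max_passes):
        improved = False
        for i in range(len(design)):
            current = design[i]
            # Conditional step: try every candidate value for coordinate i.
            for c in candidates:
                design[i] = c
                value = objective(design)
                if value > best + 1e-12:
                    best, current, improved = value, c, True
            design[i] = current  # keep the best value found for coordinate i
        if not improved:
            break  # a full pass without improvement: stop
    return design, best


# Hypothetical starting design and candidate grid on [-1, 1].
start = [0.1, -0.3, 0.5, 0.2]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
final_design, final_value = coordinate_exchange(start, grid, d_criterion)
print(final_design, final_value)
```

For this toy criterion the search pushes every run to an end point of the interval, with the runs split evenly between the two ends. In the setting of the abstract, each inner search over candidates would instead be guided by an emulator fitted to noisy evaluations of the expected utility.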
A default strategy for fully Bayesian model determination for GLMMs is considered which addresses the two key issues of default prior specification and computation. In particular, the concept of unit information priors is extended to the parameters of a GLMM. A combination of MCMC and Laplace approximations is used to compute approximations to the posterior model probabilities to find a subset of models with high posterior model probability. Bridge sampling is then used on the models in this subset to approximate the posterior model probabilities more accurately. The strategy is applied to four examples.
The generation of decision-theoretic Bayesian optimal designs is complicated by the significant computational challenge of minimising an analytically intractable expected loss function over a, potentially, high-dimensional design space. A new general approach for approximately finding Bayesian optimal designs is proposed which uses computationally efficient normal-based approximations to posterior summaries to aid in approximating the expected loss. This new approach is demonstrated on illustrative, yet challenging, examples including hierarchical models for blocked experiments, and experimental aims of parameter estimation and model discrimination. Where possible, the results of the proposed methodology are compared, both in terms of performance and computing time, to results from using computationally more expensive, but potentially more accurate, Monte Carlo approximations. Moreover, the methodology is also applied to problems where the use of Monte Carlo approximations is computationally infeasible.
Summary. Injecting drug users (IDUs) have a direct social and economic effect yet can typically be regarded as a hidden population within a community. We estimate the size of the IDU population across the nine different Government Office regions of England in 2005-2006 by using capture-recapture methods with age (ranging from 15 to 64 years) and gender as covariate information. We consider a Bayesian model averaging approach using log-linear models, where we can include explicit prior information within the analysis in relation to the total IDU population (elicited from the number of drug-related deaths and injectors' drug-related death rates). Estimation at the regional level allows for regional heterogeneity with these regional estimates aggregated to obtain a posterior mean estimate for the number of England's IDUs of 195840 with 95% credible interval (181700, 210480). There is significant variation in the estimated regional prevalence of current IDUs per million of population aged 15-64 years, and in injecting drug-related death rates across the gender age cross-classifications. The propensity of an IDU to be seen by at least one source also exhibits strong regional variability with London having the lowest propensity of being observed (posterior mean probability 0.21) and the South West the highest propensity (posterior mean 0.46).
Using Bayesian capture–recapture analysis, we estimated the number of current injecting drug users (IDUs) in Scotland in 2006 from the cross-counts of 5670 IDUs listed on four data-sources: social enquiry reports (901 IDUs listed), hospital records (953), drug treatment agencies (3504), and recent Hepatitis C virus (HCV) diagnoses (827 listed as IDU-risk). Further, we accessed exact numbers of opiate-related drugs-related deaths (DRDs) in 2006 and 2007 to improve estimation of Scotland's DRD rates per 100 current IDUs. Using all four data-sources, and model-averaging of standard hierarchical log-linear models to allow for pairwise interactions between data-sources and/or demographic classifications, Scotland had an estimated 31700 IDUs in 2006 (95% credible interval: 24900–38700); but 25000 IDUs (95% CI: 20700–35000) by excluding recent HCV diagnoses whose IDU-risk can refer to past injecting. Only in the younger age-group (15–34 years) were Scotland's opiate-related DRD rates significantly lower for females than males. Older males’ opiate-related DRD rate was 1.9 (1.24–2.40) per 100 current IDUs without or 1.3 (0.94–1.64) with inclusion of recent HCV diagnoses. If, indeed, Scotland had only 25000 current IDUs in 2006, with only 8200 of them aged 35+ years, the opiate-related DRD rate is higher among this older age group than has been appreciated hitherto. There is counter-balancing good news for the public health: the hitherto sharp increase in older current IDUs had stalled by 2006.
The design of an experiment can always be considered at least implicitly Bayesian, with prior knowledge used informally to aid decisions such as the variables to be studied and the choice of a plausible relationship between the explanatory variables and measured responses. Bayesian methods allow uncertainty in these decisions to be incorporated into design selection through prior distributions that encapsulate information available from scientific knowledge or previous experimentation. Further, a design may be explicitly tailored to the aim of the experiment through a decision-theoretic approach using an appropriate loss function. We review the area of decision-theoretic Bayesian design, with particular emphasis on recent advances in computational methods. For many problems arising in industry and science, experiments result in a discrete response that is well described by a member of the class of generalized linear models. Bayesian design for such nonlinear models is often seen as impractical as the expected loss is analytically intractable and numerical approximations are usually computationally expensive. We describe how Gaussian process emulation, commonly used in computer experiments, can play an important role in facilitating Bayesian design for realistic problems. A main focus is the combination of Gaussian process regression to approximate the expected loss with cyclic descent (coordinate exchange) optimization algorithms to allow optimal designs to be found for previously infeasible problems. We also present the first optimal design results for statistical models formed from dimensional analysis, a methodology widely employed in the engineering and physical sciences to produce parsimonious and interpretable models. Using the famous paper helicopter experiment, we show the potential for the combination of Bayesian design, generalized linear models, and dimensional analysis to produce small but informative experiments.
Estimating the size of hidden or difficult to reach populations is often of interest for economic, sociological or public health reasons. In order to estimate such populations, administrative data lists are often collated to form multi-list cross-counts and displayed in the form of an incomplete contingency table. Log-linear models are typically fitted to such data to obtain an estimate of the total population size by estimating the number of individuals not observed by any of the data-sources. This approach has been taken to estimate the current number of people who inject drugs (PWID) in Scotland, with the Hepatitis C virus diagnosis database used as one of the data-sources to identify PWID. However, the Hepatitis C virus diagnosis data-source does not distinguish between current and former PWID, which, if ignored, will lead to overestimation of the total population size of current PWID. We extend the standard model-fitting approach to allow for a data-source, which contains a mixture of target and non-target individuals (i.e. in this case, current and former PWID). We apply the proposed approach to data for PWID in Scotland in 2003, 2006 and 2009 and compare with the results from standard log-linear models. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
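How a log-linear model turns an incomplete contingency table into a population size estimate can be seen in the simplest possible case. The sketch below assumes only two data-sources and the independence log-linear model, fitted by maximum likelihood, for which the unobserved cell has a closed form. It does not implement the mixture extension or the multi-source models described in the abstract, and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def estimate_unobserved(n11, n10, n01):
    """Estimate the count seen by neither of two sources.

    Under the independence log-linear model for the 2 x 2 table,
    the fitted value of the missing cell is n10 * n01 / n11.
    """
    if n11 <= 0:
        raise ValueError("the overlap n11 must be positive")
    return n10 * n01 / n11


def estimate_total(n11, n10, n01):
    """Total population size: the observed cells plus the estimated missing cell."""
    return n11 + n10 + n01 + estimate_unobserved(n11, n10, n01)


# Hypothetical cross-counts (not taken from any study):
# n11 = on both lists, n10 = on list 1 only, n01 = on list 2 only.
n11, n10, n01 = 60, 140, 240
unobserved = estimate_unobserved(n11, n10, n01)
total = estimate_total(n11, n10, n01)
print(unobserved, total)
```

The total equals (n11 + n10)(n11 + n01) / n11, the classical two-sample capture-recapture estimator. If one list also contained non-target individuals, n01 and n11 would be inflated and the estimate distorted, which is the issue the abstract addresses.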
A Bayesian design is given by maximising an expected utility over a design space. The utility is chosen to represent the aim of the experiment and its expectation is taken with respect to all unknowns: responses, parameters and/or models. Although straightforward in principle, there are several challenges to finding Bayesian designs in practice. Firstly, the utility and expected utility are rarely available in closed form and require approximation. Secondly, the design space can be of high dimensionality. In the case of intractable likelihood models, these problems are compounded by the fact that the likelihood function, whose evaluation is required to approximate the expected utility, is not available in closed form. A strategy is proposed to find Bayesian designs for intractable likelihood models. It relies on the development of an automatic, auxiliary modelling approach, using multivariate Gaussian process emulators, to approximate the likelihood function. This is then combined with a copula-based approach to approximate the marginal likelihood (a quantity commonly required to evaluate many utility functions). These approximations are demonstrated on examples of stochastic process models involving experimental aims of both parameter estimation and model comparison.
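The expectation "with respect to all unknowns" can be made concrete with a minimal Monte Carlo sketch. It assumes a deliberately tractable toy model, chosen so that the answer can be checked exactly: a single parameter theta with a standard normal prior, responses y_i = theta * x_i + noise with known standard deviation sigma, and a utility equal to the negative squared error of the posterior mean. The model, function names and settings are assumptions for illustration only; the intractable likelihood models of the abstract admit no such closed form.

```python
import random


def exact_expected_utility(design, sigma=1.0):
    """Closed form for the toy model: minus the posterior variance of theta."""
    precision = 1.0 + sum(x * x for x in design) / sigma**2
    return -1.0 / precision


def mc_expected_utility(design, sigma=1.0, draws=20000, seed=1):
    """Monte Carlo average of the utility over the prior and the responses."""
    rng = random.Random(seed)
    precision = 1.0 + sum(x * x for x in design) / sigma**2
    total = 0.0
    for _ in range(draws):
        theta = rng.gauss(0.0, 1.0)  # draw the parameter from the prior
        ys = [theta * x + rng.gauss(0.0, sigma) for x in design]  # responses
        # Posterior mean of theta under the conjugate normal model.
        post_mean = sum(x * y for x, y in zip(design, ys)) / sigma**2 / precision
        total += -(theta - post_mean) ** 2  # utility: negative squared error
    return total / draws


# Two hypothetical two-run designs on [-1, 1].
design_a = [1.0, 1.0]
design_b = [0.5, 0.5]
estimate_a = mc_expected_utility(design_a)
estimate_b = mc_expected_utility(design_b)
print(estimate_a, exact_expected_utility(design_a))
print(estimate_b, exact_expected_utility(design_b))
```

In this toy case the Monte Carlo estimate can be compared with the exact value, and runs placed further from zero give a higher expected utility. When the likelihood is unavailable, the posterior mean step above is exactly what has to be replaced by an approximation.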