The design of an experiment can always be considered at least implicitly Bayesian, with prior knowledge used informally to aid decisions such as the variables to be studied and the choice of a plausible relationship between the explanatory variables and measured responses. Bayesian methods allow uncertainty in these decisions to be incorporated into design selection through prior distributions that encapsulate information available from scientific knowledge or previous experimentation. Further, a design may be explicitly tailored to the aim of the experiment through a decision-theoretic approach using an appropriate loss function. We review the area of decision-theoretic Bayesian design, with particular emphasis on recent advances in computational methods. For many problems arising in industry and science, experiments result in a discrete response that is well described by a member of the class of generalized linear models. Bayesian design for such nonlinear models is often seen as impractical as the expected loss is analytically intractable and numerical approximations are usually computationally expensive. We describe how Gaussian process emulation, commonly used in computer experiments, can play an important role in facilitating Bayesian design for realistic problems. A main focus is the combination of Gaussian process regression to approximate the expected loss with cyclic descent (coordinate exchange) optimization algorithms to allow optimal designs to be found for previously infeasible problems. We also present the first optimal design results for statistical models formed from dimensional analysis, a methodology widely employed in the engineering and physical sciences to produce parsimonious and interpretable models. Using the famous paper helicopter experiment, we show the potential for the combination of Bayesian design, generalized linear models, and dimensional analysis to produce small but informative experiments.
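The combination of Gaussian process emulation with coordinate exchange described above can be sketched on a hypothetical one-dimensional problem. Everything here is an illustrative assumption, not the paper's models: the quadratic "expected loss" stands in for an intractable criterion that would normally require an expensive Monte Carlo approximation at every candidate design, and the kernel settings and candidate grid are arbitrary.

```python
import numpy as np

# Stand-in for an analytically intractable expected loss that would normally
# need an expensive Monte Carlo approximation at every candidate design.
# The quadratic form is a purely illustrative assumption.
def expected_loss(x):
    return (x - 0.3) ** 2

# Posterior mean of a squared-exponential Gaussian process emulator
# fitted to a handful of expensive loss evaluations.
def gp_posterior_mean(X, y, Xs, ell=0.2, sf2=1.0, jitter=1e-8):
    k = lambda a, b: sf2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + jitter * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

X = np.linspace(0.0, 1.0, 8)       # designs where the loss was actually evaluated
y = expected_loss(X)
grid = np.linspace(0.0, 1.0, 501)  # candidate values for one coordinate
mu = gp_posterior_mean(X, y, grid)
x_new = grid[np.argmin(mu)]        # exchange step: update the coordinate via the emulator
```

In a full coordinate-exchange (cyclic descent) loop, this update would be applied to each coordinate of each design point in turn, with the emulator refitted or augmented as new loss evaluations accrue.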
The selection of optimal designs for generalized linear mixed models is complicated by the fact that the Fisher information matrix, on which most optimality criteria depend, is computationally expensive to evaluate. Our focus is on the design of experiments for likelihood estimation of parameters in the conditional model. We provide two novel approximations that substantially reduce the computational cost of evaluating the information matrix by complete enumeration of response outcomes, or Monte Carlo approximations thereof: (i) an asymptotic approximation which is accurate when there is strong dependence between observations in the same block; (ii) an approximation via Kriging interpolators. For logistic random intercept models, we show how interpolation can be especially effective for finding pseudo-Bayesian designs that incorporate uncertainty in the values of the model parameters. The new results are used to provide the first evaluation of the efficiency, for estimating conditional models, of optimal designs from closed-form approximations to the information matrix derived from marginal models. It is found that correcting for the marginal attenuation of parameters in binary-response models yields much improved designs, typically with very high efficiencies. However, in some experiments exhibiting strong dependence, designs for marginal models may still be inefficient for conditional modelling. Our asymptotic results provide some theoretical insights into why such inefficiencies occur.
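The complete enumeration of response outcomes mentioned above can be sketched for a hypothetical logistic random intercept model with a single slope parameter. The block design, parameter values, quadrature rule, and finite-difference score are all illustrative assumptions: the marginal probability of each of the 2^m binary outcome vectors in a block is obtained by integrating out the normal intercept with Gauss–Hermite quadrature, and the information is the expected squared score.

```python
import numpy as np
from itertools import product

# Gauss–Hermite nodes/weights for integrating out the N(0, sigma^2) intercept.
nodes, weights = np.polynomial.hermite.hermgauss(30)

def marginal_prob(y, x, beta, sigma):
    # P(y) = ∫ Π_j p(y_j | u) φ(u; 0, σ²) du, via change of variables u = √2 σ t
    u = np.sqrt(2.0) * sigma * nodes
    eta = beta * x[:, None] + u[None, :]          # linear predictor: unit × node
    p = 1.0 / (1.0 + np.exp(-eta))
    lik = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
    return (weights / np.sqrt(np.pi)) @ lik

def fisher_info(x, beta, sigma, h=1e-5):
    # Complete enumeration of the 2^m response outcomes in one block of size m.
    total = 0.0
    for y in product([0, 1], repeat=len(x)):
        y = np.array(y)
        # Score for beta via a central finite difference of the log-likelihood.
        s = (np.log(marginal_prob(y, x, beta + h, sigma))
             - np.log(marginal_prob(y, x, beta - h, sigma))) / (2 * h)
        total += s ** 2 * marginal_prob(y, x, beta, sigma)
    return total

x = np.array([-1.0, 0.0, 1.0])     # hypothetical within-block design
info = fisher_info(x, beta=1.0, sigma=1.0)
```

The cost of this enumeration grows as 2^m in the block size, which is what motivates the asymptotic and Kriging approximations: the exact information is computed on a coarse grid of parameter values and interpolated when averaging over prior draws for pseudo-Bayesian criteria.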
In game theory and statistical decision theory, a random (i.e., mixed) decision strategy often outperforms a deterministic strategy in minimax expected loss. As experimental design can be viewed as a game pitting the Statistician against Nature, the use of a random strategy to choose a design will often be beneficial. However, the topic of minimax-efficient random strategies for design selection is mostly unexplored, with consideration limited to Fisherian randomization of the allocation of a predetermined set of treatments to experimental units. Here, for the first time, novel and more flexible random design strategies are shown to have better properties than their deterministic counterparts in linear model estimation and prediction, including stronger bounds on both the expectation and survivor function of the loss distribution. Design strategies are considered for three important statistical problems: (i) parameter estimation in linear potential outcomes models, (ii) point prediction from a correct linear model, and (iii) global prediction from a linear model taking into account an L₂-class of possible model discrepancy functions. The new random design strategies proposed for (iii) give a finite bound on the expected loss, a dramatic improvement compared to existing deterministic exact designs for which the expected loss is unbounded. Supplementary materials for this article are available online.
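The advantage of a mixed strategy in minimax expected loss can be seen in a toy two-design, two-state game. The loss matrix below is made up for illustration and is not one of the paper's design problems.

```python
import numpy as np

# Toy loss matrix: rows are the Statistician's two candidate designs,
# columns are Nature's two states. The numbers are illustrative only.
L = np.array([[1.0, 3.0],
              [3.0, 1.0]])

# Best deterministic design: each row's worst case is 3, so minimax loss is 3.
det_minimax = L.max(axis=1).min()

# Equalizing mixed strategy: draw each design with probability 1/2, giving
# expected loss 2 whatever state Nature chooses.
p = np.array([0.5, 0.5])
mixed_worst = (p @ L).max()
```

Randomization helps precisely because Nature cannot anticipate which design will be drawn; the random strategies in the abstract extend this idea from a toy game to design selection for estimation and prediction.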
For Bayesian D-optimal design, we define a singular prior distribution for the model parameters as a prior distribution such that the determinant of the Fisher information matrix has a prior geometric mean of zero for all designs. For such a prior distribution, the Bayesian D-optimality criterion fails to select a design. For the exponential decay model, we characterize singularity of the prior distribution in terms of the expectations of a few elementary transformations of the parameter. For a compartmental model and several multi-parameter generalized linear models, we establish sufficient conditions for singularity of a prior distribution. For the generalized linear models we also obtain sufficient conditions for non-singularity. In the existing literature, weakly informative prior distributions are commonly recommended as a default choice for inference in logistic regression. Here it is shown that some of the recommended prior distributions are singular, and hence should not be used for Bayesian D-optimal design. Additionally, methods are developed to derive and assess Bayesian D-efficient designs when numerical evaluation of the objective function fails due to ill-conditioning, as often occurs for heavy-tailed prior distributions. These numerical methods are illustrated for logistic regression.
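A prior geometric mean of zero for det I is equivalent to the prior expectation of log det I being minus infinity. For the exponential decay model this can be checked concretely under assumed choices (a single unit-variance Gaussian observation of exp(-θt) at design point t, and a Gamma prior; all of these are illustrative): the information is I(θ) = t² exp(-2θt), so E[log I] = 2 log t - 2t E[θ], which is finite exactly when the prior mean of θ is finite.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exponential decay model eta(t) = exp(-theta * t), one unit-variance Gaussian
# observation at design point t. Then I(theta) = t^2 * exp(-2 * theta * t),
# so log I = 2*log(t) - 2*theta*t and E[log I] = 2*log(t) - 2*t*E[theta].
t = 1.5
theta = rng.gamma(shape=2.0, scale=1.0, size=200_000)  # finite-mean prior, E[theta] = 2

mc = np.mean(2 * np.log(t) - 2 * theta * t)   # Monte Carlo estimate of E[log I]
exact = 2 * np.log(t) - 2 * t * 2.0           # closed form under this prior
```

A prior with infinite mean (e.g., a half-Cauchy on θ) makes this expectation minus infinity for every t > 0, so the Bayesian D-optimality criterion cannot discriminate between designs; this is the singularity phenomenon, illustrated here for a one-parameter model under assumed distributional choices.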