While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, scientists face a natural trade-off between quantity and quality: spending resources to sequence a greater number of genomes or spending resources to sequence genomes with increased accuracy. Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible. In this paper, we introduce a Bayesian nonparametric methodology to predict the number of new variants in a follow-up study based on a pilot study. When experimental conditions are kept constant between the pilot and follow-up, we find that our prediction is competitive with the best existing methods. Unlike current methods, though, our new method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for more realistic predictions and for optimal allocation of a fixed budget between quality and quantity.
In this study we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers' behavior may depend on buyers' treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs.
* We are grateful for discussions with Susan Athey, John Geweke, Matt Taddy, and Johan Ugander, and for comments by participants in the CODE@MIT, NABE (2019), and the ASSA conferences (2020). This research was carried out as part of Imbens' consulting relationship with Amazon. Brian Burdick contributed while working for Amazon.
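As a minimal sketch of the classical A/B comparison described in the abstract above, the Python below simulates a single-population RCT and computes the difference in mean outcomes between the treated and control groups. Everything here is an illustrative assumption rather than the authors' method: the function names `difference_in_means` and `simulate_rct`, the Gaussian outcome model, the constant additive effect, and the absence of interference between units. It does not implement the Multiple Randomization Designs the abstract proposes.

```python
import random
from statistics import mean


def difference_in_means(outcomes, treated):
    """Return mean(treated outcomes) - mean(control outcomes)."""
    treated_outcomes = [y for y, t in zip(outcomes, treated) if t]
    control_outcomes = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_outcomes or not control_outcomes:
        raise ValueError("both treatment groups must be non-empty")
    return mean(treated_outcomes) - mean(control_outcomes)


def simulate_rct(n_units=10_000, true_effect=2.0, seed=0):
    """Simulate a classical RCT (hypothetical data-generating process).

    Assumes no interference: each unit's outcome depends only on its
    own treatment assignment, which is exactly the condition that can
    fail when several populations interact.
    """
    rng = random.Random(seed)
    # Assign half of the units to treatment, uniformly at random.
    n_treated = n_units // 2
    treated = [True] * n_treated + [False] * (n_units - n_treated)
    rng.shuffle(treated)
    # Baseline outcome is Gaussian noise around 10; treatment adds a constant.
    outcomes = [
        rng.gauss(10.0, 1.0) + (true_effect if t else 0.0) for t in treated
    ]
    return outcomes, treated


outcomes, treated = simulate_rct()
estimate = difference_in_means(outcomes, treated)
print(f"estimated average effect: {estimate:.3f}")
```

Under this simulated no-interference setup the estimate should land close to the assumed `true_effect`. If one unit's outcome also depended on the treatment assignments of units in another population, the same comparison would no longer isolate the effect of a unit's own treatment.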
There is growing interest in estimating the number of unseen features, driven mostly by applications in the biological sciences. Recent work brought out the upside and the downside of the popular stable-Beta process prior, and generalizations thereof, in Bayesian nonparametric inference for the unseen-features problem: i) the downside lies in the limited use of the sampling information in the posterior distributions, which depend on the observable sample only through the sample size; ii) the upside lies in the analytical tractability and interpretability of the posterior distributions, which are simple Poisson distributions whose parameters are simple to compute and depend on the sample size and the prior's parameter. In this paper, we introduce and investigate an alternative nonparametric prior, referred to as the stable-Beta scaled process prior. It is the first prior that enriches the posterior distribution of the number of unseen features by including the sampling information on the number of distinct features in the observable sample, while maintaining the same analytical tractability and interpretability as the stable-Beta process prior. Our prior leads to a negative binomial posterior distribution whose parameters depend on the sample size, the observed number of distinct features, and the prior's parameter, providing estimates that are simple, linear in the sampling information, and computationally efficient. We apply our approach to synthetic and real genetic data, showing that it outperforms parametric and nonparametric competitors in terms of estimation accuracy.