We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes, which we refer to generically as a family of negative-binomial processes. Because the models lead to closed-form update equations within the context of a Gibbs sampler, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that, although the random count matrices within the family are defined by a rowwise construction, their columns can be shown to be independent and identically distributed. This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices' combinatorial structure, we describe how to sequentially construct a column-i.i.d. random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma-and beta-negative binomial processes-especially their ability to model over-dispersed, heavytailed count data-makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to categorize a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features, and unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. Both the gamma-and beta-negative binomial processes are shown to significantly outperform the gamma-Poisson process when applied to document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms.The authors are with the
Summary The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.
Summary Many recently developed Bayesian methods have focused on sparse signal detection. However, much less work has been done addressing the natural follow-up question: how to make valid inferences for the magnitude of those signals after selection. Ordinary Bayesian credible intervals suffer from selection bias, as do ordinary frequentist confidence intervals. Existing Bayesian approaches for correcting this bias produce credible intervals with poor frequentist properties. Further, existing frequentist approaches require sacrificing the benefits of shrinkage typical in Bayesian methods, resulting in confidence intervals that are needlessly wide. We address this gap by proposing a nonparametric empirical Bayes approach for constructing optimal selectionadjusted confidence sets. Our method produces confidence sets that are as short as possible on average, while both adjusting for selection and maintaining exact frequentist coverage uniformly over the parameter space. We demonstrate an important consistency property of our procedure: under mild conditions, it asymptotically converges to the results of an oracle-Bayes analysis in which the prior distribution of signal sizes is known exactly. Across a series of examples, the method outperforms existing frequentist techniques for post-selection inference, producing confidence sets that are notably shorter but with the same coverage guarantee. All proofs may be found in the Supplementary Material.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.