Bob Carpenter scite author profile

Stan: A Probabilistic Programming Language

Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
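As a rough illustration of what "imperatively defines a log probability function" can mean, the sketch below accumulates a log density term by term in plain Python. It is not Stan code and does not use any Stan interface. The model (normal data with hypothetical normal and half-normal priors), the names `log_prob` and `finite_difference_gradient`, and the finite-difference gradient are all assumptions made for this example.

```python
import math


def log_prob(theta, y):
    """Unnormalized log posterior for y[i] ~ normal(mu, sigma).

    theta = [mu, log_sigma] lives on the unconstrained scale.
    The priors below are hypothetical choices for illustration only.
    """
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)          # map unconstrained value to sigma > 0
    target = 0.0
    target += log_sigma                  # log Jacobian of the exp transform
    target += -0.5 * (mu / 10.0) ** 2    # assumed prior: mu ~ normal(0, 10)
    target += -0.5 * sigma ** 2          # assumed prior: sigma ~ half-normal(0, 1)
    for y_i in y:                        # likelihood, additive constants dropped
        target += -log_sigma - 0.5 * ((y_i - mu) / sigma) ** 2
    return target


def finite_difference_gradient(f, theta, eps=1e-6):
    """Central-difference gradient of f at theta (a stand-in for exact gradients)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    point = [2.0, 0.0]
    print(log_prob(point, data))
    print(finite_difference_gradient(lambda t: log_prob(t, data), point))
```

A sampler or optimizer only needs the two callables above, which is the sense in which a log density and its gradient can serve as a platform for different inference algorithms.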


Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion)

Vehtari

et al. 2021


Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R̂ of Gelman and Rubin (1992) has serious flaws. Traditional R̂ will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice. We thank Ben Bales, Ian Langmore, the editor, and anonymous reviewers for useful comments.
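The rank-based idea in this abstract can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the authors' reference implementation. The function names are hypothetical, and the specific choices (splitting each chain in half, the fractional offset used to turn pooled ranks into normal scores, folding around the pooled median, and reporting the larger of the two values) are stated here as assumptions about how such a diagnostic is commonly computed.

```python
import math
from statistics import NormalDist, mean, median, variance


def split_chains(chains):
    """Split every chain into halves (a middle draw is dropped if the length is odd)."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(list(chain[:half]))
        halves.append(list(chain[len(chain) - half:]))
    return halves


def rank_normalize(chains):
    """Replace each draw by the normal score of its pooled rank (average rank for ties)."""
    pooled = sorted(x for chain in chains for x in chain)
    total = len(pooled)
    rank_of = {}
    i = 0
    while i < total:
        j = i
        while j + 1 < total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2.0   # average 1-based rank
        i = j + 1
    std_normal = NormalDist()
    # Assumed fractional offset (r - 3/8) / (S + 1/4) keeps the argument inside (0, 1).
    return [[std_normal.inv_cdf((rank_of[x] - 0.375) / (total + 0.25)) for x in chain]
            for chain in chains]


def rhat(chains):
    """Classic potential scale reduction factor from within and between chain variance."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])
    between = n * variance([mean(c) for c in chains])
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


def rank_normalized_rhat(chains):
    """Larger of the rank-normalized split value and its folded counterpart."""
    split = split_chains(chains)
    bulk = rhat(rank_normalize(split))
    center = median([x for c in split for x in c])
    folded = [[abs(x - center) for x in c] for c in split]
    tail = rhat(rank_normalize(folded))
    return max(bulk, tail)
```

Folding is what lets the sketch react to chains that agree in location but differ in spread, since the absolute deviations of a wider chain rank systematically higher than those of the others.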


Overview of BioCreative II gene mention recognition

Smith

Tanabe

Ando

et al. 2008

Genome Biol



Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.


Bayesian analysis of tests with unknown specificity and sensitivity

Gelman

Carpenter

2020

Preprint



When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modeling capturing variation in these parameters across experiments. Another concern is the people in the sample not being representative of the general population. Statistical adjustment cannot without strong assumptions correct for selection bias in an opt-in sample, but multilevel regression and poststratification can at least adjust for known differences between the sample and the population. We demonstrate hierarchical regression and poststratification models with code in Stan and discuss their application to a controversial recent study of SARS-CoV-2 antibodies in a sample of people from the Stanford University area. Wide posterior intervals make it impossible to evaluate the quantitative claims of that study regarding the number of unreported infections. For future studies, the methods described here should facilitate more accurate estimates of disease prevalence from imperfect tests performed on non-representative samples.
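The sensitivity of prevalence estimates to test specificity can be seen from the standard relationship between true prevalence and the expected positive rate of an imperfect test. The sketch below is not the hierarchical Stan model the abstract describes. It only inverts that relationship for fixed inputs, and the function names and all numbers are hypothetical.

```python
def positive_rate(prevalence, sensitivity, specificity):
    """Expected fraction of positive tests: true positives plus false positives."""
    return prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)


def implied_prevalence(observed_rate, sensitivity, specificity):
    """Invert positive_rate for prevalence and clip the result to [0, 1].

    Assumes sensitivity + specificity > 1, i.e. the test beats random guessing.
    """
    estimate = (observed_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    observed = 0.015      # hypothetical: 1.5% of the sample tests positive
    sensitivity = 0.85    # hypothetical
    for specificity in (0.999, 0.995, 0.990, 0.985):   # hypothetical range
        print(specificity, implied_prevalence(observed, sensitivity, specificity))
```

With a raw positive rate this low, moving the assumed specificity by about one percentage point is enough to move the implied prevalence between roughly the raw rate and zero, which is why uncertainty in specificity has to be carried through the analysis rather than fixed at a point value.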


NUT1, a major nitrogen regulatory gene in Magnaporthe grisea, is dispensable for pathogenicity

Froeliger

Carpenter

1996

Molec. Gen. Genet.


NUT1, a gene homologous to the major nitrogen regulatory genes nit-2 of Neurospora crassa and areA of Aspergillus nidulans, was isolated from the rice blast fungus, Magnaporthe grisea. NUT1 encodes a protein of 956 amino acid residues and, like nit-2 and areA, has a single putative zinc finger DNA-binding domain. Functional equivalence of NUT1 to areA was demonstrated by introducing the NUT1 gene by DNA-mediated transformation into an areA loss-of-function mutant of A. nidulans. The introduced NUT1 gene fully complemented the areA null mutation, restoring to the mutant the ability to utilize a variety of nitrogen sources. In addition, the sensitivity of Aspergillus NUT1 transformants to ammonium repression of extracellular protease activity was comparable to that of wild-type A. nidulans. Thus, NUT1 and areA encode functionally equivalent gene products that activate expression of nitrogen-regulated genes. A one-step disruption strategy was used to generate nut1- mutants of M. grisea by transforming a rice-infecting strain with a disruption vector in which a gene for hygromycin B phosphotransferase (Hyg) replaced the zinc-finger DNA-binding motif of NUT1. Of 31 hygromycin B (hyg-B)-resistant transformants shown by Southern hybridization to contain a disrupted NUT1 gene (nut1::Hyg), 26 resulted from single-copy replacement events at the NUT1 locus. Although nut1- transformants of M. grisea failed to grow on a variety of nitrogen sources, glutamate, proline and alanine could still be utilized. This contrasts with A. nidulans where disruption of the zinc-finger region of areA prevents utilization of nitrogen sources other than ammonium and glutamine. The role of NUT1 and regulation of nitrogen metabolism in the disease process was evaluated by pathogenicity assays. The infection efficiency of nut1- transformants on susceptible rice plants was similar to that of the parental strain, although lesions were reduced in size. These studies demonstrate that the M. grisea NUT1 gene activates expression of nitrogen-regulated genes but is dispensable for pathogenicity.


The Benefits of a Model of Annotation

Passonneau

Carpenter

2014

TACL



Standard agreement measures for interannotator reliability are neither necessary nor sufficient to ensure a high quality corpus. In a case study of word sense annotation, conventional methods for evaluating labels from trained annotators are contrasted with a probabilistic annotation model applied to crowdsourced data. The annotation model provides far more information, including a certainty measure for each gold standard label; the crowdsourced data was collected at less than half the cost of the conventional approach.


Bayesian Analysis of Tests with Unknown Specificity and Sensitivity

Gelman

Carpenter

2020


When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modelling capturing variation in these parameters across experiments. Another concern is the people in the sample not being representative of the general population. Statistical adjustment cannot without strong assumptions correct for selection bias in an opt-in sample, but multilevel regression and post-stratification can at least adjust for known differences between the sample and the population. We demonstrate hierarchical regression and post-stratification models with code in Stan and discuss their application to a controversial recent study of SARS-CoV-2 antibodies in a sample of people from the Stanford University area. Wide posterior intervals make it impossible to evaluate the quantitative claims of that study regarding the number of unreported infections. For future studies, the methods described here should facilitate more accurate estimates of disease prevalence from imperfect tests performed on non-representative samples.


Type-Logical Semantics

Carpenter

1998


This is a very ambitious book, which tries to do several different things: give a detailed and self-contained introduction to type-logical semantics and categorial grammar; illustrate this framework with a comprehensive set of illustrations of semantic analysis of English; and relate all of this to current research within the categorial tradition. It apparently derives from the author's "introductory" course on natural language semantics, but the end result is very far from introductory (I suspect it would scare most beginning linguistics students to death) and the book contains a great deal of original and insightful analysis of interesting semantic phenomena. The first chapter contains a potted introduction to the history of formal and linguistic semantics and related aspects of the philosophy of language and logic; the second consists of an extended introduction to the lambda calculus, combinators, types, etc., and the third is a development of the particular higher-order logic used later. Of these chapters, the first is relatively superficial, but the second and third are very detailed and, more importantly, very clean. I would guess Carpenter is an excellent teacher: he presents difficult material in an appealing way, and doesn't go overboard on terse notation or Greek letters. Chapter 4 introduces various flavors of categorial grammar, and Chapter 5 describes the Lambek calculus. By this time we are 180 pages or so into the book, and I was beginning to realize that I didn't (even) know as much as I thought I did about categorial grammar. Semantic analysis of English starts off with coordination and unbounded dependencies (Chapter 6). Carpenter provides his own analysis of these phenomena, incorporating various devices from the categorial literature.
His reference to earlier work, here and throughout the book, is always very good, with short summaries of approaches to particular issues by other scholars, showing how they differ, or can be seen as different ways of doing the same thing. Chapter 7 is a long discussion of quantifiers and scope problems. Carpenter starts with Montague's "quantifying in" and Cooper's storage approach to scoping, and then argues that the type-logical treatment he develops captures some of the good features of these approaches, while avoiding most of their descriptive and theoretical problems. He then goes on to a more general discussion of NP meanings, covering definites and indefinites, generics, and possessives, followed by a somewhat tangential but interesting approach to comparatives, and finally discussing existential sentences. Chapter 8 provides an analysis of plural NPs, again giving some very useful summaries of earlier approaches, and gradually developing a semantics that can cope


