Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
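To make the idea of a program that "defines a log probability function over parameters conditioned on specified data" concrete, here is a minimal Python sketch. It is not Stan code and does not use any Stan interface. The model (a normal mean with a wide normal prior), the data values, and the function names `log_density` and `grad_log_density` are assumptions chosen for illustration, and the gradient is written by hand, whereas Stan obtains gradients by automatic differentiation.

```python
def log_density(mu, y, sigma=1.0, prior_scale=10.0):
    """Unnormalized log posterior for mu ~ normal(0, prior_scale), y[i] ~ normal(mu, sigma).

    Additive constants that do not depend on mu are dropped.
    """
    log_prior = -0.5 * (mu / prior_scale) ** 2
    log_lik = sum(-0.5 * ((yi - mu) / sigma) ** 2 for yi in y)
    return log_prior + log_lik


def grad_log_density(mu, y, sigma=1.0, prior_scale=10.0):
    """Derivative of log_density with respect to mu, worked out by hand."""
    return -mu / prior_scale ** 2 + sum(yi - mu for yi in y) / sigma ** 2


y = [1.2, 0.7, 1.9, 1.4]  # made-up data, for illustration only
print(log_density(1.0, y), grad_log_density(1.0, y))
```

A sampler or optimizer only needs these two functions: Hamiltonian Monte Carlo uses the gradient to propose moves, and an optimizer follows it to the mode, where the gradient is zero.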
Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R̂ of Gelman and Rubin (1992) has serious flaws. Traditional R̂ will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice. We thank Ben Bales, Ian Langmore, the editor, and anonymous reviewers for useful comments.
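The abstract does not spell out the rank-based diagnostic, so the following Python sketch is one reading of it under stated assumptions: draws are pooled across split chains, replaced by normal scores of their ranks using the offset (r - 3/8) / (S + 1/4), and passed to the classic between/within variance ratio; a second pass on draws folded around the pooled median targets differences in scale. All function names are hypothetical, chains are assumed to have equal length, and the simulated chains are illustrative only.

```python
import random
from statistics import NormalDist, mean, median, variance


def split_chains(chains):
    """Split each chain into a first and second half (a middle draw is dropped if odd)."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[len(chain) - half:])
    return halves


def rank_normalize(chains):
    """Replace each draw by the normal score of its pooled rank (ties get average ranks)."""
    pooled = sorted((x, c, i) for c, chain in enumerate(chains) for i, x in enumerate(chain))
    total = len(pooled)
    scores = [[0.0] * len(chain) for chain in chains]
    start = 0
    while start < total:
        end = start
        while end + 1 < total and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tied block
        z = NormalDist().inv_cdf((avg_rank - 0.375) / (total + 0.25))
        for _, c, i in pooled[start:end + 1]:
            scores[c][i] = z
        start = end + 1
    return scores


def classic_rhat(chains):
    """Between/within variance ratio in the style of Gelman and Rubin (1992)."""
    n = len(chains[0])
    within = mean([variance(chain) for chain in chains])
    between = n * variance([mean(chain) for chain in chains])
    return (((n - 1) / n * within + between / n) / within) ** 0.5


def rank_rhat(chains):
    """Maximum of rank-normalized and folded rank-normalized split R-hat."""
    halves = split_chains(chains)
    bulk = classic_rhat(rank_normalize(halves))
    center = median([x for chain in halves for x in chain])
    folded = [[abs(x - center) for x in chain] for chain in halves]
    tail = classic_rhat(rank_normalize(folded))
    return max(bulk, tail)


if __name__ == "__main__":
    rng = random.Random(1)
    # Three chains with scale 1 and one with scale 3: same mean, different variance.
    chains = [[rng.gauss(0, s) for _ in range(1000)] for s in (1, 1, 1, 3)]
    print("classic split R-hat:", classic_rhat(split_chains(chains)))
    print("rank-based R-hat:   ", rank_rhat(chains))
```

In this toy setup the chains share a mean, so the classic ratio has nothing to detect, while the folded pass compares ranks of absolute deviations and so responds to the chain with larger variance.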
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
NUT1, a gene homologous to the major nitrogen regulatory genes nit-2 of Neurospora crassa and areA of Aspergillus nidulans, was isolated from the rice blast fungus, Magnaporthe grisea. NUT1 encodes a protein of 956 amino acid residues and, like nit-2 and areA, has a single putative zinc finger DNA-binding domain. Functional equivalence of NUT1 to areA was demonstrated by introducing the NUT1 gene by DNA-mediated transformation into an areA loss-of-function mutant of A. nidulans. The introduced NUT1 gene fully complemented the areA null mutation, restoring to the mutant the ability to utilize a variety of nitrogen sources. In addition, the sensitivity of Aspergillus NUT1 transformants to ammonium repression of extracellular protease activity was comparable to that of wild-type A. nidulans. Thus, NUT1 and areA encode functionally equivalent gene products that activate expression of nitrogen-regulated genes. A one-step disruption strategy was used to generate nut1- mutants of M. grisea by transforming a rice-infecting strain with a disruption vector in which a gene for hygromycin B phosphotransferase (Hyg) replaced the zinc-finger DNA-binding motif of NUT1. Of 31 hygromycin B (hyg-B)-resistant transformants shown by Southern hybridization to contain a disrupted NUT1 gene (nut1::Hyg), 26 resulted from single-copy replacement events at the NUT1 locus. Although nut1- transformants of M. grisea failed to grow on a variety of nitrogen sources, glutamate, proline and alanine could still be utilized. This contrasts with A. nidulans where disruption of the zinc-finger region of areA prevents utilization of nitrogen sources other than ammonium and glutamine. The role of NUT1 and regulation of nitrogen metabolism in the disease process was evaluated by pathogenicity assays. The infection efficiency of nut1- transformants on susceptible rice plants was similar to that of the parental strain, although lesions were reduced in size. These studies demonstrate that the M. grisea NUT1 gene activates expression of nitrogen-regulated genes but is dispensable for pathogenicity.
Standard agreement measures for interannotator reliability are neither necessary nor sufficient to ensure a high quality corpus. In a case study of word sense annotation, conventional methods for evaluating labels from trained annotators are contrasted with a probabilistic annotation model applied to crowdsourced data. The annotation model provides far more information, including a certainty measure for each gold standard label; the crowdsourced data was collected at less than half the cost of the conventional approach.
When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modelling capturing variation in these parameters across experiments. Another concern is the people in the sample not being representative of the general population. Statistical adjustment cannot without strong assumptions correct for selection bias in an opt-in sample, but multilevel regression and post-stratification can at least adjust for known differences between the sample and the population. We demonstrate hierarchical regression and post-stratification models with code in Stan and discuss their application to a controversial recent study of SARS-CoV-2 antibodies in a sample of people from the Stanford University area. Wide posterior intervals make it impossible to evaluate the quantitative claims of that study regarding the number of unreported infections. For future studies, the methods described here should facilitate more accurate estimates of disease prevalence from imperfect tests performed on non-representative samples.
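The sensitivity of prevalence estimates to test specificity can be seen with simple arithmetic. The sketch below is not the hierarchical Bayesian model the abstract describes; it is only the standard point-estimate correction (often called the Rogan-Gladen estimator), which inverts the relationship between true prevalence and the expected rate of positive tests. The function names and all numbers are hypothetical and chosen for illustration.

```python
def positive_rate(prevalence, sensitivity, specificity):
    """Expected fraction of positive tests: true positives plus false positives."""
    return prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)


def prevalence_estimate(observed_rate, sensitivity, specificity):
    """Invert positive_rate for prevalence, clamping the result to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance: sensitivity + specificity > 1")
    raw = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    observed = 0.015     # hypothetical: 1.5% of the sample tests positive
    sensitivity = 0.80   # hypothetical
    for specificity in (0.985, 0.990, 0.995):  # hypothetical values
        est = prevalence_estimate(observed, sensitivity, specificity)
        print(f"specificity={specificity:.3f} -> estimated prevalence={est:.4f}")
```

When the false positive rate (one minus specificity) is close to the observed positive rate, a small change in assumed specificity moves the estimate from near zero to most of the observed rate, which is why uncertainty in specificity needs to be propagated rather than fixed at a point value.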
This is a very ambitious book, which tries to do several different things: give a detailed and self-contained introduction to type-logical semantics and categorial grammar; illustrate this framework with a comprehensive set of illustrations of semantic analysis of English; and relate all of this to current research within the categorial tradition. It apparently derives from the author's "introductory" course on natural language semantics, but the end result is very far from introductory (I suspect it would scare most beginning linguistics students to death) and the book contains a great deal of original and insightful analysis of interesting semantic phenomena. The first chapter contains a potted introduction to the history of formal and linguistic semantics and related aspects of the philosophy of language and logic; the second consists of an extended introduction to the lambda calculus, combinators, types, etc., and the third is a development of the particular higher-order logic used later. Of these chapters, the first is relatively superficial, but the second and third are very detailed and, more importantly, very clean. I would guess Carpenter is an excellent teacher: he presents difficult material in an appealing way, and doesn't go overboard on terse notation or Greek letters. Chapter 4 introduces various flavors of categorial grammar, and Chapter 5 describes the Lambek calculus. By this time we are 180 pages or so into the book, and I was beginning to realize that I didn't (even) know as much as I thought I did about categorial grammar. Semantic analysis of English starts off with coordination and unbounded dependencies (Chapter 6). Carpenter provides his own analysis of these phenomena, incorporating various devices from the categorial literature. His reference to earlier work, here and throughout the book, is always very good, with short summaries of approaches to particular issues by other scholars, showing how they differ, or can be seen as different ways of doing the same thing. Chapter 7 is a long discussion of quantifiers and scope problems. Carpenter starts with Montague's "quantifying in" and Cooper's storage approach to scoping, and then argues that the type-logical treatment he develops captures some of the good features of these approaches, while avoiding most of their descriptive and theoretical problems. He then goes on to a more general discussion of NP meanings, covering definites and indefinites, generics, and possessives, followed by a somewhat tangential but interesting approach to comparatives, and finally discussing existential sentences. Chapter 8 provides an analysis of plural NPs, again giving some very useful summaries of earlier approaches, and gradually developing a semantics that can cope