There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
Interaction within small groups can often be represented as a sequence of events, where each event involves a sender and a recipient. Recent methods for modeling network data in continuous time model the rate at which individuals interact conditioned on the previous history of events as well as actor covariates. We present a hierarchical extension for modeling multiple such sequences, facilitating inferences about event-level dynamics and their variation across sequences. The hierarchical approach allows one to share information across sequences in a principled manner; we illustrate the efficacy of such sharing through a set of prediction experiments. After discussing methods for adequacy checking and model selection for this class of models, we illustrate the method with an analysis of high school classroom dynamics.
Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP, similar results are reported most commonly for low-data regimes, non-pretrained models, or situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers, including BERT, XLNet, and RoBERTa. We observe a negative result, finding that techniques which previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.
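As a rough illustration of what task-agnostic augmentation looks like in practice, the sketch below implements two of the four Easy Data Augmentation operations (random swap and random deletion). It is a minimal sketch, not the code used in the study: the function names, the whitespace tokenization, and the parameter values are assumptions made for this example, and the synonym-replacement and random-insertion operations are omitted because they need a thesaurus resource.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one."""
    words = list(words)
    if len(words) <= 1:
        return words
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty sentence: fall back to one random word.
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the movie was surprisingly good".split()  # hypothetical input
    print(" ".join(random_swap(sentence, n_swaps=1, rng=rng)))
    print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Neither operation looks at the label or the task, which is what makes the augmentation task-agnostic.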
Recombinant adeno-associated viruses (AAVs) have emerged as promising vectors for human gene therapy, but some variants have induced severe toxicity in rhesus monkeys and piglets following high-dose intravenous (IV) administration. To characterize biodistribution, transduction, and toxicity among common preclinical species, an AAV9 neurotropic variant expressing the survival motor neuron 1 (SMN1) transgene (AAV-PHP.B-CBh-SMN1) was administered by IV bolus injection to Wistar Han rats and cynomolgus monkeys at doses of 2 × 10^13, 5 × 10^13, or 1 × 10^14 vg/kg. A dose-dependent degeneration/necrosis of neurons without clinical manifestations occurred in dorsal root ganglia (DRGs) and sympathetic thoracic ganglia in rats, while liver injury was not observed in rats. In monkeys, one male at 5 × 10^13 vg/kg was found dead on day 4. Clinical pathology data on days 3 and/or 4 at all doses suggested liver dysfunction and coagulation disorders, which led to study termination. Histologic evaluation of the liver in monkeys showed hepatocyte degeneration and necrosis without inflammatory cell infiltrates or intravascular thrombi, suggesting that hepatocyte injury is a direct effect of the vector following hepatocyte transduction. In situ hybridization demonstrated a dose-dependent expression of SMN1 transgene mRNA in the cytoplasm and DNA in the nucleus of periportal to panlobular hepatocytes, while quantitative polymerase chain reaction confirmed the dose-dependent presence of SMN1 transgene mRNA and DNA in monkeys. Monkeys produced a much greater amount of transgene mRNA compared with rats. In DRGs, neuronal degeneration/necrosis and accompanying findings were observed in monkeys as early as 4 days after test article administration.
The present results show sensory neuron toxicity following IV delivery of AAV vectors at high doses, with an early onset in Macaca fascicularis and after 1 month in rats, and suggest adding the autonomic system to the watch list for preclinical and clinical studies. Our data also suggest that the rat may be useful for evaluating the potential DRG toxicity of AAV vectors, while acute hepatic toxicity associated with coagulation disorders appears to be highly species-dependent.
Many social networks can be characterized by a sequence of dyadic interactions between individuals. Techniques for analyzing such events are of increasing interest. In this paper, we describe a generative model for dyadic events, where each event arises from one of C latent classes, and the properties of the event (sender, recipient, and type) are chosen from distributions over these entities conditioned on the chosen class. We present two algorithms for inference in this model: an expectation-maximization algorithm as well as a Markov chain Monte Carlo procedure based on collapsed Gibbs sampling. To analyze the model's predictive accuracy, the algorithms are applied to multiple real-world data sets involving email communication, international political events, and animal behavior data.
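The generative process described above can be sketched as a small sampler. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name, the argument layout (one probability vector per class for senders, recipients, and event types), and the choice to draw sender and recipient independently (so self-directed events are possible) are all assumptions made for illustration.

```python
import random


def sample_events(n_events, class_probs, sender_probs, recipient_probs,
                  type_probs, seed=0):
    """Draw (class, sender, recipient, type) tuples from a latent-class model.

    class_probs[c]        : probability of latent class c
    sender_probs[c][i]    : probability that actor i sends, given class c
    recipient_probs[c][j] : probability that actor j receives, given class c
    type_probs[c][a]      : probability of event type a, given class c
    """
    rng = random.Random(seed)

    def draw(weights):
        # Sample an index in proportion to the given weights.
        return rng.choices(list(range(len(weights))), weights=weights)[0]

    events = []
    for _ in range(n_events):
        c = draw(class_probs)            # 1. pick the latent class
        s = draw(sender_probs[c])        # 2. pick the sender given the class
        r = draw(recipient_probs[c])     # 3. pick the recipient given the class
        a = draw(type_probs[c])          # 4. pick the event type given the class
        events.append((c, s, r, a))
    return events


if __name__ == "__main__":
    # Hypothetical toy parameters: 2 classes, 3 actors, 2 event types.
    events = sample_events(
        n_events=5,
        class_probs=[0.6, 0.4],
        sender_probs=[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
        recipient_probs=[[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]],
        type_probs=[[0.9, 0.1], [0.2, 0.8]],
    )
    for event in events:
        print(event)
```

Inference runs this process in reverse: given only the observed (sender, recipient, type) triples, the expectation-maximization or Gibbs procedure estimates the class assignments and the per-class distributions.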
We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.
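To make the query type concrete, the sketch below answers one of these queries, the number of connected components among edges whose timestamps fall in a window, by brute force with a union-find structure. It is a naive baseline for illustration only, not the data structure from the paper: it rescans every edge on each query instead of achieving sublogarithmic query time. The function name, the `(u, v, t)` edge format, and the convention that isolated vertices count as their own components are assumptions of this example.

```python
def count_components_in_window(n_vertices, edges, t_lo, t_hi):
    """Count connected components using only edges with t_lo <= t <= t_hi.

    edges is an iterable of (u, v, t) with vertices numbered 0..n_vertices-1.
    Vertices untouched by any edge in the window count as singleton components.
    """
    parent = list(range(n_vertices))

    def find(x):
        # Find the root of x, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_vertices
    for u, v, t in edges:
        if t_lo <= t <= t_hi:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v  # merge the two components
                components -= 1
    return components


if __name__ == "__main__":
    # Hypothetical timestamped edge list on 4 vertices.
    edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 5)]
    print(count_components_in_window(4, edges, 1, 3))
    print(count_components_in_window(4, edges, 4, 5))
```

Each query here costs time roughly linear in the number of edges, which is the cost that preprocessing into a dedicated structure is meant to avoid.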
LAR is related to the extent of CAD in pre-diabetic patients but not in normoglycaemic patients. This finding might in part explain the poorer outcome in revascularized patients with impaired glucose tolerance compared to normoglycaemic patients.