We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.
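The abstract does not spell out the update itself. As a minimal sketch, assuming the commonly used form in which the natural gradient step reduces to a weighted average of the current topic parameters and a mini-batch estimate, with a decaying step size, one iteration might look like the following. The names `learning_rate`, `online_update`, `tau0`, and `kappa` are illustrative assumptions, not taken from the text above.

```python
def learning_rate(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t) ** (-kappa).

    Assumption: kappa is chosen in (0.5, 1], the usual condition for
    stochastic approximation schemes of this kind to converge.
    """
    return (tau0 + t) ** (-kappa)


def online_update(lam, lam_hat, rho):
    """Blend the current topic parameters with a mini-batch estimate.

    lam     : current topic-word parameters, a list of rows (one per topic)
    lam_hat : estimate computed as if the whole corpus looked like the
              current mini-batch (same shape as lam)
    rho     : step size in (0, 1]
    """
    return [
        [(1.0 - rho) * old + rho * new for old, new in zip(row, row_hat)]
        for row, row_hat in zip(lam, lam_hat)
    ]


# Toy usage: two topics over a three-word vocabulary.
lam = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
lam_hat = [[3.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
lam = online_update(lam, lam_hat, learning_rate(t=3, tau0=1.0, kappa=0.5))
```

Because each step only needs the current mini-batch, a sketch like this processes a stream in a single pass, which is the property the abstract emphasizes.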
We present the task of identifying the emotions conveyed by the lyrics of Italian opera arias. We shape the task as a multi-class supervised problem, considering the six emotions from Parrott's tree: love, joy, admiration, anger, sadness, and fear. We manually annotated an opera corpus with 2.5k instances at the verse level and experimented with different classification models and representations to identify the expressed emotions. Our best-performing models consider character 3-gram representations and reach relatively low levels of macro-averaged F1. Such performance reflects the difficulty of the task at hand, partially caused by the size and nature of the corpus: relatively short verses written in 18th-century Italian. Building on what we learned from the verse-level setting, we adopt a higher granularity and increase the size of the corpus. First, we switch from verses to arias in order to have longer and more expressive texts. Second, we construct a new corpus with 40k arias (∼90k verses). This new dataset contains silver data, annotated by self-learning on the basis of an ensemble of binary classifiers. We then experiment with more sophisticated representations, by learning an embedding space and using it to train new models for the identification of emotions at the aria level, obtaining a significant performance boost.
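To make two of the terms above concrete, here is a minimal sketch of a character 3-gram representation and of macro-averaged F1. It is not the authors' pipeline: the function names and the toy labels are assumptions made for illustration, and a real system would feed the n-gram counts to a classifier.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a verse (n=3 gives 3-grams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


features = char_ngrams("amore")  # 3-grams: "amo", "mor", "ore"
score = macro_f1(["love", "joy", "love"], ["love", "love", "love"])
```

Because every class counts equally in the average, a classifier that ignores a rare emotion is penalized heavily, which is one reason macro-averaged scores on a six-class task can stay low.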
We present our submission to SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). We address all three tasks: Task A consists of identifying whether a post is sexist. If so, Task B attempts to assign it one of four classes: threats, derogation, animosity, and prejudiced discussions. Task C aims for an even more fine-grained classification, divided among 11 classes. We experiment with finetuning of hate-tuned Transformer-based models and priming for generative models. In addition, we explore model-agnostic strategies, such as data augmentation techniques combined with active learning, as well as obfuscation of identity terms. Our official submissions obtain an F1 score of 0.83 for Task A, 0.58 for Task B and 0.32 for Task C.
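The abstract does not say how identity terms are obfuscated. One plausible reading, sketched here under that assumption, is to replace each term with a placeholder token before the text reaches the classifier. The term list, the mask token, and the function name are all hypothetical.

```python
import re

# Hypothetical term list for illustration only; a real list would be curated.
IDENTITY_TERMS = ("women", "woman", "men", "man", "girls", "girl")


def obfuscate(text, terms=IDENTITY_TERMS, mask="[IDENTITY]"):
    """Replace whole-word, case-insensitive matches of each term with a mask."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(mask, text)


masked = obfuscate("Women and men agree.")
```

The word boundaries keep the mask from firing inside longer words such as "human", so only standalone identity terms are hidden.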