Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
Abstract. In behavioral programming, a program consists of separate modules called behavior threads, each representing a part of the system's allowed, necessary or forbidden behavior. An execution of the program is a series of synchronizations between these threads, where at each synchronization point an event is selected to be carried out. As a result, the execution speed is dictated by the slowest thread. We propose an eager execution mechanism for such programs, which builds upon the realization that it is often possible to predict the outcome of a synchronization point even without waiting for slower threads to synchronize. This allows faster threads to continue running uninterrupted, whereas slower ones catch up at a later time. Consequently, eager execution brings about increased system performance, better support for the modular design of programs, and the ability to distribute programs across several machines. It also allows to apply behavioral programming to a variety of problems that were previously outside its scope. We illustrate the method by concrete examples, implemented in a behavioral programming framework in C ++ .
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.