Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement-learning-guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N = 178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency.
Point cloud registration is an important step in laser scanner data processing, and numerous methods have been proposed. However, existing methods often suffer from low accuracy and low speed when registering large point clouds. To meet this challenge, an improved iterative closest point (ICP) algorithm that combines the random sample consensus (RANSAC) algorithm, intrinsic shape signatures (ISS), and 3D shape context (3DSC) is proposed. The proposed method first uses a voxel grid filter for down-sampling. Next, feature points are extracted by the ISS algorithm and described by 3DSC. Afterwards, the ISS-3DSC features are used for coarse registration with the RANSAC algorithm. Finally, the ICP algorithm is used for fine registration. The experimental results show that the proposed algorithm registers faster than the compared algorithms while maintaining high registration accuracy.
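
The first and last stages of the registration pipeline above (voxel grid down-sampling and ICP refinement) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ISS keypoint, 3DSC descriptor, and RANSAC coarse-alignment stages are omitted, nearest neighbours are found by brute force rather than with a spatial index, and the function names `voxel_downsample`, `best_rigid_transform`, and `icp` are hypothetical names chosen for illustration.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # inverse[i] is the index of the voxel that point i belongs to.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i] (SVD / Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iterations=30, tol=1e-10):
    """Point-to-point ICP; assumes src is already roughly aligned with dst."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Brute-force nearest neighbour in dst for every current source point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(cur)), nn].mean()
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
        # Accumulate the incremental transform into the overall one.
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```

In a full pipeline of the kind described, `icp` would be called on clouds that a coarse stage has already brought close together, since point-to-point ICP only converges to the correct alignment from a reasonable initial guess. The brute-force distance matrix costs O(NM) memory, so a k-d tree would be the usual replacement for large clouds.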
Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings. In this paper, we describe metrics for measuring political bias in GPT-2 generation and propose a reinforcement learning (RL) framework for mitigating political biases in generated text. By using rewards from word embeddings or a classifier, our RL framework guides debiased generation without having access to the training data or requiring the model to be retrained. In empirical experiments on three attributes sensitive to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.