Oversampling techniques have been widely used to improve classification on imbalanced data. However, unlike structured data, the basic units of text are words or characters, so instances oversampled in numerical space can lose word-level similarity in semantic space. To solve this problem, some methods use text rewriting to generate artificial samples directly. Unfortunately, existing rewriting techniques usually destroy the grammatical structure and logic of the original text. In this article, we improve and constrain several existing text rewriting methods and propose an effective algorithm that mines feature words from each class of text to support rewriting. In addition, by computing the similarity between texts, the data of each class are divided into key and non-key data, and different rewriting processes are designed for each. Experimental results on four imbalanced text classification tasks show that our method outperforms previous text rewriting methods, improving the classification accuracy of the model by 1.7% to 2.9% and the AUC by 0.012 to 0.058. Ablation experiments further explore the effects of the individual variables and methods on the results.

KEYWORDS
Oversampling, Imbalanced data, Text rewriting, Mining feature words, Key data

1 INTRODUCTION

1.1 Background

Text classification is one of the most basic tasks in natural language processing (NLP). With pre-trained word vectors,1,2 attention mechanisms,3 and other techniques developed over the past decade, many novel NLP networks have raised classification accuracy to a higher level.4-6 However, most of the previous literature is based on the assumption that the number of samples in each category of the target data is balanced, and the high performance of a classifier usually depends on the size and quality of the training data. In contrast, data distributions in real-world scenarios7 tend to be skewed.
Because their features are not distinctive enough, samples from classes with few instances (called minority classes) can easily be misclassified into the class with the most data (called the majority class), which leads to the class imbalance problem in classification. Currently, many studies8,9 address the class imbalance problem. The most common strategy is to re-sample the original data, which aims to mitigate the effects of imbalance by changing the spatial distribution of the samples. This technique can be divided into oversampling and undersampling. On the one hand, random duplication,10 a common and simple oversampling method, is usually used to handle the minority samples. However, this operation does not add any new information, such as words, phrases, or sentences; it simply copies the original text at random.
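The random duplication described above can be sketched as follows. This is a minimal illustration for the binary case, not the paper's implementation; the function and argument names (`random_oversample`, `target_label`) are hypothetical.

```python
import random

def random_oversample(texts, labels, target_label, seed=0):
    """Randomly duplicate minority-class texts until the two classes
    are balanced. Illustrative sketch: no new words, phrases, or
    sentences are added -- existing samples are simply copied."""
    rng = random.Random(seed)
    minority = [t for t, y in zip(texts, labels) if y == target_label]
    majority_count = sum(1 for y in labels if y != target_label)
    deficit = majority_count - len(minority)
    # Draw random copies of existing minority samples to fill the gap.
    extra = [rng.choice(minority) for _ in range(deficit)]
    return texts + extra, labels + [target_label] * deficit

# Toy data: 1 minority sample (label 1) vs. 3 majority samples (label 0).
X = ["bad service", "great", "fine", "okay"]
y = [1, 0, 0, 0]
X2, y2 = random_oversample(X, y, target_label=1)
# After oversampling both classes have 3 samples, but every added
# minority sample is an exact copy of "bad service".
```

Because the added samples are verbatim copies, the classifier sees no new semantic information, which is exactly the limitation that motivates text rewriting.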