Text is an important carrier of information in the Internet age. In recent years, adversarial example attacks in the discrete text domain have received widespread attention: adding small perturbations to text data can cause a deep neural network (DNN) to produce the opposite prediction. In this paper, we present "WordChange", an approach for generating adversarial examples for Chinese text classification based on multiple modification strategies, and we evaluate its effectiveness on a sentiment analysis dataset and a spam dataset. The method locates important word positions with a keyword contribution algorithm. We are the first to propose a "word-split" strategy for substituting keywords, designed around the structural and semantic properties of Chinese text. We are also the first to apply "swap" and "insert" strategies to Chinese text to generate adversarial examples. We further discuss how different Chinese word segmentation tools and different text lengths affect the proposed method, as well as the diversification of Chinese text modification strategies. Finally, adversarial texts generated against a long short-term memory (LSTM) network transfer successfully to other text classifiers and real-world applications.
INDEX TERMS Adversarial examples, deep learning, Chinese character modification strategies, black box, sentence filtering.
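The abstract names a keyword contribution algorithm and three modification strategies without giving their details. The sketch below is one plausible reading, not the authors' implementation: it scores each word by how much the classifier's output changes when that word is deleted (a common black-box leave-one-out heuristic), then applies toy versions of "word-split", "swap", and "insert". The classifier `toy_positive_prob`, the lexicons, the `SPLIT_TABLE` entry, and all function names are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a black-box sentiment classifier.
# It returns a smoothed probability of the "positive" class.
POSITIVE = {"好", "喜欢"}
NEGATIVE = {"差", "讨厌"}


def toy_positive_prob(words: List[str]) -> float:
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + 1) / (pos + neg + 2)


def word_contributions(
    words: List[str], predict: Callable[[List[str]], float]
) -> List[Tuple[int, float]]:
    """Rank word positions by the score change caused by deleting each word.

    Assumption: leave-one-out deletion stands in for the paper's
    keyword contribution algorithm, whose exact form is not given here.
    """
    base = predict(words)
    scores = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]
        scores.append((i, base - predict(reduced)))
    # Largest absolute change first; ties keep their original order.
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Illustrative split table: a character mapped to its visual components.
SPLIT_TABLE = {"好": "女子"}


def split_word(word: str) -> str:
    """'word-split': replace each character by its components when known."""
    return "".join(SPLIT_TABLE.get(ch, ch) for ch in word)


def swap_chars(word: str) -> str:
    """'swap': exchange the first two characters of a multi-character word."""
    if len(word) < 2:
        return word
    return word[1] + word[0] + word[2:]


def insert_char(word: str, filler: str = " ") -> str:
    """'insert': place a filler after the first character of the word."""
    if len(word) < 2:
        return word
    return word[0] + filler + word[1:]


if __name__ == "__main__":
    sentence = ["这", "部", "电影", "很", "好"]
    ranked = word_contributions(sentence, toy_positive_prob)
    top = ranked[0][0]
    attacked = sentence[:top] + [split_word(sentence[top])] + sentence[top + 1:]
    print(ranked[0], attacked, toy_positive_prob(attacked))
```

In this toy setting the perturbed word no longer matches the classifier's lexicon, so the positive score drops while the text remains readable to a human. A real attack would query the target model instead of `toy_positive_prob` and would depend on the output of the word segmentation tool.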
Text sentiment analysis plays an important role in social network information mining. It is also the theoretical foundation of personalized recommendation, interest-circle classification, and public opinion analysis. Existing algorithms for feature extraction and weight calculation fail to fully account for the influence of sentiment words. This paper therefore proposes a fine-grained short-text sentiment analysis method based on machine learning. We improve feature selection and weighting with two algorithms better suited to sentiment analysis, N-CHI for feature extraction and W-TF-IDF for weight calculation, which increase the proportion and weight of sentiment words among the feature words. Experimental analysis and comparison show that the classification accuracy of this method is clearly higher than that of other methods.
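The abstract does not define N-CHI or W-TF-IDF, so the sketch below shows only the standard building blocks they presumably extend: the chi-square statistic for feature selection and TF-IDF for term weighting. The `boost` multiplier for sentiment words is a hypothetical stand-in for the idea of raising their weight, not the paper's formula, and all function names are illustrative.

```python
import math
from typing import List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """Plain TF-IDF with add-one smoothing on the document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in corpus)
    idf = math.log(len(corpus) / (1 + df)) + 1
    return tf * idf


def weighted_tf_idf(
    term: str,
    doc: List[str],
    corpus: List[List[str]],
    sentiment_words: Set[str],
    boost: float = 2.0,
) -> float:
    """TF-IDF scaled up for sentiment words.

    Assumption: a constant multiplier; the actual W-TF-IDF weighting
    is not specified in the abstract.
    """
    weight = tf_idf(term, doc, corpus)
    return weight * boost if term in sentiment_words else weight


if __name__ == "__main__":
    corpus = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
    lexicon = {"good", "bad"}
    for term in corpus[0]:
        print(term, weighted_tf_idf(term, corpus[0], corpus, lexicon))
```

With this toy corpus, "good" and "movie" have the same plain TF-IDF in the first document, and the boost is what separates them, which mirrors the stated goal of increasing the weight of sentiment words among the features.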