This paper proposes a Chinese text classification model based on a feature-enhanced nonequilibrium bidirectional long short-term memory (Bi-LSTM) network that analyzes Chinese text in depth and improves classification accuracy. First, the bidirectional encoder representations from transformers (BERT) model was used to vectorize the original Chinese corpus and extract preliminary semantic features. Then, a nonequilibrium Bi-LSTM network was applied to increase the weight of text carrying important semantics, strengthening the influence of key features on the classification. Simultaneously, a hierarchical attention mechanism was used to widen the gap between important and unimportant data. Finally, the softmax function was used for classification. In comparisons with various other models, the proposed model substantially improved the precision of Chinese text classification and showed a strong ability to recognize Chinese text features. The model achieved 97% precision on the experimental dataset.
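The last two stages described above, attention weighting followed by softmax classification, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single attention level rather than a hierarchical one, takes the encoder outputs as given, and the names `attention_pool`, `classify`, `query`, and `class_weights` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Sum the hidden states, each weighted by softmax(dot(state, query)).

    hidden_states: one vector per token (assumed to come from an encoder
    such as a Bi-LSTM; the encoder itself is not modelled here).
    query: an assumed learned context vector of the same dimension.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


def classify(pooled, class_weights):
    """Linear layer without bias followed by softmax over the classes."""
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in class_weights]
    return softmax(logits)


# Toy example: two tokens, two-dimensional hidden states, two classes.
states = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(states, query=[2.0, 0.0])
probs = classify(pooled, class_weights=[[1.0, 0.0], [0.0, 1.0]])
```

Because the softmax exponentiates the scores, a token that aligns with the query receives a disproportionately larger weight, which is one way of widening the gap between important and unimportant inputs.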
To address the unbalanced distribution of multi-category Chinese long texts and improve their classification accuracy, a data enhancement method was proposed, together with a feature-enhanced text-inception model for Chinese long text classification. First, the model used a novel text-inception module to extract important shallow features of the text. Meanwhile, a bidirectional gated recurrent unit (Bi-GRU) and a capsule neural network were combined into a deep feature extraction module to capture the semantic information in the text; K-MaxPooling was then used to reduce the dimension of the shallow and deep features and enhance the overall features. Finally, the softmax function was used for classification. In comparisons with a variety of models, the proposed model significantly improved the accuracy of long Chinese text classification and showed a strong ability to recognize long Chinese text features. The model achieved an accuracy of 93.97% on the experimental dataset.
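The K-MaxPooling step can be illustrated with a short sketch. It assumes the common order-preserving variant, which keeps the k largest activations of a feature sequence in their original positions; the abstract does not say which variant the authors used, and the function name `k_max_pooling` is hypothetical.

```python
def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving original order.

    If the sequence has fewer than k values, it is returned unchanged.
    """
    if k <= 0:
        return []
    # Indices of the k largest values (ties resolved by earliest position).
    top_indices = sorted(
        range(len(values)), key=lambda i: values[i], reverse=True
    )[:k]
    # Restore the original ordering of the selected positions.
    return [values[i] for i in sorted(top_indices)]


# Toy feature map over five positions, reduced to its two strongest activations.
reduced = k_max_pooling([0.1, 0.9, 0.3, 0.7, 0.2], k=2)
```

Unlike plain max pooling, which keeps a single value, this retains several strong activations along with their relative order, so the reduced feature still carries some positional information.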