Selecting text features is a basic and important problem in text mining and information retrieval. Traditional feature extraction methods rely on handcrafted features, and hand-designing an effective feature is a lengthy process; for new applications, deep learning instead makes it possible to acquire new, effective feature representations from training data. As a new feature extraction method, deep learning has achieved notable results in text mining. The major difference between deep learning and conventional methods is that deep learning learns features automatically from big data, using models that can contain millions of parameters, instead of adopting handcrafted features, which depend mainly on the designers' prior knowledge and can hardly take advantage of big data. This thesis first outlines the common methods used in text feature extraction, then describes the deep learning methods frequently used in text feature extraction and their applications, and finally forecasts the future application of deep learning in feature extraction.
To address the employment problem of students, this paper presents an employment data mining model for university graduates. The model is based on the decision tree, a very effective means of classification, and is designed according to the characteristics of employment data and the C4.5 algorithm. C4.5 is an improvement on ID3, a core decision tree algorithm, and is well suited to this task because of its simple construction, high processing speed, and ease of implementation. The model comprises preprocessing of the employment data, selection of decision attributes, implementation of the mining algorithm, and extraction of rules from the decision tree. The rules indicate which decision attributes determine the classification of employers. A case study shows that, when applied to mining employment information, the decision tree algorithm classifies employment data correctly with a simple structure and high speed, and yields valuable results for analysis and decision-making, which demonstrates that the proposed algorithm is effective.
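The attribute-selection and rule-extraction steps of such a model can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes categorical attributes only and uses a small hypothetical employment table (the attribute names `grades`, `internship`, `major` and the class labels `enterprise`/`public` are invented for illustration). It applies C4.5's gain-ratio criterion (information gain divided by split information, whereas ID3 uses plain information gain) and omits C4.5's handling of continuous attributes, missing values, and pruning.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def partition(rows, attr):
    """Group rows by their value of the categorical attribute `attr`."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r)
    return parts

def gain_ratio(rows, attr, target):
    """C4.5 split criterion: information gain normalized by split information."""
    total = len(rows)
    parts = partition(rows, attr)
    # Expected entropy after splitting; ID3 would stop at info_gain.
    remainder = sum(len(p) / total * entropy([r[target] for r in p]) for p in parts.values())
    info_gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / total * log2(len(p) / total) for p in parts.values())
    return info_gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target):
    """Recursively build a tree: leaves are labels, nodes are (attr, {value: subtree})."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority  # no attribute improves the split
    remaining = [a for a in attrs if a != best]
    return (best, {value: build_tree(subset, remaining, target)
                   for value, subset in partition(rows, best).items()})

def extract_rules(tree, conditions=()):
    """Turn each root-to-leaf path into an IF-THEN rule (conditions, label)."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules

# Hypothetical toy data, invented for illustration only.
rows = [
    {"grades": "high", "internship": "yes", "major": "cs",   "employer": "enterprise"},
    {"grades": "high", "internship": "no",  "major": "cs",   "employer": "public"},
    {"grades": "low",  "internship": "yes", "major": "math", "employer": "enterprise"},
    {"grades": "low",  "internship": "no",  "major": "math", "employer": "public"},
    {"grades": "high", "internship": "yes", "major": "math", "employer": "enterprise"},
    {"grades": "low",  "internship": "no",  "major": "cs",   "employer": "public"},
]

tree = build_tree(rows, ["grades", "internship", "major"], "employer")
for conds, label in extract_rules(tree):
    print(" AND ".join(f"{a} = {v}" for a, v in conds), "->", label)
# prints:
# internship = yes -> enterprise
# internship = no -> public
```

On this toy table `internship` separates the two classes perfectly (gain ratio 1.0), so it becomes the root and the extracted rules name it as the deciding attribute, which mirrors how the rules in the abstract point out which decision attributes determine the classification.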