Latent Dirichlet allocation (LDA) is a common method for mining the interests of microblog users. However, the LDA model does not reflect the hierarchical structure or the dynamic trends of those interests. This paper therefore draws on the timeliness and interactivity of microblogs to determine the hierarchical orientation and the dynamic trend orientation of users' interests. Based on this dynamic interest hierarchical orientation, a three-layer interest network (TIN-LDA) model is constructed to mine the interests of microblog users. The model also expands the set of interest attributes to include content, content marked with special symbols, forwarded content, and the authenticated user name and authentication information. Incorporating these attributes into the interest analysis is intended to improve the accuracy of mining users' interest keywords and topics. Topic quality assessment and perplexity evaluation were used to verify the effectiveness of the TIN-LDA model in mining the interests of microblog users. INDEX TERMS Dynamic interest hierarchical orientation, LDA topic model, interest topics and keywords, TIN-LDA model, interest attributes.
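The perplexity evaluation mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of perplexity for topic models (the exponential of the negative average per-token log-likelihood), which may differ in detail from the evaluation used for TIN-LDA. The function name `perplexity` and the inputs `docs`, `theta`, and `phi` are hypothetical and chosen only for illustration.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a fitted topic model on a set of documents.

    docs  : list of documents, each a list of word ids (hypothetical input)
    theta : theta[d][k] = p(topic k | document d)
    phi   : phi[k][w]   = p(word w | topic k)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal probability of word w in document d, summed over topics.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Lower perplexity means the model predicts the tokens better.
    return math.exp(-log_likelihood / n_tokens)


# Toy check: two topics that are both uniform over a 4-word vocabulary.
docs = [[0, 1, 2, 3]]
theta = [[0.5, 0.5]]
phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
result = perplexity(docs, theta, phi)
print(result)
```

In this toy case every word has probability 0.25, so the perplexity equals the vocabulary size of 4 up to floating-point error; a model that concentrates probability on the observed words would score lower.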
Protein-protein interaction sites are the basis of biomolecular interactions and are widely used in drug target identification and new drug discovery. Traditional predictors of protein-protein interaction sites are mostly trained on imbalanced datasets, so their classification results are biased toward the negative class, which lowers predictive accuracy for the positive class. This paper presents a method called RBFIS (radial basis function improved by SMOTE) to address the problem. The SMOTE algorithm is used to artificially synthesize samples for the imbalanced datasets, which are dominated by negative samples: a KNN algorithm interpolates between minority class samples to generate new samples, bringing the data as close to balance as possible. An RBF classifier is then used to construct the protein-protein interaction site predictor from the resulting quasi-balanced sample sets. Experimental results indicate that the method improves recall and F-measure for the positive class by 12% and 25%, respectively, compared with traditional methods. In addition, multiple rounds of experiments were performed with different feature combinations, and a key combination of multiple features was observed to improve prediction performance more effectively. In conclusion, these studies show that the proposed method is better suited to handling imbalanced protein interaction site data.
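The oversampling step described above can be sketched as follows. This is a minimal, standard-library illustration of SMOTE-style interpolation between a minority sample and one of its k nearest minority neighbours. It is not the RBFIS implementation, and it omits the RBF classifier. The function name `smote`, its parameters, and the toy data are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority (positive) class: the four corners of the unit square.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(positives, n_new=6, k=2)
print(new_points)
```

Because each synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies. The augmented positive set would then be combined with the negative samples before training a classifier.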