Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail labels) contain only a small number of documents, which limits the performance of MLTC. To alleviate this low-resource problem, researchers have introduced a simple but effective strategy, data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that documents augmented for one label inevitably affect the other co-occurring labels and further exaggerate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which augments only positive feature-label pairs for the tail labels. LSFA contains two main parts. The first learns label-specific document representations in a high-level latent space; the second augments tail-label features in that latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting the classifiers based on the augmented datasets. The whole learning procedure can be trained effectively. Comprehensive experiments on benchmark datasets show that LSFA outperforms state-of-the-art counterparts.
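The idea of transferring intra-class variation from head labels to a tail label can be illustrated with a small sketch. This is not the LSFA implementation: it assumes a diagonal covariance (per-dimension variance only), and it assumes that donor head labels are chosen by cosine similarity between class means. Neither detail is stated in the abstract. The function names `class_stats`, `cosine`, and `augment_tail` are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def class_stats(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    dims = list(zip(*features))
    return [mean(d) for d in dims], [pvariance(d) for d in dims]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def augment_tail(tail_feats, head_feats_by_label, n_new, k=1, seed=0):
    """Synthesize n_new latent features for one tail label (hypothetical helper).

    Assumptions (not from the abstract): covariance is treated as diagonal, and
    the k head labels whose class means are most cosine-similar to the tail mean
    donate their per-dimension variances.
    """
    rng = random.Random(seed)
    tail_mean, _ = class_stats(tail_feats)
    # Rank head labels by how close their class mean is to the tail-label mean.
    ranked = sorted(
        head_feats_by_label.values(),
        key=lambda feats: cosine(tail_mean, class_stats(feats)[0]),
        reverse=True,
    )
    donor_vars = [class_stats(feats)[1] for feats in ranked[:k]]
    # Average the donors' per-dimension variances (the transferred variation).
    borrowed = [mean(v[d] for v in donor_vars) for d in range(len(tail_mean))]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(tail_feats)  # perturb a real tail example
        synthetic.append(
            [x + rng.gauss(0.0, math.sqrt(var)) for x, var in zip(base, borrowed)]
        )
    return synthetic
```

Each synthetic vector would then be paired only with its tail label. This matches the abstract's point that LSFA augments positive feature-label pairs rather than whole documents, so co-occurring labels are not affected.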
Social relations between users have been shown to be a useful type of auxiliary information for improving recommendation performance. However, it is challenging to fully exploit social relations and to correctly determine user preferences from both social and rating information. In this article, we propose a unified Bayesian Additive Matrix Approximation model (BAMA), which takes advantage of both rating preferences and the social network to provide high-quality recommendations. The basic idea of BAMA is to extract social influence from social networks, integrate it into Bayesian additive co-clustering to determine user clusters and item clusters effectively, and provide accurate rating predictions. In addition, an efficient collapsed Gibbs sampling algorithm is designed to infer the proposed model. A series of experiments were conducted on six real-world social datasets. The results demonstrate the superiority of BAMA over state-of-the-art methods from three views: all users, cold-start users, and users with few social relations. Furthermore, with the aid of social information, BAMA is able to provide explainable recommendations.
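To make "additive co-clustering with social influence" concrete, here is a hedged sketch. It is not BAMA itself. It assumes a few things not given in the abstract:

- A rating is predicted as the sum of block means across several co-clustering layers.
- Social influence enters as a prior that favors clusters already occupied by a user's friends, weighted by hypothetical `alpha` and `beta`.
- Block means are held fixed rather than integrated out, so the resampling step is a plain Gibbs-style update and not the collapsed sampler the abstract describes.

The layer dictionary layout and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def predict(user, item, layers):
    """Additive prediction: sum of block means over all co-clustering layers.

    Each layer (hypothetical layout) is a dict with 'u': user -> user cluster,
    'i': item -> item cluster, 'mu': {(user_cluster, item_cluster): block mean}.
    """
    return sum(layer["mu"][(layer["u"][user], layer["i"][item])] for layer in layers)


def user_cluster_probs(user, ratings, friends, layer_idx, layers, n_clusters,
                       alpha=1.0, beta=1.0, sigma=1.0):
    """Conditional probability of each user cluster in one layer.

    Assumed prior: alpha + beta * (number of friends currently in that cluster).
    Likelihood: Gaussian on the user's ratings, with other layers held fixed.
    """
    layer = layers[layer_idx]
    friend_counts = Counter(layer["u"][f] for f in friends.get(user, ()))
    log_scores = []
    for c in range(n_clusters):
        log_p = math.log(alpha + beta * friend_counts[c])
        for item, r in ratings.get(user, {}).items():
            # Contribution of every other layer under current assignments.
            others = sum(l["mu"][(l["u"][user], l["i"][item])]
                         for j, l in enumerate(layers) if j != layer_idx)
            mean = others + layer["mu"][(c, layer["i"][item])]
            log_p += -0.5 * ((r - mean) / sigma) ** 2
        log_scores.append(log_p)
    # Normalize in log space for numerical stability.
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def resample_user(user, ratings, friends, layer_idx, layers, n_clusters, rng=None, **kw):
    """Draw a new cluster for `user` in one layer and store it (Gibbs-style step)."""
    rng = rng or random.Random(0)
    probs = user_cluster_probs(user, ratings, friends, layer_idx, layers, n_clusters, **kw)
    new_c = rng.choices(range(n_clusters), weights=probs)[0]
    layers[layer_idx]["u"][user] = new_c
    return new_c
```

In this sketch, a user with few ratings (for example, a cold-start user) gets a likelihood term that is nearly flat. The friend-count prior then dominates the cluster choice. This is one plausible way social information could help such users.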