On Twitter, people often use hashtags to mark the subject of a tweet. Tweets have specific themes or content that are easy for people to manage. With the increase in the number of tweets, how to automatically recommend hashtags for tweets has received wide attention. The previous hashtag recommendation methods were to convert the task into a multi-class classification problem. However, these methods can only recommend hashtags that appeared in historical information, and cannot recommend the new ones. In this work, we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task. To train and evaluate the proposed method, we used the real tweet data which is collected from Twitter. Experimental results show that the proposed method can be significantly better than the most advanced method. Compared with the state-of-the-art methods, the accuracy of our method has been increased 4%.
In this paper, we present a method that enable both homology-based approach and composition-based approach to further study the functional core (i.e., microbial core and gene core, correspondingly). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that generative topic model can be used to model the taxon abundance information obtained by homology-based approach and study the microbial core. The model considers each sample as a “document,” which has a mixture of functional groups, while each functional group (also known as a “latent topic”) is a weight mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topic) in each sample. Second, we show that, generative topic model can also be used to study the genome-level composition of “N-mer” features (DNA subreads obtained by composition-based approaches). The model consider each genome as a mixture of latten genetic patterns (latent topics), while each functional pattern is a weighted mixture of the “N-mer” features, thus the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of uncovered latten genetic patterns. The experimental results demonstrate the effectiveness of proposed method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.