Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond to cyber threats in advance. In this paper, we devise a new deep-learning-based text classification model that classifies tweets in a collection as CSI-positive or CSI-negative. For this, we propose a novel word embedding model, called contrastive word embedding, that maximizes the difference between base embedding models. First, we define CSI-positive and CSI-negative corpora, which are used to construct the embedding models. Here, to compensate for the imbalance of the tweet data sets, we additionally employ background knowledge for each tweet corpus: (1) the CVE data set for the CSI-positive corpus and (2) the Wikitext data set for the CSI-negative corpus. Second, we adopt deep learning models such as CNN or LSTM to extract feature vectors from the embedding models and integrate the feature vectors into one classifier. To validate the effectiveness of the proposed model, we compare it with two baseline classification models: (1) a model based on a single embedding model constructed with the CSI-positive corpus only and (2) another model constructed with the CSI-negative corpus only. The results show that the proposed model achieves high accuracy, i.e., an F1-score of 0.934 and an area under the curve (AUC) of 0.935, improving on the baseline models by 1.76∼6.74% in F1-score and by 1.64∼6.98% in AUC.
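The two-embedding design described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors in `POS_EMB` and `NEG_EMB`, the classifier weights, and all function names are hypothetical, and mean pooling with a logistic unit stands in for the CNN or LSTM feature extractors and the trained classifier.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings. In the abstract, one model is built from the
# CSI-positive corpus (tweets plus CVE) and one from the CSI-negative corpus
# (tweets plus Wikitext); the numbers here are invented for illustration.
POS_EMB: Dict[str, Vector] = {
    "exploit": [0.9, 0.8],
    "patch": [0.7, 0.6],
    "coffee": [0.1, 0.0],
}
NEG_EMB: Dict[str, Vector] = {
    "exploit": [0.1, 0.2],
    "patch": [0.2, 0.1],
    "coffee": [0.8, 0.9],
}
UNK: Vector = [0.0, 0.0]  # vector used for out-of-vocabulary tokens


def mean_pool(tokens: List[str], emb: Dict[str, Vector]) -> Vector:
    """Average the token vectors (an assumed stand-in for a CNN/LSTM encoder)."""
    vecs = [emb.get(tok, UNK) for tok in tokens]
    if not vecs:
        return list(UNK)
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(UNK))]


def contrastive_features(tokens: List[str]) -> Vector:
    """Concatenate the features extracted from both embedding models."""
    return mean_pool(tokens, POS_EMB) + mean_pool(tokens, NEG_EMB)


def csi_probability(tokens: List[str], weights: Vector, bias: float = 0.0) -> float:
    """Score the combined feature vector with a single logistic unit."""
    feats = contrastive_features(tokens)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: reward positive-corpus features, penalize negative ones.
WEIGHTS: Vector = [1.0, 1.0, -1.0, -1.0]
```

With these assumed weights, `csi_probability(["exploit", "patch"], WEIGHTS)` falls above 0.5 and `csi_probability(["coffee"], WEIGHTS)` falls below it, because each token is represented differently by the two embedding models and the classifier sees both views at once.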
In this paper, we deal with the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot clearly and efficiently judge the credibility of a movie review, and the number of movie reviews is very large. To attack this problem, we propose a weakly supervised learning method for fast annotation. As the predefined criterion for weakly supervised learning, we present a simple and clear criterion based on the historical movie ratings associated with movie reviewers. The proposed method has two advantages. First, it is highly efficient because we can annotate entire data sets according to the predefined rule. Indeed, we show that the proposed method can annotate 8,000 movie reviews in only 0.712 seconds. Second, the criterion adapted for weakly supervised learning is simple but effective. As a comparison, we use a learning method that takes the helpfulness votes of other reviewers as the criterion for judging the credibility of movie reviews; this criterion has been widely used to judge the credibility of online reviews. We show that the proposed learning method is comparable to or even better than the helpfulness vote method, improving on the latter's accuracy by 1.57%∼4.54%.
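Rule-based weak annotation of this kind can be sketched as follows. The abstract does not state the actual criterion, so the rule below is purely hypothetical: a review is labeled credible when its author's past ratings stay, on average, within `max_dev` points of the average rating of the same movies. The data layout, the threshold, and the function names are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical history: reviewer id -> list of
# (rating given by the reviewer, average rating of that movie).
History = Dict[str, List[Tuple[float, float]]]


def reviewer_deviation(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute gap between a reviewer's ratings and the movie averages."""
    return mean(abs(own - avg) for own, avg in pairs)


def weak_labels(
    reviews: List[Tuple[str, str]],
    history: History,
    max_dev: float = 1.5,  # assumed threshold, not taken from the paper
) -> Dict[str, int]:
    """Map review id -> 1 (credible) or 0 (not credible) using the assumed rule.

    `reviews` holds (review_id, reviewer_id) pairs. Reviews whose author has
    no rating history are left unlabeled.
    """
    labels: Dict[str, int] = {}
    for review_id, reviewer_id in reviews:
        pairs = history.get(reviewer_id, [])
        if not pairs:
            continue  # no history, so the rule cannot be applied
        labels[review_id] = 1 if reviewer_deviation(pairs) <= max_dev else 0
    return labels
```

Because a rule like this is a single pass over the reviews with a dictionary lookup per review, annotating thousands of reviews takes very little time, which is the efficiency argument the abstract makes for annotating by a predefined rule.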