Hansub Shin scite author profile

Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond to cyber threats in advance. In this paper, we devise a new deep-learning text classification model that separates CSI-positive from CSI-negative tweets in a collection of tweets. For this, we propose a novel word embedding model, called contrastive word embedding, that maximizes the difference between base embedding models. First, we define CSI-positive and CSI-negative corpora, which are used to construct the embedding models. Here, to compensate for the imbalance of the tweet data sets, we additionally employ background knowledge for each tweet corpus: (1) the CVE data set for the CSI-positive corpus and (2) the Wikitext data set for the CSI-negative corpus. Second, we adopt deep learning models such as CNN or LSTM to extract feature vectors from the embedding models and integrate the feature vectors into one classifier. To validate the effectiveness of the proposed model, we compare it with two baseline classification models: (1) a model based on a single embedding model constructed with the CSI-positive corpus only and (2) a model based on the CSI-negative corpus only. The proposed model achieves high accuracy, with an F1-score of 0.934 and an area under the curve (AUC) of 0.935, improving on the baseline models by 1.76∼6.74% in F1-score and by 1.64∼6.98% in AUC.
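The idea of extracting features from two embedding models and feeding them to one classifier can be sketched roughly as follows. This is not the authors' implementation: the toy embedding tables `POSITIVE_EMB` and `NEGATIVE_EMB`, the helper names, and the use of mean pooling in place of the CNN or LSTM feature extractors are all assumptions made for illustration.

```python
# Illustrative sketch only: toy embeddings and mean pooling stand in for the
# trained embedding models and the CNN/LSTM feature extractors.
from typing import Dict, List

Embedding = Dict[str, List[float]]

# Hypothetical 2-dimensional embeddings, one table per corpus.
POSITIVE_EMB: Embedding = {"exploit": [0.9, 0.1], "patch": [0.7, 0.3]}
NEGATIVE_EMB: Embedding = {"exploit": [0.2, 0.4], "movie": [0.1, 0.8]}


def mean_pool(tokens: List[str], emb: Embedding, dim: int = 2) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [emb[t] for t in tokens if t in emb]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(tokens: List[str]) -> List[float]:
    """Concatenate the features from both embedding models into one vector."""
    return mean_pool(tokens, POSITIVE_EMB) + mean_pool(tokens, NEGATIVE_EMB)


if __name__ == "__main__":
    # The combined vector would be the input to a single downstream classifier.
    print(combined_features(["exploit", "patch"]))
```

The same word ("exploit" here) gets a different vector in each table, so the concatenated vector carries both views of the tweet into the classifier.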


A new smart smudge attack using CNN

Shin

Sim

Kwon

et al. 2021

Int. J. Inf. Secur.


A comparative experimental study of distributed storage engines for big spatial data processing using GeoSpark

2021


Historical credibility for movie reviews and its application to weakly supervised classification

Kim

Lim

Shin

et al. 2023

Information Sciences


Performance Evaluation of Spatial Data Management Systems Using GeoSpark

Shin

Lee

Kwon

2020


Historical Credibility for Movie Reviews and Its Application to Weakly Supervised Classification

Kim

Lim

Shin

et al. 2020

Preprint


In this paper, we deal with the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot clearly and efficiently judge the credibility of a movie review, and the number of movie reviews is very large. To attack this problem, we propose a weakly supervised learning method for fast annotation. As the predefined criterion for weakly supervised learning, we present a simple and clear criterion based on the historical movie ratings associated with movie reviewers. The proposed method has the following two advantages. First, it is highly efficient because the entire data set can be annotated according to the predefined rule. Indeed, we show that the proposed method can annotate 8,000 movie reviews in only 0.712 seconds. Second, the criterion used for weakly supervised learning is simple but effective. As a comparison, we use a learning method that takes the helpfulness votes of other reviewers as the criterion for judging the credibility of movie reviews, an approach that has been widely used to judge the credibility of online reviews. We show that the proposed learning method is comparable to or even better than the helpfulness vote method, improving on the accuracy of the latter by 1.57% ∼ 4.54%.
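Annotating a data set with a predefined rule can be sketched roughly as below. The abstract does not state the actual criterion, so the rule here is a hypothetical stand-in: a review is labeled credible when its rating is within `max_gap` of the reviewer's mean past rating. The function name, the `max_gap` threshold, and the data layout are all assumptions.

```python
# Hypothetical rule for illustration; the abstract does not give the
# authors' actual criterion.
from statistics import mean
from typing import Dict, List, Tuple


def weak_labels(
    reviews: List[Tuple[str, float]],
    history: Dict[str, List[float]],
    max_gap: float = 1.0,
) -> List[int]:
    """Label each (reviewer, rating) pair: 1 = credible, 0 = not credible."""
    labels = []
    for reviewer, rating in reviews:
        past = history.get(reviewer, [])
        # Reviewers without history get label 0 (an arbitrary sketch choice).
        credible = bool(past) and abs(rating - mean(past)) <= max_gap
        labels.append(1 if credible else 0)
    return labels


if __name__ == "__main__":
    history = {"a": [4.0, 5.0], "b": [1.0, 2.0]}
    reviews = [("a", 4.5), ("b", 5.0), ("c", 3.0)]
    print(weak_labels(reviews, history))
```

Because the rule is a single pass over the reviews with no human in the loop, labeling cost grows linearly with the number of reviews, which is the kind of property that makes fast annotation possible.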



Hansub Shin

A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter

