Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-NUT 2020), 2020
DOI: 10.18653/v1/2020.wnut-1.15

Detecting Trending Terms in Cybersecurity Forum Discussions

Abstract: We present a lightweight method for identifying currently trending terms in relation to a known prior of terms, using a weighted log-odds ratio with an informative prior. We apply this method to a dataset of posts from an English-language underground hacking forum, spanning over ten years of activity, with posts containing misspellings, orthographic variation, acronyms, and slang. Our statistical approach supports analysis of linguistic change and discussion topics over time, without a requirement to train a to…
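The abstract names the technique but does not give the formula. As a rough illustration only, the sketch below uses one common formulation of a weighted log-odds ratio with an informative Dirichlet prior: each term's smoothed log-odds in a recent window is compared with its smoothed log-odds in a baseline, and the difference is divided by an approximate standard deviation to give a z-score. Whether the paper uses exactly this form is an assumption. The function name `weighted_log_odds`, the `floor` parameter, and the toy counts are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def weighted_log_odds(recent, baseline, prior, floor=0.01):
    """Return a z-score per term: positive means over-represented in `recent`.

    `prior` supplies pseudo-counts (the informative prior). `floor` is an
    assumed small pseudo-count for terms missing from the prior.
    """
    n_recent = sum(recent.values())
    n_base = sum(baseline.values())
    a0 = sum(prior.values())  # total prior mass
    scores = {}
    for term in set(recent) | set(baseline):
        a = prior.get(term, 0) or floor
        y_r = recent.get(term, 0)
        y_b = baseline.get(term, 0)
        # Smoothed log-odds of the term within each corpus.
        lo_r = math.log((y_r + a) / (n_recent + a0 - y_r - a))
        lo_b = math.log((y_b + a) / (n_base + a0 - y_b - a))
        # Approximate variance of the log-odds difference.
        var = 1.0 / (y_r + a) + 1.0 / (y_b + a)
        scores[term] = (lo_r - lo_b) / math.sqrt(var)
    return scores


# Toy term counts (invented for illustration, not forum data).
recent = Counter({"ransomware": 30, "exploit": 20, "password": 50})
baseline = Counter({"ransomware": 5, "exploit": 20, "password": 75})
prior = recent + baseline  # pooled counts used as the prior

scores = weighted_log_odds(recent, baseline, prior)
trending = sorted(scores, key=scores.get, reverse=True)
print(trending[0])
```

In this toy setup a term with the same relative frequency in both windows scores zero, while a term that grows in the recent window scores positive. Because the score needs only term counts, no model has to be trained, which fits the lightweight framing in the abstract.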

Cited by 14 publications (12 citation statements)
References: 26 publications
“…There is significant interest surrounding the goal of being able to automate cybersecurity threat detection on social media [19,18,15,11,12,27,14,2]. Twitter, Reddit, and Stackexchange are popular forums from which several previous studies have gathered cybersecurity related documents [19,11,14,15,2,20] for the purpose of training machine learning detection systems and classifiers.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Identifying cybersecurity discussions in open forums at scale is a topic of great interest for the purpose of mitigating and understanding modern cyber threats [12,19,22]. The challenge is that these discussions are typically quite noisy (i.e., they contain community known synonyms or acronyms or slang) and it is difficult to get labelled data in order to train resilient NLP (natural language processing) topic classifiers.…”
Section: Introduction (mentioning)
confidence: 99%
“…There is significant interest surrounding the goal of being able to automate cybersecurity threat detection on social media [18,17,14,10,11,25,13,2]. Twitter, Reddit, and Stackexchange are popular forums from which several previous studies have gathered cybersecurity related documents [18,10,13,14,2,19] for the purpose of training machine learning detection systems and classifiers.…”
Section: Previous Work (mentioning)
confidence: 99%
“…There are several different approaches taken with which topic modelling task to use as a signal to detect cybersecurity discussions. Typically the topic classification task is related to training directly on labelled text and then perhaps developing an idea of the more relevant keywords in these discussions [18,11]. Other researchers use sentiment analysis in conjunction with other machine learning models [25,10].…”
Section: Previous Work (mentioning)
confidence: 99%