2022
DOI: 10.48550/arxiv.2201.05613
Preprint

The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Abstract: Pre-trained Transformers are challenging human performance in many natural language processing tasks. The gigantic datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained natural language understanding models perform on truly novel and unexplored data, provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks largely outperform pre-trained Transformers. This s…

Cited by 1 publication (1 citation statement)
References 14 publications

“…Their work demonstrated that representation methods such as GloVe (Pennington et al, 2014) and contextualized pre-trained language representations such as ELMo (Peters et al, 2018) resulted in subpar performance compared to traditional machine learning methods, suggesting that the small size of training data and the specialized vocabulary in the Dark Web domain may not be suitable for such methods. Nevertheless, transformer-based pre-trained language models like BERT (Devlin et al, 2019) showed promising results in text classification tasks, although it is not often the case that such models adapt with ease to the Dark Web domain (Ranaldi et al, 2022).…”
Section: Related Work
confidence: 99%