2022
DOI: 10.1609/icwsm.v16i1.19304

Are You Robert or RoBERTa? Deceiving Online Authorship Attribution Models Using Neural Text Generators

Abstract: Recently, there has been a rise in the development of powerful pre-trained natural language models, including GPT-2, Grover, and XLM. These models have shown state-of-the-art capabilities on a variety of NLP tasks, including question answering, content summarisation, and text generation. Alongside this, there have been many studies focused on online authorship attribution (AA), that is, the use of trained models to identify the authors of online texts. Given the power of natural language models …

Cited by 4 publications (4 citation statements). References: 21 publications.
“…Additionally, the focus on Twitter data may limit the generalizability of the findings to other social media platforms, as different platforms may exhibit distinct communication patterns and content structures. To address those limitations, one could consider expanding the training dataset to include longer-form content, allowing the classifier to adapt to diverse text lengths and potentially improving prediction accuracy across various content types [95]. Additionally, to enhance the generalizability of the findings, incorporating data from multiple social media platforms and adjusting the model to account for distinct communication patterns and content structures inherent to each platform would provide a more comprehensive understanding of health-related discussions in the broader digital landscape [96].…”
Section: Discussion (mentioning)
confidence: 99%
“…While older AA methods focused on human authors, more recent efforts (Uchendu et al, 2020; Munir et al, 2021) build models to identify the generator for a particular input text. Recent work also shows how AI-generated text can deceive state-of-the-art AA models (Jones et al, 2022), thus making the task of detecting such text even more important.…”
Section: Related Work (mentioning)
confidence: 99%
“…Authorship Attribution and Cybercrime: Numerous authorship attribution studies have been successfully applied to the fields of forensic (Yang and Chow, 2014; Johansson and Isbister, 2019; Belvisi et al, 2020) and cybercrime investigations (Zheng et al, 2003; Rashid et al, 2013), spam detection (Alazab et al, 2013; Jones et al, 2022), and linking vendor accounts on darknet markets (Ekambaranathan, 2018; Tai et al, 2019; Manolache et al, 2022; Saxena et al, 2023). However, to our knowledge, none of the existing studies focus on connecting vendors of HT through escort ads.…”
Section: Authorship Attribution in NLP (mentioning)
confidence: 99%