2020
DOI: 10.1007/978-3-030-49161-1_22

Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by 44 publications (42 citation statements): 2 supporting, 40 mentioning, 0 contrasting
References 16 publications
“…Author Profile (AP) BERT and RoBERTa (Barlas and Stamatatos, 2020, 2021). We trained a separate neural language model for each author in the dataset, where the embedding layer is initialized with embeddings from BERT and RoBERTa.…”
Section: Pretrained Language Models (mentioning)
confidence: 99%
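The statement quoted above describes training one language model per candidate author, with the embedding layer seeded from pretrained transformer embeddings. The sketch below illustrates that idea in Python; it is an assumption-laden illustration rather than the cited authors' code, and the checkpoint name (`bert-base-uncased`), the `AuthorLM` class, and the LSTM backbone are all illustrative choices.

```python
# Minimal sketch of a per-author language model whose embedding layer is
# initialized from pretrained BERT input embeddings (illustrative only).
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")


class AuthorLM(nn.Module):
    """Small recurrent LM trained on a single author's texts."""

    def __init__(self, bert_model, hidden_size=256):
        super().__init__()
        pretrained = bert_model.get_input_embeddings().weight  # (vocab_size, 768)
        vocab_size, emb_dim = pretrained.shape
        # Embedding layer initialized with a copy of BERT's input embeddings.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.embed.weight.data.copy_(pretrained.detach())
        self.rnn = nn.LSTM(emb_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token_ids):
        hidden, _ = self.rnn(self.embed(token_ids))
        return self.out(hidden)  # next-token logits


# One model per candidate author; each would be fit on that author's documents only.
author_models = {name: AuthorLM(bert) for name in ["author_a", "author_b"]}
ids = tokenizer("a short text by author_a", return_tensors="pt")["input_ids"]
logits = author_models["author_a"](ids)
```

At attribution time, per-author models of this kind are typically compared by how well each predicts the disputed text (e.g., per-token perplexity), with the best-fitting author's model taken as the prediction.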
“…Since the first computational approach to authorship attribution (Mosteller and Wallace, 1963), researchers have aimed at finding new sets of features for current domains/languages, adapting existing features to new languages or communication domains, or using new classification techniques, e.g. (Abbasi and Chen, 2006; Stamatatos, 2013; Silva et al., 2011; Layton et al., 2012; Iqbal et al., 2013; Zhang et al., 2018; Altakrori et al., 2018; Barlas and Stamatatos, 2020). Alternatively, motivated by the real-life applications of authorship attribution, different elements of and constraints on the attribution process have been investigated (Houvardas and Stamatatos, 2006; Luyckx and Daelemans, 2011; Goldstein-Stewart et al., 2009; Stamatatos, 2013; Wang et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…We use a pretrained SBERT model (Reimers and Gurevych, 2019) but update all model parameters during training. Prior work has explored self-attention models for authorship attribution (Saedi and Dras, 2020; Fabien et al., 2020; Barlas and Stamatatos, 2020) with mixed success compared to simpler convolutional models. These systems have utilized either the output or the classification token of BERT as the basis for learning authorship embeddings.…”
Section: Model (mentioning)
confidence: 99%
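The quote above mentions fine-tuning a pretrained SBERT encoder, with all parameters updated, to learn authorship embeddings. Below is a small, hedged sketch of what such fine-tuning could look like with the sentence-transformers library; the checkpoint name, the toy pair data, and the cosine-similarity objective are assumptions made for illustration, not necessarily the cited paper's exact setup.

```python
# Illustrative sketch: fine-tune a pretrained SBERT encoder so that texts by
# the same author map to nearby embeddings; all parameters are updated.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder SBERT checkpoint

# Toy pairs: label 1.0 = same author, 0.0 = different authors (assumed data).
train_examples = [
    InputExample(texts=["a text by author A ...", "another text by author A ..."], label=1.0),
    InputExample(texts=["a text by author A ...", "a text by author B ..."], label=0.0),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)  # pulls same-author pairs together

# No layers are frozen, so every encoder parameter is updated during training.
model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)

# After training, model.encode(text) yields an embedding usable for authorship
# verification or attribution via nearest-neighbour comparison.
embedding = model.encode("an unseen disputed text")
```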
“…On the other hand, if authorship features could be learned in a domain-independent fashion, it would reduce the need for in-domain training sets by exploiting transfer between domains: authorship representations could be learned from a large but out-of-domain corpus and subsequently deployed in a target domain. In prior work, Barlas and Stamatatos (2020) perform a study on cross-domain author verification in a closed world of 21 authors. In contrast, we consider an open-world setting with several orders of magnitude more authors.…”
Section: Introduction (mentioning)
confidence: 99%
“…Of course, in real-life scenarios, authors differ in topic and genre (e.g., documents, e-mails, tweets), but the main challenge is to focus on the author's stylometry [25,26]. Meanwhile, information from the cross-topic or cross-genre setting could mislead the model [25], which makes authorship verification difficult [27].…”
Section: Collecting and Preprocessing the Data (mentioning)
confidence: 99%