Findings of the Association for Computational Linguistics: EMNLP 2021 (2021)
DOI: 10.18653/v1/2021.findings-emnlp.359
The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Abstract: Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship writing style or by a topic shift. Motivated by this, we propose the topic confusion task where …

Cited by 6 publications (10 citation statements)
References 32 publications
“…Recent work indicates that traditional methods still outperform pretrained language models (i.e., BERT) (Kestemont et al., 2021; Altakrori et al., 2021; Murauer and Specht, 2021; Tyo et al., 2021; Peng et al., 2021; Futrzynski, 2021), but we show that this narrative only appears to apply to datasets with a limited number of words per class. Furthermore, BERT-based models achieve new state-of-the-art macro-accuracy on the IMDb62 (98.80%) and Blogs50 (74.95%) datasets and set the benchmark on our newly introduced Gutenberg dataset.…”
Section: Introduction (contrasting)
confidence: 54%
“…One of the difficulties in comparing prior work is the use of different performance metrics. Some examples are accuracy (Altakrori et al., 2021; Stamatatos, 2018; Jafariakinabad and Hua, 2022; Fabien et al., 2020; Saedi and Dras, 2021; Zhang et al., 2018; Barlas and Stamatatos, 2020), F1 (Murauer and Specht, 2021), C@1 (Bagnall, 2015), recall (Lagutina, 2021), precision (Lagutina, 2021), macro-accuracy (Bischoff et al., 2020), AUC (Bagnall, 2015; Pratanwanich and Lio, 2014), R@8 (Rivera-Soto et al., 2021), and the unweighted average of F1, F0.5u, C@1, and AUC (Manolache et al., 2021; Kestemont et al., 2021; Tyo et al., 2021; Futrzynski, 2021; Peng et al., 2021; Bönninghoff et al., 2021; Boenninghoff et al., 2020; Embarcadero-Ruiz et al., 2022; Weerasinghe et al., 2021).…”
Section: Metrics (mentioning)
confidence: 99%
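The metrics listed in this statement differ mainly in how they average over authors and whether they credit abstention. As a rough, hedged illustration (a minimal sketch, not drawn from any of the cited papers; the function names and the toy example are illustrative only), plain accuracy, macro-accuracy, and C@1 could be computed along these lines:

```python
# Minimal sketch (illustrative, not from the cited papers) of three of the
# metrics named above. A prediction of None stands for "no answer", which
# only C@1 credits instead of counting as an error.
from collections import defaultdict

def accuracy(y_true, y_pred):
    # Micro accuracy: fraction of documents whose author is predicted exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_accuracy(y_true, y_pred):
    # Accuracy per author, averaged over authors, so prolific authors
    # do not dominate the score.
    per_author = defaultdict(lambda: [0, 0])  # author -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_author[t][0] += int(t == p)
        per_author[t][1] += 1
    return sum(c / n for c, n in per_author.values()) / len(per_author)

def c_at_1(y_true, y_pred):
    # C@1: unanswered problems are credited at the rate of the system's
    # accuracy on the answered ones, rather than scored as wrong.
    n = len(y_true)
    n_correct = sum(t == p for t, p in zip(y_true, y_pred) if p is not None)
    n_unanswered = sum(p is None for p in y_pred)
    return (n_correct + n_unanswered * n_correct / n) / n

# Toy example: three candidate authors, one abstained prediction.
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", None]
print(accuracy(gold, pred))        # 0.50
print(macro_accuracy(gold, pred))  # 0.50
print(c_at_1(gold, pred))          # 0.625
```

The toy example shows why the choice matters: the same predictions score 0.50 under plain and macro accuracy but 0.625 under C@1, because the abstention is not penalized as a plain error.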
“…
• imbalance (imb): i.e., the standard deviation of the number of documents per author;
• topic confusion (as detailed in [6]).
…”
Section: Datasets (mentioning)
confidence: 99%
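The "imbalance" statistic quoted above reduces to the standard deviation of per-author document counts. A minimal sketch follows; it assumes a flat list of author labels (one per document) and uses the population standard deviation, since the exact variant used in [6] is not specified here:

```python
# Illustrative sketch (names are hypothetical, not taken from the cited work):
# dataset imbalance as the standard deviation of documents per author.
from collections import Counter
from statistics import pstdev

def imbalance(author_labels):
    # author_labels: one label per document, naming its author.
    docs_per_author = Counter(author_labels).values()
    return pstdev(docs_per_author)  # population std. dev.; 0 means perfectly balanced

toy_corpus = ["A"] * 50 + ["B"] * 10 + ["C"] * 30  # 50/10/30 documents
print(imbalance(toy_corpus))  # ~16.33 -> noticeably imbalanced
```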