Findings of the Association for Computational Linguistics: EMNLP 2021 (2021)
DOI: 10.18653/v1/2021.findings-emnlp.359
The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Abstract: Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship writing style or by a topic shift. Motivated by this, we propose the topic confusion task where …

Cited by 6 publications (10 citation statements)
References 32 publications
“…Recent work indicates that traditional methods still outperform pretrained language models (i.e., BERT) (Kestemont et al., 2021; Altakrori et al., 2021; Murauer and Specht, 2021; Tyo et al., 2021; Peng et al., 2021; Futrzynski, 2021), but we show that this narrative only appears to apply to datasets with a limited number of words per class. Furthermore, BERT-based models achieve new state-of-the-art macro-accuracy on the IMDb62 (98.80%) and Blogs50 (74.95%) datasets and set the benchmark on our newly introduced Gutenberg dataset.…”
Section: Introduction (contrasting)
confidence: 54%
“…One of the difficulties in comparing prior work is the use of different performance metrics. Some examples are accuracy (Altakrori et al., 2021; Stamatatos, 2018; Jafariakinabad and Hua, 2022; Fabien et al., 2020; Saedi and Dras, 2021; Zhang et al., 2018; Barlas and Stamatatos, 2020), F1 (Murauer and Specht, 2021), C@1 (Bagnall, 2015), recall (Lagutina, 2021), precision (Lagutina, 2021), macro-accuracy (Bischoff et al., 2020), AUC (Bagnall, 2015; Pratanwanich and Lio, 2014), R@8 (Rivera-Soto et al., 2021), and the unweighted average of F1, F0.5u, C@1, and AUC (Manolache et al., 2021; Kestemont et al., 2021; Tyo et al., 2021; Futrzynski, 2021; Peng et al., 2021; Bönninghoff et al., 2021; Boenninghoff et al., 2020; Embarcadero-Ruiz et al., 2022; Weerasinghe et al., 2021).…”
Section: Metrics (mentioning)
confidence: 99%
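The metrics listed in this statement differ mainly in how they average over authors and whether they credit abstention. As a rough, hedged illustration (a minimal sketch, not drawn from any of the cited papers; the function names and the toy example are illustrative only), plain accuracy, macro-accuracy, and C@1 could be computed along these lines:

```python
# Minimal sketch (illustrative, not from the cited papers) of three of the
# metrics named above. A prediction of None stands for "no answer", which
# only C@1 credits instead of counting as an error.
from collections import defaultdict

def accuracy(y_true, y_pred):
    # Micro accuracy: fraction of documents whose author is predicted exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_accuracy(y_true, y_pred):
    # Accuracy per author, averaged over authors, so prolific authors
    # do not dominate the score.
    per_author = defaultdict(lambda: [0, 0])  # author -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_author[t][0] += int(t == p)
        per_author[t][1] += 1
    return sum(c / n for c, n in per_author.values()) / len(per_author)

def c_at_1(y_true, y_pred):
    # C@1: unanswered problems are credited at the rate of the system's
    # accuracy on the answered ones, rather than scored as wrong.
    n = len(y_true)
    n_correct = sum(t == p for t, p in zip(y_true, y_pred) if p is not None)
    n_unanswered = sum(p is None for p in y_pred)
    return (n_correct + n_unanswered * n_correct / n) / n

# Toy example: three candidate authors, one abstained prediction.
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", None]
print(accuracy(gold, pred))        # 0.50
print(macro_accuracy(gold, pred))  # 0.50
print(c_at_1(gold, pred))          # 0.625
```

The toy example shows why the choice matters: the same predictions score 0.50 under plain and macro accuracy but 0.625 under C@1, because the abstention is not penalized as a plain error.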
“…
• imbalance (imb): i.e., the standard deviation of the number of documents per author;
• topic confusion (as detailed in [6]).
…”
Section: Datasets (mentioning)
confidence: 99%
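The "imbalance" statistic quoted above reduces to the standard deviation of per-author document counts. A minimal sketch follows; it assumes a flat list of author labels (one per document) and uses the population standard deviation, since the exact variant used in [6] is not specified here:

```python
# Illustrative sketch (names are hypothetical, not taken from the cited work):
# dataset imbalance as the standard deviation of documents per author.
from collections import Counter
from statistics import pstdev

def imbalance(author_labels):
    # author_labels: one label per document, naming its author.
    docs_per_author = Counter(author_labels).values()
    return pstdev(docs_per_author)  # population std. dev.; 0 means perfectly balanced

toy_corpus = ["A"] * 50 + ["B"] * 10 + ["C"] * 30  # 50/10/30 documents
print(imbalance(toy_corpus))  # ~16.33 -> noticeably imbalanced
```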