2023
DOI: 10.48550/arxiv.2301.12500
Preprint

BERT-based Authorship Attribution on the Romanian Dataset called ROST

Abstract: Although it has been around for decades, the problem of Authorship Attribution is still very much in focus today. Among the more recent instruments used are pre-trained language models, the most prevalent being BERT. Here we used such a model to detect the authorship of texts written in the Romanian language. The dataset used is highly unbalanced, i.e., there are significant differences in the number of texts per author, the sources from which the texts were collected, the time period in which the authors lived and wrote these…
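The abstract highlights that the ROST dataset is highly unbalanced, with very different numbers of texts per author. A common generic mitigation when fine-tuning a classifier on such data is to weight each author class by inverse frequency in the training loss; the sketch below illustrates that weighting scheme only, and is not a technique confirmed by the abstract.

```python
from collections import Counter

def class_weights(author_labels):
    """Inverse-frequency weights for an unbalanced author distribution.

    Each author's weight is total_texts / (num_authors * texts_for_author),
    so under-represented authors contribute more to the training loss.
    """
    counts = Counter(author_labels)
    total = len(author_labels)
    return {a: total / (len(counts) * n) for a, n in counts.items()}

# Toy unbalanced corpus: author A has 6 texts, author B only 2.
labels = ["A"] * 6 + ["B"] * 2
print(class_weights(labels))  # author B receives a larger weight than A
```

Such a weight dictionary could then be passed, for example, to a weighted cross-entropy loss during fine-tuning, though the exact training setup used in the paper is not described in the visible abstract.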


Cited by 1 publication (2 citation statements)
References 17 publications
“…Focusing on datasets with 10 authors (see Table 12), we observe that our RoBERT model outperformed the existing approaches [22,51] for both FT and PP corpora. Additionally, our hybrid RoBERT model, which incorporates RBI features, achieved the highest F1 score of 0.95 for the PP corpus, indicating the effectiveness of leveraging both textual and numerical features for AA tasks.…”
Section: Comparison With Existing Methods
confidence: 89%
“…A second study by Avram [51] focused on authorship attribution using pre-trained language models, particularly BERT, to detect the authorship of Romanian texts. Similar to the previous study, this research used the same dataset, which is highly unbalanced in terms of the number of texts per author, source, time period, and writing type.…”
Section: Authorship Attribution In Romanian
confidence: 99%