A3C: Albanian Authorship Attribution Corpus

Misini, Arta; Kadriu, Arbana; Canhasi, Ercan

doi:10.1007/978-3-031-42511-0_49

Cited by 1 publication (2 citation statements)

References 14 publications

“…While many studies rely on established benchmark datasets like Enron [20], C50 [7], PAN [22], IMDb62 [6,21] and others [9], the scarcity of standard datasets, particularly for low-resource languages, presents a unique challenge. Creating specialized corpora has paved the way for promising advancements in the field, demonstrated by projects like UNAAC [5], BAAD [2], UrduCorpus [5], A3C Corpus [8,25], and more [4]. These corpora tailored for AA contribute significantly to the field, expanding its resources.…”

Section: Related Work (mentioning)

confidence: 99%

Authorship classification techniques: Bridging textual domains and languages

Misini, Kadriu, Canhasi

2024

IJITS JOURNAL

Authorship classification analyzes an author's prior work to identify their writing style, a unique trait of each language and individual author. This research aims to conduct a thorough comparative analysis of various methods for classifying authorship. The study leverages two corpora: AAALitCorpus of Albanian literary texts and CCAT10 of English columns. We evaluate model-generated features across different configurations. The richness of the features and the breadth of the analysis provide a significant understanding of the problem, setting a new standard for comprehensive linguistic investigations across multiple languages. The study indicates that machine learning algorithms accurately discern authorial writing styles, highlighting the complexities of classifying authorship in a cross-linguistic context.

“…The goal was to develop a high-quality dataset for subsequent authorship analysis within Albanian literature. Our previous research, as detailed in [8,25], investigated the utilization of newsroom columns for AA.…”

Section: AAALitCorpus (mentioning)

confidence: 99%