2010
DOI: 10.1093/llc/fqq013

The effect of author set size and data size in authorship attribution

Cited by 107 publications (71 citation statements)
References 18 publications
“…found that Document Author Representation(DAR) can be very useful in AA tasks, because it provides good performance on imbalanced data, getting comparable or better accuracy results. [5,6] investigated the class imbalance problem and tests several methods for compensation of imbalanced data sets. He concludes that the best method uses many short text samples for minority classes and less but longer ones for the majority classes.…”
Section: Literature Survey
Citation type: mentioning
confidence: 99%
“…Among all these stylometric features, the character n-gram model in character-based linguistic modality performs the best, and it is comparatively more robust against the others [Luyckx and Daelemans 2011;Koppel et al 2011]. The character n-gram model actually captures information crossing different modalities [Houvardas and Stamatatos 2006]; for example, a frequent 'ed' bigram in a character-based modality may also carry the frequent usage of past tense in a syntactic modality.…”
Section: Stylometric Features
Citation type: mentioning
confidence: 99%
“…On the contrary, short snippets are relatively casual, and their stylometric features have larger variation. As shown in recent research [Koppel et al 2011;Luyckx and Daelemans 2011;Narayanan et al 2012], authorship attribution accuracy is greatly and directly affected by many objective factors (e.g., text length, number of known author samples, etc.) due to the unstructured nature of the text itself.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…If such a characteristic was absent in an anonymous text, it did not necessarily argue against a writer's authorship in whose other texts (perhaps in different topics or genres) the characteristic did prominently feature. Apart from the limited scalability of such style (Luyckx, 2010;Luyckx and Daelemans, 2011), a far more troublesome issue is associated with them. Because of their whimsical nature these low-frequency phenomena could have struck an author's imitators or followers as strongly as they could have struck a scholar.…”
Section: Seminal Work
Citation type: mentioning
confidence: 99%