18th International Conference on Database and Expert Systems Applications (DEXA 2007), 2007
DOI: 10.1109/dexa.2007.5

Author Identification Using Imbalanced and Limited Training Texts

Cited by 52 publications (39 citation statements). References 8 publications. Citing publications were published between 2008 and 2021.

“…The majority of authorship attribution studies present experiments based on balanced training sets (i.e., an equal amount of training text samples for each candidate author), so it is not possible to estimate their accuracy under class imbalance conditions. Only a few studies take this factor into account (Marton et al., 2005; Stamatatos, 2007).…”
Section: CNG and Variants (mentioning)
confidence: 99%
“…Typically the features are selected based on their frequency of appearance in the profile. Examples of a profile-based approach include [10,20,11].…”
Section: Introduction (mentioning)
confidence: 99%
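The excerpt above refers to profile-based attribution, where all texts of a candidate author are concatenated and summarised by their most frequent character n-grams. Below is a minimal sketch of that idea in the spirit of the CNG (Common N-Grams) dissimilarity suggested by the section label; the n-gram length, profile size, and function names are illustrative assumptions, not the exact settings used in the cited papers.

```python
from collections import Counter

def ngram_profile(text, n=3, size=500):
    """Profile = relative frequencies of the `size` most frequent character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.most_common(size)}

def cng_dissimilarity(p1, p2):
    """CNG-style relative-difference dissimilarity between two profiles."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return d

def attribute(test_text, training_texts, n=3, size=500):
    """Attribute test_text to the candidate whose profile is least dissimilar."""
    test_profile = ngram_profile(test_text, n, size)
    return min(
        training_texts,
        key=lambda author: cng_dissimilarity(
            ngram_profile(" ".join(training_texts[author]), n, size), test_profile
        ),
    )
```

Here `training_texts` is assumed to map each candidate author to a list of training documents; the test text is assigned to the author with the closest profile.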
“…Although the dimensionality of the problem is increased in comparison to a function word approach, it is much smaller in comparison to a word n-gram approach. Methods based on such features have produced very good results in several author identification experiments on texts in various languages [17,16,30,11]. However, there is still no consensus about an appropriate n value (the length of character n-grams) for particular natural languages and text types.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Another way to represent text is by using character n-gram frequencies [17,30]. Again, the most frequent character n-grams (n contiguous characters) include the most important information.…”
Section: Previous Work (mentioning)
confidence: 99%
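As a complementary sketch of the character n-gram representation described in the excerpt above, the snippet below turns texts into vectors of relative frequencies over the most frequent n-grams of a training corpus. The choice of n = 3 and the vocabulary size are assumptions for illustration; the resulting vectors could be fed to any standard classifier and do not reproduce a specific pipeline from the cited works.

```python
from collections import Counter

def top_char_ngrams(corpus, n=3, k=1000):
    """The k most frequent character n-grams across all training texts."""
    counts = Counter()
    for text in corpus:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in counts.most_common(k)]

def ngram_vector(text, vocabulary, n=3):
    """Relative-frequency vector of `text` over a fixed n-gram vocabulary."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]

# The vectors below could serve as input to any standard classifier (e.g., an SVM).
corpus = ["a short training text by author A", "another training text by author B"]
vocab = top_char_ngrams(corpus, n=3, k=200)
vectors = [ngram_vector(t, vocab) for t in corpus]
```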