2016
DOI: 10.1007/978-3-319-47217-1_9

Comparison of Cross-Validation and Test Sets Approaches to Evaluation of Classifiers in Authorship Attribution Domain

Abstract: The presented paper addresses problem of evaluation of decision systems in authorship attribution domain. Two typical approaches are cross-validation and evaluation based on specially created test datasets. Sometimes preparation of test sets can be troublesome. Another problem appears when discretization of input sets is taken into account. It is not obvious how to discretize test datasets. Therefore model evaluation method not requiring test sets would be useful. Cross-validation is the well-known a…

citations
Cited by 17 publications
(3 citation statements)
references
References 10 publications
“…By using the precompiled Songkorpus, we empirically tested the accuracy of different authorship classificators on a reliable dataset. The results of our best model seems promising and are in accordance with comparable research reports on naïve bayes classifiers (Rish, 2001;Dai et al, 2007;Labatut & Cherifi, 2012;Nitze, Schulthess & Asche, 2012;Altheneyan & Menai, 2014;Baron, 2016;Shih, Stow, & Tsai, 2019). It can be concluded from our experiments that the Naive Bayes classifier seems to be a good choice for authorship attribution of song lyrics, at least for the investigated singer-songwriter dataset.…”
Section: Discussion (supporting)
confidence: 90%
“…Based on this information, together with the nucleotide density information, nucleotide N at i-th position from sequence S (with length l) can be represented by the formula N_i = {x_i, y_i, z_i, d_i} (i = 1, 2, 3, …, l) which satisfies the following equations: 24,27,35 We evaluated the performance of these algorithms by an independent testing dataset, since the evaluation by cross-validation may over-estimate the performance of models. 39 The R package caret was used to construct machine learning models, and all parameters were set by default for primitive evaluation. The results are shown in Table 1.…”
Section: Nucleotide Chemical Property (mentioning)
confidence: 99%
“…In cross-validation, even with several folds, it is highly probable to obtain falsely higher classification accuracy. These overly optimistic results are explained by this close similarity of some groups of examples [55], and lack of statistical independence between tests, as the same samples are used in several evaluations [11].…”
Section: Plos One (mentioning)
confidence: 99%