2021
DOI: 10.1016/j.procs.2021.08.128
Standard vs. non-standard cross-validation: evaluation of performance in a space with structured distribution of datapoints

Cited by 8 publications (6 citation statements)
References 13 publications
“…Standard cross-validation approaches advocate the random selection of samples from a set, repeated a certain number of times, and averaging results over such folds [ 65 ]. However, in the case of stylometric input space, experiments show that random choice is not always the best way to go, even with increasing the number of folds above popular standards [ 66 ]. The problem lies in the specific distribution of datapoints in space, which is caused by the initial pre-processing of text samples.…”
Section: Proposed Methodology
confidence: 99%
“…To avoid falsely optimistic test results, for the evaluation of learnt patterns, samples should never be used based on the same texts that are used for training. Standard cross-validation, with its foundation of random choice of samples for folds, cannot be trusted to return a reliable estimation of classification accuracy [32]. Instead, non-standard cross-validation, with swapping whole groups of samples (instead of individual instances) could be attempted, but it results in highly increased computational costs.…”
Section: Construction Of Datasets
confidence: 99%
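The group-based splitting described in the excerpt above can be sketched in plain Python. This is an illustrative sketch, not the cited authors' implementation: it assumes each sample carries a group label identifying the source text it was derived from, and assigns whole groups to folds so that no source text contributes samples to both the training and the test side of a split.

```python
import random
from collections import defaultdict

def group_kfold(groups, k, seed=0):
    """Partition sample indices into k folds by whole group:
    every sample from the same source text lands in one fold,
    so train and test never share a source text."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    group_ids = sorted(by_group)
    rng = random.Random(seed)
    rng.shuffle(group_ids)
    folds = [[] for _ in range(k)]
    # distribute whole groups round-robin into folds
    for i, g in enumerate(group_ids):
        folds[i % k].extend(by_group[g])
    return folds

# six samples drawn from three source texts A, B, C (hypothetical data)
groups = ["A", "A", "B", "B", "C", "C"]
folds = group_kfold(groups, k=3)
```

With three groups and k = 3, each fold holds exactly one source text; testing on a fold then evaluates on texts the model never saw during training, which is the point of the non-standard scheme.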
“…The average of the partial results then leads to the final outcome. In the stylometric domain, this approach proves problematic due to the existing stratification of the input space [32]. Data points are grouped by the original long works they are based on.…”
Section: Evaluation Of Performance
confidence: 99%
“…A k-fold cross-validation method is used to validate each identification performance index from the result of the testing set of a CNN model. This research uses the value of k = 5, which is the standard k value for cross-validation [24,25]. In the experiments, both the training and testing sets are partitioned into five folds and randomly shuffled.…”
Section: K-fold Cross-validation
confidence: 99%
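The standard procedure the excerpt above describes, shuffling samples, splitting them into k = 5 folds, and averaging a per-fold score, can be sketched in plain Python as follows. The `score_fn` callback is a placeholder for whatever train/evaluate step a given study uses; it is not part of the cited work.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Standard k-fold: shuffle sample indices at random,
    then deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(score_fn, n, k=5):
    """Use each fold once as the test set, train on the rest,
    and average the k partial scores into the final outcome."""
    folds = kfold_indices(n, k)
    scores = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(score_fn(train, test))
    return sum(scores) / k
```

Because the assignment of samples to folds is purely random, samples derived from the same source text can end up on both sides of a split, which is exactly the leakage the non-standard, group-based variant avoids.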