2022
DOI: 10.1590/0103-8478cr20201072

Kennard-Stone method outperforms the Random Sampling in the selection of calibration samples in SNPs and NIR data

Abstract: Splitting the whole dataset into training and testing subsets is a crucial part of optimizing models. This study evaluated the influence of the choice of the training subset on the construction of predictive models, as well as on their validation. For this purpose, we assessed the Kennard-Stone (KS) and Random Sampling (RS) methods on near-infrared spectroscopy (NIR) data and on SNP (Single Nucleotide Polymorphism) marker data. It is worth noting that in SNPs data, there is no knowledge of reports in the lit…

Cited by 10 publications (5 citation statements)
References 36 publications
“…This ensures that there is no data leakage during the calculation process, while also ensuring the consistency and reproducibility of the dataset. 41–43 Based on the full dataset, 36 samples were assigned to the training set, 12 samples to the validation set, and the remaining 11 samples to the test set.…”
Section: Results (mentioning)
confidence: 99%
“…This ensures that there is no data leakage during the calculation process, while also ensuring the consistency and reproducibility of the dataset. [41][42][43] Based on the full dataset, 36 samples were assigned to the training set, 12 samples to the validation set, and the remaining 11 samples to the test set. The split data were input to different elemental concentration prediction models, including HDNN, DNN, BPNN, and PLSR methods, to prove the superiority of our proposed HDNN method.…”
Section: Quantitative Analysis by the HDNN Methods (mentioning)
confidence: 99%
“…Prior to modeling, the dataset was divided into a training set and a test set according to the Kennard–Stone algorithm 41 to show the classification ability of the model. To assess the relative robustness of the various preprocessing methods, modeling is performed next, and the modeling results are presented and analyzed in Section 3.2.…”
Section: Results (mentioning)
confidence: 99%
“…The Kennard-Stone algorithm was utilized to select 80% of them as the training set and the remaining 20% as the prediction set. 35 Thus, the training set has 216 sets of samples and the prediction set contains 54 sets of samples. The input form of the soil XRF spectral data is 800 × 1.…”
Section: Methods (mentioning)
confidence: 99%
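
For readers unfamiliar with the selection rule these statements refer to, the following is a minimal sketch of a Kennard-Stone split in Python with NumPy. It is not the implementation used in the cited paper or in any of the citing works. It assumes Euclidean distance on the raw feature vectors, and the function name `kennard_stone`, the random placeholder data, and the 270 × 800 shape (chosen only to mirror the 216/54 split quoted above) are all hypothetical.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X picked by the Kennard-Stone rule.

    Assumption: Euclidean distance between rows of X.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise squared Euclidean distances (clipped to remove tiny negatives).
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(dist, 0.0, out=dist)

    # Step 1: start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # min_dist[k] = distance from sample k to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -1.0  # never pick an already selected sample

    # Step 2: repeatedly add the sample farthest from the selected set.
    while len(selected) < n_select:
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[selected] = -1.0

    return np.array(selected)


# Hypothetical data: 270 samples with 800 features each (placeholder values).
rng = np.random.default_rng(0)
X = rng.normal(size=(270, 800))

n_train = int(round(0.8 * X.shape[0]))  # 80% of 270 = 216
train_idx = kennard_stone(X, n_train)
test_idx = np.setdiff1d(np.arange(X.shape[0]), train_idx)
print(len(train_idx), len(test_idx))
```

Unlike random sampling, this rule is deterministic for a given dataset, and it tends to place the samples at the edges of the feature space in the training subset.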