2013
DOI: 10.1016/j.aca.2013.09.027

Merits of random forests emerge in evaluation of chemometric classifiers by external validation

Abstract: Real-world applications will inevitably entail divergence between samples on which chemometric classifiers are trained and the unknowns requiring classification. This has long been recognized, but there is a shortage of empirical studies on which classifiers perform best in 'external validation' (EV), where the unknown samples are subject to sources of variation relative to the population used to train the classifier. Survey of 286 classification studies in analytical chemistry found only 6.6% that stated elem…
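The abstract's distinction between internal and external validation can be illustrated with a minimal sketch, assuming scikit-learn and entirely hypothetical data: internal performance is estimated by cross-validation within the training population, while external validation scores the classifier on a second population carrying an extra source of variation. None of the names, sizes, or numbers below come from the paper.

```python
# Minimal sketch of internal vs. external validation (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Batch A is used for training; batch B simulates "unknown" samples
# affected by an extra source of variation (e.g. instrumental drift).
X_a = rng.normal(size=(120, 50))
y_a = (X_a[:, 0] + X_a[:, 1] > 0).astype(int)
X_b = rng.normal(loc=0.3, size=(80, 50))   # shifted population
y_b = (X_b[:, 0] + X_b[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=500, random_state=0)

# Internal validation: cross-validation within the training population.
cv_acc = cross_val_score(clf, X_a, y_a, cv=5).mean()

# External validation: train on batch A, score on batch B.
clf.fit(X_a, y_a)
ev_acc = clf.score(X_b, y_b)

print(f"internal (5-fold CV) accuracy: {cv_acc:.2f}")
print(f"external validation accuracy:  {ev_acc:.2f}")
```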

Cited by 34 publications (32 citation statements). References 74 publications (140 reference statements).
“…Indeed, these top features were ranked among the first 10 with each technique. This was also observed in previously published studies (Menze et al, 2009; Guo and Balasubramanian, 2012; Chen et al, 2013; Scott et al, 2013) with high-dimensional datasets from omics data. In particular, RF was shown to be robust to noise and outliers, and a technique of choice to avoid overfitting.…”
Section: Discussion (supporting)
confidence: 86%
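As an illustration of the kind of agreement described in this statement, here is a hedged sketch, assuming scikit-learn and synthetic data, that compares the top-10 features ranked by random-forest importance with those ranked by a univariate F-test; both techniques and all parameters are illustrative choices, not taken from the cited studies.

```python
# Compare top-10 feature rankings from two techniques (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=200, n_features=300, n_informative=10,
                           random_state=0)

# Ranking 1: random-forest variable importance.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
rf_top10 = set(np.argsort(rf.feature_importances_)[::-1][:10])

# Ranking 2: univariate ANOVA F-scores.
f_scores, _ = f_classif(X, y)
f_top10 = set(np.argsort(f_scores)[::-1][:10])

print("features ranked in the top 10 by both techniques:",
      sorted(rf_top10 & f_top10))
```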
“…Its main advantages, as reported in the literature (Ho, 1998; Liaw and Wiener, 2002; Biau, 2012; Hapfelmeier et al, 2014), include its ability to deal with over-fitting and missing data, as well as its capacity to handle large datasets without variable elimination for feature selection (Menze et al, 2009). It was successfully applied as a biomarker selection tool for metabolomic data analysis in several studies (Chen et al, 2013; Scott et al, 2013; Gromski et al, 2015), especially owing to its resilience to high-dimensional data, insensitivity to noise, and resistance to overfitting. Nevertheless, it generates different results, unlike SVM, which delivers a unique solution.…”
Section: Introduction (mentioning)
confidence: 99%
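The final point in the statement above, that RF is not deterministic while SVM delivers a unique solution, can be demonstrated with a small sketch; scikit-learn, the synthetic data, and the particular seeds are assumptions of mine rather than anything from the cited work.

```python
# Two RF runs with different seeds may rank variables differently,
# while a linear SVM refit on the same data returns the same solution.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=150, n_features=100, n_informative=5,
                           random_state=0)

def top5(model):
    """Indices of the five most important variables of a fitted forest."""
    return np.argsort(model.feature_importances_)[::-1][:5]

rf1 = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
rf2 = RandomForestClassifier(n_estimators=50, random_state=2).fit(X, y)
print("RF top-5 variables, seed 1:", top5(rf1))
print("RF top-5 variables, seed 2:", top5(rf2))   # ranking may change

svm1 = LinearSVC(dual=False).fit(X, y)
svm2 = LinearSVC(dual=False).fit(X, y)
print("SVM solutions identical:", np.allclose(svm1.coef_, svm2.coef_))
```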
“…In order to confirm the validity of the PLS regression models we employed cross-validation, independent test sets [32,34] and permutation testing [33]. In the case of HPLC and CE data processing, variable selection [35] was performed on the basis of the regression coefficients of the variables in the models.…”
Section: Data Processing (mentioning)
confidence: 99%
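To make the validation strategy in this statement concrete, here is a minimal sketch, assuming scikit-learn and fabricated data: a PLS regression model is checked by cross-validation, an independent test set, and a permutation test, and variables are then ranked by the magnitude of their regression coefficients. The component count, split sizes, and top-10 cut-off are illustrative assumptions only.

```python
# PLS regression validated three ways, then coefficient-based selection.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     permutation_test_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=3)

# Cross-validation and an independent test set.
cv_r2 = cross_val_score(pls, X_train, y_train, cv=5).mean()
test_r2 = pls.fit(X_train, y_train).score(X_test, y_test)

# Permutation test: is the model better than chance?
score, perm_scores, p_value = permutation_test_score(
    pls, X_train, y_train, cv=5, n_permutations=100, random_state=0)

# Variable selection from the magnitude of the regression coefficients.
coefs = np.abs(pls.coef_).ravel()
selected = np.argsort(coefs)[::-1][:10]

print(f"CV R2 {cv_r2:.2f}, test R2 {test_r2:.2f}, permutation p {p_value:.3f}")
print("variables with largest |coefficients|:", selected)
```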
“…Compared with other machine learning algorithms, RF has many specific properties, such as computational efficiency on large datasets, outstanding prediction accuracy, and good estimation of important variables. During the past decades, RF has been widely used in the field of analytical chemistry and chemometrics [31][32][33][34]. For a detailed description of RF, see Ref.…”
Section: Construction of Model and Assessment of Performance (mentioning)
confidence: 99%
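As a rough illustration of the properties listed in this statement, the sketch below, assuming scikit-learn and synthetic data, builds a random forest, reads its out-of-bag accuracy as a built-in performance estimate, and ranks variables by their importances; all sizes and parameters are hypothetical.

```python
# Model construction and assessment with a random forest (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=200, n_informative=15,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=1000, oob_score=True,
                            random_state=0, n_jobs=-1).fit(X, y)

# Out-of-bag accuracy serves as an internal estimate of prediction performance.
print(f"out-of-bag accuracy estimate: {rf.oob_score_:.2f}")

# Rank variables by their importance in the fitted forest.
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("ten most important variables:", top)
```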