2022
DOI: 10.1101/2022.05.03.490388
Preprint

A real data-driven simulation strategy to select an imputation method for mixed-type trait data

Abstract: Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Imputation offers an alternative to removing cases with missing values from datasets. Imputation techniques that incorporate phylogenetic information into their estimations have demonstrated improved accuracy over standard techniques. However, previous studies of phylogenetic imputation tools are largely limited to simulations of numerical trait data, with categorical data not evaluated. It also remains to be…
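
The abstract describes choosing among imputation methods by simulation on real trait data. One generic way to run such a comparison is to hide a random fraction of the values that were actually observed, impute them, and score the imputations against the hidden truth. Numeric traits are scored with NRMSE and categorical traits with the proportion falsely classified (PFC). The Python sketch below is built on our own assumptions and is not necessarily the authors' protocol. The toy trait table, the `mean_mode_impute` baseline, the `evaluate` function, and the choice to normalise NRMSE by the variance of all observed values of a trait are all hypothetical illustrations.

```python
import random
import statistics

# Hypothetical toy trait table; None marks a value that is genuinely missing.
TRAITS = {
    "body_mass": [12.1, 8.4, None, 15.0, 9.7, 11.3, 14.2, None],
    "diet": ["herb", "carn", "herb", None, "omni", "herb", "carn", "herb"],
}
NUMERIC = {"body_mass"}  # every other column is treated as categorical


def mean_mode_impute(traits):
    """Baseline imputer: column mean for numeric traits, column mode for categorical ones."""
    filled = {}
    for name, col in traits.items():
        observed = [v for v in col if v is not None]
        fill = statistics.fmean(observed) if name in NUMERIC else statistics.mode(observed)
        filled[name] = [fill if v is None else v for v in col]
    return filled


def evaluate(traits, imputer, frac=0.25, seed=0):
    """Hide a fraction of the observed cells, impute them, and score only those cells.

    Returns NRMSE for numeric traits and PFC (proportion falsely classified)
    for categorical traits.
    """
    rng = random.Random(seed)
    masked = {name: list(col) for name, col in traits.items()}
    hidden = {}
    for name, col in traits.items():
        observed_idx = [i for i, v in enumerate(col) if v is not None]
        k = max(1, round(frac * len(observed_idx)))
        hidden[name] = rng.sample(observed_idx, k)
        for i in hidden[name]:
            masked[name][i] = None  # artificially remove a known value

    imputed = imputer(masked)
    scores = {}
    for name, idx in hidden.items():
        truth = [traits[name][i] for i in idx]
        guess = [imputed[name][i] for i in idx]
        if name in NUMERIC:
            mse = statistics.fmean((t - g) ** 2 for t, g in zip(truth, guess))
            # Assumption: normalise by the variance of all originally observed values.
            var = statistics.pvariance([v for v in traits[name] if v is not None])
            scores[name] = ("NRMSE", (mse / var) ** 0.5)
        else:
            wrong = sum(t != g for t, g in zip(truth, guess))
            scores[name] = ("PFC", wrong / len(idx))
    return scores


if __name__ == "__main__":
    # Average each score over several random masks before comparing methods.
    runs = [evaluate(TRAITS, mean_mode_impute, seed=s) for s in range(20)]
    for name in TRAITS:
        metric = runs[0][name][0]
        print(name, metric, round(statistics.fmean(r[name][1] for r in runs), 3))
```

`evaluate` accepts any other imputer with the same signature: a dict of columns goes in and a dict of filled columns comes out. Under this scheme, the method with the lowest averaged NRMSE and PFC on the masked cells would be the one selected.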

Cited by 3 publications (3 citation statements)
References 80 publications (162 reference statements)

“…Therefore, while our results agree with others that random forest models (as implemented by the missForest R function) are an accurate imputation method for trait data (Johnson et al, 2021), care should be taken to ensure use of imputation is appropriate. Our findings regarding the utility of imputation are only applicable to continuous trait imputation, as the efficacy of categorical traits imputation was not explored (although see May et al, 2023), and to large trait data sets on the scale of hundreds or thousands of species rather than tens. The utility of imputation in tackling missing and biased data has been shown to depend on the correlation between traits, and extent of phylogenetic autocorrelation (Clavel et al, 2015).…”
Section: Discussion (mentioning; confidence: 99%)
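
The quoted study used random forest imputation via the missForest R function. The loop behind missForest works in three steps. First, fill the gaps with initial guesses. Next, repeatedly predict each trait's missing entries from the other traits, visiting traits from least to most missing. Finally, stop once the change between successive imputations stops shrinking. The minimal Python sketch below handles continuous traits only. To stay dependency-free it swaps in a k-nearest-neighbour regressor for the random forest. The function names, the standardisation step, and the unnormalised stopping statistic are assumptions, not the missForest implementation. Each trait is predicted from the others, so the sketch also suggests why imputation accuracy depends on how strongly traits are correlated.

```python
import math
import statistics


def knn_predict(train_X, train_y, query, k=3):
    """Stand-in learner (not a random forest): mean target of the k nearest rows."""
    ranked = sorted((math.dist(row, query), y) for row, y in zip(train_X, train_y))
    return statistics.fmean(y for _, y in ranked[:k])


def iterative_impute(data, k=3, max_iter=10):
    """missForest-style loop over numeric rows (None = missing), kNN instead of RF."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[data[r][c] is None for c in range(n_cols)] for r in range(n_rows)]
    # Standardise each column with its observed mean and SD so distances are comparable.
    means, sds = [], []
    for c in range(n_cols):
        obs = [data[r][c] for r in range(n_rows) if not missing[r][c]]
        means.append(statistics.fmean(obs))
        sds.append(statistics.pstdev(obs) or 1.0)
    # Initial guess: the column mean, which is 0 on the standardised scale.
    X = [
        [0.0 if missing[r][c] else (data[r][c] - means[c]) / sds[c] for c in range(n_cols)]
        for r in range(n_rows)
    ]
    # Visit columns from least to most missing, as missForest does.
    order = sorted(range(n_cols), key=lambda c: sum(missing[r][c] for r in range(n_rows)))
    prev_change = math.inf
    for _ in range(max_iter):
        old = [row[:] for row in X]
        for c in order:
            miss_rows = [r for r in range(n_rows) if missing[r][c]]
            if not miss_rows:
                continue
            obs_rows = [r for r in range(n_rows) if not missing[r][c]]
            others = [j for j in range(n_cols) if j != c]
            train_X = [[X[r][j] for j in others] for r in obs_rows]
            train_y = [X[r][c] for r in obs_rows]
            for r in miss_rows:
                X[r][c] = knn_predict(train_X, train_y, [X[r][j] for j in others], k)
        change = sum((X[r][c] - old[r][c]) ** 2 for r in range(n_rows) for c in range(n_cols))
        # Stop when the change stops shrinking and keep the previous imputation.
        if change >= prev_change:
            X = old
            break
        prev_change = change
    # Undo the standardisation.
    return [[X[r][c] * sds[c] + means[c] for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    # Hypothetical data: two strongly correlated traits plus a weakly related one.
    rows = [
        [1.0, 2.1, 5.0],
        [2.0, 3.9, 4.0],
        [3.0, None, 6.0],
        [4.0, 8.2, None],
        [5.0, 9.8, 5.5],
        [None, 12.1, 4.5],
    ]
    for row in iterative_impute(rows, k=2):
        print([round(v, 2) for v in row])
```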
“…We tested two ways of dealing with the generated incomplete data sets: (1) removal of species with missing data (complete case analysis) and (2) filling data gaps through imputation. We used missForest imputation, implemented through the missForest [R package] (Stekhoven & Bühlmann, 2012), due to its demonstrated accuracy (Hong & Lynn, 2020; May et al, 2023; Penone et al, 2014), and fast computation times. Accounting for phylogenetic relatedness between species can improve imputation accuracy (May et al, 2023; Penone et al, 2014).…”
Section: Methods (mentioning; confidence: 99%)
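
A toy example can contrast the two options in the Methods excerpt. The first, complete case analysis, drops every species with a missing value. The second is a deliberately simple stand-in for phylogenetically informed imputation: each missing value is copied from the closest relative that has data, using a table of pairwise phylogenetic distances. This is neither missForest nor the approach of the cited studies. The species, the distances, and the function names in the sketch are all hypothetical.

```python
# Hypothetical species with one continuous trait; None = missing.
TRAIT = {"sp_A": 4.2, "sp_B": None, "sp_C": 3.9, "sp_D": 7.5, "sp_E": None}

# Hypothetical pairwise phylogenetic (e.g. patristic) distances; each pair stored once.
DIST = {
    ("sp_A", "sp_B"): 2.0, ("sp_A", "sp_C"): 4.0, ("sp_A", "sp_D"): 10.0, ("sp_A", "sp_E"): 10.0,
    ("sp_B", "sp_C"): 4.0, ("sp_B", "sp_D"): 10.0, ("sp_B", "sp_E"): 10.0,
    ("sp_C", "sp_D"): 10.0, ("sp_C", "sp_E"): 10.0,
    ("sp_D", "sp_E"): 3.0,
}


def distance(a, b):
    """Look up a symmetric distance regardless of the order the pair was stored in."""
    return DIST.get((a, b), DIST.get((b, a)))


def complete_cases(trait):
    """Option (1): drop every species with a missing value."""
    return {sp: v for sp, v in trait.items() if v is not None}


def nearest_relative_impute(trait):
    """Option (2), simplified: copy the value of the closest species that has data."""
    observed = complete_cases(trait)
    filled = dict(trait)
    for sp, v in trait.items():
        if v is None:
            donor = min(observed, key=lambda o: distance(sp, o))
            filled[sp] = observed[donor]
    return filled


if __name__ == "__main__":
    print(len(complete_cases(TRAIT)), "of", len(TRAIT), "species retained")
    # prints "3 of 5 species retained"
    print(nearest_relative_impute(TRAIT))
    # prints {'sp_A': 4.2, 'sp_B': 4.2, 'sp_C': 3.9, 'sp_D': 7.5, 'sp_E': 7.5}
```

In this toy case, complete case analysis discards two of the five species. The nearest-relative rule keeps all five but borrows each missing value from a single donor. Model-based methods such as missForest instead combine information across many species and traits.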