2014
DOI: 10.1111/2041-210x.12232

Imputation of missing data in life‐history trait datasets: which approach performs the best?

Abstract: Summary 1. Despite efforts in data collection, missing values are commonplace in life-history trait databases. Because these values typically are not missing randomly, the common practice of removing missing data not only reduces sample size, but also introduces bias that can lead to incorrect conclusions. Imputing missing values is a potential solution to this problem. Here, we evaluate the performance of four approaches for estimating missing values in trait databases (K-nearest neighbour (kNN), multivariate …
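The imputation approaches named in the abstract and in the citing passages below are all available in R. As a minimal, hedged sketch (not the authors' code), the following shows how a small trait table with artificial gaps could be imputed with kNN (VIM package), MICE (mice package) and random forests (missForest package); the trait names, sample size and missingness level are invented for illustration.

    # Toy trait data with values knocked out at random (names and values are made up).
    library(VIM)         # kNN() for k-nearest-neighbour imputation
    library(mice)        # mice() for multivariate imputation by chained equations
    library(missForest)  # missForest() and prodNA()

    set.seed(1)
    traits <- data.frame(
      body_mass   = rlnorm(100, meanlog = 3),
      litter_size = rpois(100, lambda = 4),
      longevity   = rlnorm(100, meanlog = 2)
    )
    traits_mis <- prodNA(traits, noNA = 0.2)   # set 20% of the entries to NA

    knn_imp  <- kNN(traits_mis, k = 5)                    # kNN (VIM also appends *_imp indicator columns)
    mice_imp <- complete(mice(traits_mis, m = 5, printFlag = FALSE), 1)  # first of m MICE datasets
    rf_imp   <- missForest(traits_mis)$ximp               # random-forest (missForest) imputation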

Cited by 339 publications (466 citation statements)
References 47 publications
“…Model-based imputation methods use other variables in the dataset to impute missing data, but they substantially alter the univariate trait distributions and the covariance structure of the dataset (Gelman and Hill, 2007). Approaches such as k nearest neighbour (kNN) or machine-learning methods (Stekhoven and Bühlmann, 2012) may be more appropriate to impute multivariate datasets, preserving their covariance structure (Eskelson et al, 2009; Penone et al, 2014). In a multiple imputation framework, m imputed datasets are obtained through simulation and may be jointly analysed to provide parameter estimates that take into account the uncertainty introduced by the imputations themselves (e.g.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
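A rough, illustrative check of the covariance point made in this passage, continuing the toy objects (traits, traits_mis, rf_imp) from the sketch under the abstract: overall-mean imputation leaves the trait means intact but shrinks the covariances, whereas a multivariate method such as missForest tends to preserve the covariance structure more closely.

    # Overall-mean imputation of each trait, for comparison.
    mean_imp <- traits_mis
    for (v in names(mean_imp)) {
      mean_imp[[v]][is.na(mean_imp[[v]])] <- mean(mean_imp[[v]], na.rm = TRUE)
    }

    cov(traits)    # covariance structure of the complete data
    cov(mean_imp)  # mean imputation shrinks the off-diagonal covariances
    cov(rf_imp)    # missForest-imputed data from the earlier sketch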
“…Complex imputation methods such as kNN, MICE or random forests generally outperform overall mean or species mean imputations (Penone et al, 2014; Taugourdeau et al, 2014). In earlier applications of these methods, it has been common to assume that interspecific trait variability was dominant, compared to intraspecific variability.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
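The contrast drawn above between species-mean imputation and more complex methods can be illustrated on the same toy data, here with a purely hypothetical species grouping; imputation error is measured as RMSE on the entries that were deliberately set to NA, so the true values are known.

    library(missForest)

    species <- factor(rep(paste0("sp", 1:10), each = 10))  # hypothetical species grouping
    dat_mis <- cbind(traits_mis, species = species)

    # Species-mean imputation of one trait.
    sp_mean <- ave(dat_mis$body_mass, dat_mis$species,
                   FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

    # Random-forest imputation using all traits plus the species factor.
    rf_sp <- missForest(dat_mis)$ximp

    miss <- is.na(traits_mis$body_mass)
    rmse <- function(est) sqrt(mean((est[miss] - traits$body_mass[miss])^2))
    c(species_mean = rmse(sp_mean), missForest = rmse(rf_sp$body_mass))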
“…We can easily automate calculations involving the above formulae with currently available R packages for multiple imputation such as mice (reviewed in Nakagawa and Freckleton 2011; see also Penone et al 2014).…”
Citation type: mentioning
Confidence: 99%
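A minimal sketch of that workflow, assuming the mice package and reusing the toy traits_mis data: m imputed datasets are generated, the same model is fitted to each, and pool() combines the estimates (Rubin's rules), so the reported standard errors include the imputation uncertainty.

    library(mice)

    imp    <- mice(traits_mis, m = 10, printFlag = FALSE)         # m = 10 imputed datasets
    fits   <- with(imp, lm(longevity ~ body_mass + litter_size))  # analysis repeated on each dataset
    pooled <- pool(fits)                                          # combine estimates across imputations
    summary(pooled)   # pooled coefficients with between-imputation variance included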
“…However, Penone et al (2014) recently showed that phylogenetic information could be added to multiple imputation in the form of phylogenetic eigenvectors (Diniz et al 1998; see also Guénard et al 2003), which can be seen as additional predictor variables (for the imputation step not for the analysis step). This means that, to conduct multiple imputation for comparative data, one can use general and flexible packages such as mice (van Buuren and Groothuis-Oudshoorn 2011), as was used for our simulation.…”
Citation type: mentioning
Confidence: 99%
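A hedged sketch of the idea described in this passage: phylogenetic eigenvectors are extracted from a (here simulated) tree and appended as extra predictor columns for the imputation step only, then dropped before analysis. The tree, the assumption that trait rows match the tip order, and the choice of ten eigenvectors are all illustrative.

    library(ape)
    library(mice)

    tree <- rtree(100)                                 # placeholder phylogeny with 100 tips
    pco  <- pcoa(as.dist(cophenetic(tree)))            # eigenvectors of phylogenetic distances
    eig  <- pco$vectors[, 1:10]                        # keep the first 10 eigenvectors
    colnames(eig) <- paste0("phylo_", 1:10)

    # Assumes rows of traits_mis are ordered like the tree tips (illustrative only).
    dat_phylo <- cbind(traits_mis, eig)
    imp <- mice(dat_phylo, m = 5, printFlag = FALSE)   # eigenvectors inform the imputation step
    trait_imp <- complete(imp, 1)[, names(traits_mis)] # drop eigenvectors before the analysis step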