1. Given the prevalence of missing data on species' traits (the Raunkiaeran shortfall) and its importance for theoretical and empirical investigations, several methods have been proposed to fill sparse databases. Despite its advantages, imputation of missing data can introduce biases. Here, we evaluate the bias in descriptive statistics, model parameters, and phylogenetic signal estimation from imputed databases under different missing-data and imputation scenarios.

2. We simulated coalescent phylogenies and traits under Brownian Motion and different Ornstein-Uhlenbeck evolutionary models. Missing values were created under three scenarios: missing completely at random, missing at random but phylogenetically structured, and missing at random but correlated with some other variable. We considered four methods for handling missing data: deleting missing values, imputation based on the observed mean trait value, Phylogenetic Eigenvector Maps, and Multiple Imputation by Chained Equations. Finally, we assessed estimation errors in descriptive statistics (mean, variance), the regression coefficient, Moran's correlogram, and Blomberg's K of the imputed traits.

3. We found that the percentage of missing data, the missing-data mechanism, the Ornstein-Uhlenbeck strength, and the handling method were all important determinants of estimation errors. When data were missing completely at random, descriptive statistics were well estimated, but Moran's correlogram and Blomberg's K were poorly estimated, depending on the handling method. Handling methods also performed worse when data were missing at random but phylogenetically structured; in this case, adding phylogenetic information provided better estimates. Although the error caused by imputation was correlated with estimation errors, the relationship was not linear, with estimation errors getting larger as the imputation error increased.

4. Imputed trait databases could bias ecological and evolutionary analyses. We advise researchers to share their raw data along with their imputed database, flagging imputed data and providing information on the imputation process. Thus, users can and should consider the pattern of missing data and then look for the best method to overcome this problem. In addition, we suggest the development of phylogenetic methods that consider imputation uncertainty and phylogenetic autocorrelation, and that preserve the level of phylogenetic signal of the original data.
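As a rough illustration of two of the missing-data scenarios and two of the simpler handling methods described above, the following minimal sketch may help. It is not the simulation code used in this study: it ignores phylogeny entirely, and the function names, the sample size, the correlation `rho`, and the logistic weighting used to make missingness depend on a covariate are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_trait(n=2000, rho=0.7, seed=0):
    """Draw a trait x and a covariate z with correlation about rho (illustrative values)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return x, z


def mask_mcar(n, frac, rng):
    """Missing completely at random: every value is equally likely to be missing."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(frac * n)), replace=False)] = True
    return mask


def mask_mar_covariate(z, frac, rng):
    """Missing at random, correlated with another variable:
    values with a larger covariate z are more likely to be missing
    (the logistic weighting is an arbitrary assumption)."""
    weights = 1.0 / (1.0 + np.exp(-2.0 * z))
    idx = rng.choice(len(z), size=int(round(frac * len(z))),
                     replace=False, p=weights / weights.sum())
    mask = np.zeros(len(z), dtype=bool)
    mask[idx] = True
    return mask


def mean_impute(x, mask):
    """Replace missing entries with the mean of the observed entries."""
    filled = x.copy()
    filled[mask] = x[~mask].mean()
    return filled


x, z = simulate_trait()
rng = np.random.default_rng(1)
frac = 0.4

for name, mask in [("MCAR", mask_mcar(len(x), frac, rng)),
                   ("MAR (covariate)", mask_mar_covariate(z, frac, rng))]:
    deleted = x[~mask]             # handling method 1: delete missing values
    imputed = mean_impute(x, mask)  # handling method 2: observed-mean imputation
    print(name)
    print("  mean error, deletion:       ", deleted.mean() - x.mean())
    print("  variance error, deletion:   ", deleted.var() - x.var())
    print("  variance error, mean-impute:", imputed.var() - x.var())
```

In this sketch, mean imputation leaves the observed mean unchanged but shrinks the variance by the fraction of values filled in, whichever mechanism produced the gaps, while deletion shifts the mean only when missingness depends on a variable correlated with the trait.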