The aim of time series homogenization is to remove non-climatic effects, such as changes in station location, instrumentation, or observation practices, from observed data. Statistical homogenization usually reduces the non-climatic effects but does not remove them completely. In the Spanish MULTITEST project, the efficiencies of automatic homogenization methods were tested on large benchmark datasets with a wide range of statistical properties. In this study, test results for 9 versions, based on 5 homogenization methods (ACMANT, Climatol, MASH, PHA and RHtests), are presented and evaluated. The tests were executed with 12 synthetic/surrogate monthly temperature test datasets containing 100 to 500 networks with 5 to 40 time series in each. Residual centred root mean square errors and residual trend biases were calculated both for individual station series and for network-mean series. The results show that a larger fraction of the non-climatic biases can be removed from station series than from network-mean series. The largest error reduction is found for the long-term linear trends of individual time series in datasets with a high signal-to-noise ratio (SNR), where the mean residual error is only 14–36% of the raw data error. When the SNR is low, most of the results still indicate error reductions, although the reductions are smaller than for high SNR. Generally, ACMANT gave the most accurate homogenization results. In the accuracy of individual time series, ACMANT is closely followed by Climatol, while for the accurate calculation of mean climatic trends over large geographical regions both PHA and ACMANT are recommended.
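As an illustration of the two error measures named above, the following minimal sketch computes a centred root mean square error and a linear trend bias for one series against a known "true" series, and expresses the residual error as a fraction of the raw data error. It rests on assumptions not stated in the abstract: the centred RMSE is taken as the RMSE of the differences after removing their mean, the trend is an ordinary least-squares slope per time step, and the function names (`centred_rmse`, `linear_trend`, `trend_bias`) and the toy values are hypothetical, not data from the study.

```python
import math


def centred_rmse(series, truth):
    """RMSE of (series - truth) after removing the mean difference (assumed definition)."""
    diffs = [s - t for s, t in zip(series, truth)]
    mean_diff = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / len(diffs))


def linear_trend(series):
    """Ordinary least-squares slope of the series per time step."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trend_bias(series, truth):
    """Difference between the fitted trend of the series and that of the truth."""
    return linear_trend(series) - linear_trend(truth)


# Toy values (hypothetical): a true signal, raw data with a +1.0 break
# halfway through, and a homogenized version with a +0.2 residual break.
truth = [0.0, 1.0, 2.0, 3.0]
raw = [0.0, 1.0, 3.0, 4.0]
homogenized = [0.0, 1.0, 2.2, 3.2]

# Residual error as a fraction of the raw data error; values below 1
# indicate that homogenization reduced the error.
crmse_ratio = centred_rmse(homogenized, truth) / centred_rmse(raw, truth)
trend_ratio = abs(trend_bias(homogenized, truth)) / abs(trend_bias(raw, truth))
print(f"CRMSE ratio: {crmse_ratio:.2f}, trend bias ratio: {trend_ratio:.2f}")
```

In a benchmark setting, ratios of this kind would be averaged over many series and networks; the sketch handles a single series only.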