A new statistical approach to validating climate models is introduced. First, five observational estimates of global mean surface temperature, each with estimated standard errors, are combined into a single data product, the latent observed annual global temperature anomalies for the years 1880–2014, using a Bayesian hierarchical statistical approach. To summarize these observed anomalies, estimates of the smooth trend, levels of warming, and residual dependence (the latter summarized by the spectral density function) are provided with simultaneous 95% credible bands. Corresponding estimates of smooth trend, levels of warming, and residual dependence are then produced for the historical simulations of the Coupled Model Intercomparison Project Phase 6 (CMIP6), analyzed at the annual global temperature anomaly scale, and compared to these bands. Among our results, we find that 93 of the 318 CMIP6 historical model runs have trends that lie inside the simultaneous bands for the smooth trend constructed from the data product, and that, for residual temporal dependence, 69 of the 318 model runs have spectral density functions that lie within the corresponding data-product-based bands. From the data product, we estimate the mean global temperature increase for 1995–2014 relative to 1880–1899 to be 0.896°C, with a 95% credible interval of 0.877°C to 0.915°C. We find that 14 CMIP6 model runs agree with this interval, 197 model runs yield a smaller global temperature increase, and 107 model runs yield a larger one.
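The two kinds of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy linear run, and the reading of "agrees" as "the run's warming level falls inside the credible interval" are all assumptions made for illustration, and the estimation of the smooth trend and the bands themselves is not shown.

```python
from statistics import mean

# 95% credible interval for the warming level quoted in the abstract (deg C).
CI_LOW, CI_HIGH = 0.877, 0.915


def warming_level(years, anomalies, base=(1880, 1899), recent=(1995, 2014)):
    """Mean anomaly over `recent` minus mean anomaly over `base` (inclusive ranges)."""
    base_vals = [a for y, a in zip(years, anomalies) if base[0] <= y <= base[1]]
    recent_vals = [a for y, a in zip(years, anomalies) if recent[0] <= y <= recent[1]]
    return mean(recent_vals) - mean(base_vals)


def classify_warming(delta, low=CI_LOW, high=CI_HIGH):
    """Assumed rule: a run 'agrees' if its warming level lies inside the interval."""
    if delta < low:
        return "smaller"
    if delta > high:
        return "larger"
    return "agrees"


def inside_band(curve, lower, upper):
    """Simultaneous check: the curve must lie within the band at every point."""
    return all(lo <= c <= hi for c, lo, hi in zip(curve, lower, upper))


# Hypothetical toy run (not CMIP6 output): a purely linear warming of 0.0078 deg C per year.
years = list(range(1880, 2015))
toy_run = [0.0078 * (y - 1880) for y in years]

delta = warming_level(years, toy_run)
label = classify_warming(delta)
```

A simultaneous band is stricter than a pointwise one: a single year outside the band is enough for `inside_band` to return `False`, which is consistent with only a minority of runs passing the trend and spectral density checks.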