Abstract. An objective approach is presented for scoring coupled climate simulations through an evaluation against satellite and reanalysis datasets during the satellite era (i.e. since 1979). Here, the approach is described and applied to available Coupled Model Intercomparison Project (CMIP) archives and the Community Earth System Model Version 1 Large Ensemble archives, with the goal of benchmarking model performance and its evolution across CMIP generations. The approach adopted is designed to minimize the sensitivity of scores to internal variability, external forcings, and model tuning. Toward this end, models are scored based on pattern correlations of their simulated mean state, seasonal contrasts, and ENSO teleconnections. A broad range of feedback-relevant fields is considered and summarized across various timescales (climatology, seasonal, interannual) and physical realms (energy budget, water cycle, dynamics). The fields chosen are generally those for which observational uncertainty is small compared to model structural differences and error. The highest mean variable scores across models are reported for well-observed fields such as sea level pressure, precipitable water, and outgoing longwave radiation, while the lowest scores are reported for 500 hPa vertical velocity, net surface energy flux, and precipitation minus evaporation. The fidelity of CMIP models is found to vary widely both within and across CMIP generations. Systematic increases in model fidelity across CMIP generations are identified, with the greatest improvements in dynamic and energetic fields. Examples include 500 hPa eddy geopotential height and relative humidity, and shortwave cloud forcing. Improvements in ENSO scores are substantially greater than those for the annual mean or seasonal contrasts. Analysis output data generated by this approach is made freely available online for a broad range of model ensembles, including the CMIP archives and various single-model large ensembles.
These multi-model archives allow for an exploration of relationships between metrics across a range of simulations while the single-model large ensemble archives enable an estimation of the influence of internal variability on reported scores. The entire output archive, updated regularly, can be accessed at: http://webext.cgd.ucar.edu/Multi-Case/CMAT/index.html.
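
As an illustration of the kind of pattern correlation on which the scores are based, the following is a minimal sketch and not the implementation used to generate the archive. It assumes a centered correlation, cos(latitude) area weighting, and two fields already on a common latitude-longitude grid; the function name `pattern_correlation` and the toy grid are hypothetical.

```python
import math


def pattern_correlation(model, obs, lats):
    """Centered, area-weighted pattern correlation of two lat-lon grids.

    model, obs: lists of rows (one row per latitude) with the same shape.
    lats: latitude of each row in degrees.
    Weighting by cos(latitude) is an assumption of this sketch.
    """
    points = []  # (weight, model value, obs value) for each valid grid point
    for lat, m_row, o_row in zip(lats, model, obs):
        w = math.cos(math.radians(lat))
        for m, o in zip(m_row, o_row):
            # Skip grid points that are missing (NaN) in either field.
            if not (math.isnan(m) or math.isnan(o)):
                points.append((w, m, o))

    # Area-weighted means of each field.
    wsum = sum(w for w, _, _ in points)
    m_mean = sum(w * m for w, m, _ in points) / wsum
    o_mean = sum(w * o for w, _, o in points) / wsum

    # Weighted covariance and variances of the anomalies.
    cov = sum(w * (m - m_mean) * (o - o_mean) for w, m, o in points)
    var_m = sum(w * (m - m_mean) ** 2 for w, m, _ in points)
    var_o = sum(w * (o - o_mean) ** 2 for w, _, o in points)
    return cov / math.sqrt(var_m * var_o)


# Toy 3 x 4 grid standing in for an observed and a simulated field.
lats = [-45.0, 0.0, 45.0]
obs = [[1.0, 2.0, 3.0, 4.0],
       [2.0, 4.0, 6.0, 8.0],
       [1.0, 3.0, 2.0, 5.0]]
# A "model" that differs from obs only by a scale factor and an offset.
model = [[2.0 * v + 1.0 for v in row] for row in obs]
r = pattern_correlation(model, obs, lats)
print(r)
```

Because the correlation is centered and normalized, a field that differs from the reference only by a uniform bias or amplitude factor still correlates perfectly with it, so a score of this kind measures spatial structure rather than mean bias.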