We propose measurement integrity, a property related to ex post reward fairness, as a novel desideratum for peer prediction mechanisms in many natural applications, including peer assessment. We then operationalize this notion to evaluate the measurement integrity of different peer prediction mechanisms in computational experiments. Our evaluations simulate the application of peer prediction mechanisms to peer assessment, a setting in which realistic models have been validated on real data and in which ex post fairness concerns are particularly salient. We find that peer prediction mechanisms, as proposed in the literature, largely fail to demonstrate measurement integrity in our experiments. However, we also find that supplementing the mechanisms with realistic parametric statistical models can, in some cases, improve their measurement integrity.

In the same setting, we also evaluate an empirical notion of robustness against strategic behavior to complement the theoretical analyses of such robustness that have been the primary focus of the peer prediction literature. In this dimension of analysis, we again find that supplementing certain mechanisms with realistic parametric statistical models can improve their empirical performance. Even so, we find that theoretical guarantees of robustness against strategic behavior are somewhat noisy predictors of empirical robustness.

As a whole, our empirical methodology for quantifying desirable mechanism properties facilitates a more nuanced comparison between mechanisms than theoretical analysis alone. Ultimately, we find that there is a trade-off between our two dimensions of analysis. The best-performing mechanisms for measurement integrity are highly susceptible to strategic behavior. On the other hand, certain parametric peer prediction mechanisms are robust against all the strategic manipulations we consider while still achieving reasonable measurement integrity.