As in most information retrieval (IR) studies, evaluation plays an essential part in Web search research. Both offline and online evaluation metrics are adopted to measure the performance of search engines. Offline metrics are usually based on relevance judgments of query-document pairs from assessors, while online metrics exploit user behavior data, such as clicks, collected by search engines to compare search algorithms. Although both types of IR evaluation metrics have achieved success, to what extent they can predict user satisfaction remains under-investigated. To shed light on this research question, we meta-evaluate a series of existing online and offline metrics to study how well they infer actual search user satisfaction in different search scenarios. We find that both types of evaluation metrics significantly correlate with user satisfaction, and that they reflect satisfaction from different perspectives for different search tasks. Offline metrics better align with user satisfaction in homogeneous search (i.e., ten blue links), whereas online metrics outperform them when vertical results are federated. Finally, we also propose to incorporate mouse hover information into existing online evaluation metrics, and empirically show that the resulting metrics align better with search user satisfaction than purely click-based online metrics.
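To make the meta-evaluation setup concrete, the following is a minimal sketch of how one might correlate metric scores with user-reported satisfaction, including a hover-augmented variant of a click-based metric. The session format, the reciprocal-rank-style metric, and the hover-weighting scheme are all illustrative assumptions, not the paper's actual metrics or data.

```python
# Minimal sketch: meta-evaluating online metrics against satisfaction labels.
# All data and metric definitions below are hypothetical, for illustration only.
from scipy.stats import spearmanr

# Hypothetical session logs: clicked ranks, hovered-but-unclicked ranks,
# and a 1-5 satisfaction rating reported by the user.
sessions = [
    {"clicks": [1, 3], "hovers": [2],    "satisfaction": 4},
    {"clicks": [1],    "hovers": [],     "satisfaction": 5},
    {"clicks": [5, 7], "hovers": [2, 4], "satisfaction": 2},
]

def click_metric(session):
    """Click-based online metric: reciprocal rank of the earliest click."""
    return 1.0 / min(session["clicks"]) if session["clicks"] else 0.0

def hover_metric(session, hover_weight=0.5):
    """Hover-augmented variant: treat a hover as a down-weighted click.

    hover_weight is an assumed parameter, not a value from the paper.
    """
    events = ([(rank, 1.0) for rank in session["clicks"]]
              + [(rank, hover_weight) for rank in session["hovers"]])
    if not events:
        return 0.0
    rank, weight = min(events)  # earliest interaction on the page dominates
    return weight / rank

# Meta-evaluation: rank-correlate each metric's scores with satisfaction.
satisfaction = [s["satisfaction"] for s in sessions]
for metric in (click_metric, hover_metric):
    scores = [metric(s) for s in sessions]
    rho, _ = spearmanr(scores, satisfaction)
    print(f"{metric.__name__}: Spearman rho = {rho:.3f}")
```

In practice one would compute such correlations over many sessions, grouped by search scenario (homogeneous versus federated), which is the comparison the abstract summarizes.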