Context, such as the user's search history, demographics, devices, and surroundings, has become prevalent in various domains of information seeking and retrieval, such as mobile search, task-based search, and social search. While evaluation is central to information retrieval and has a long history, it faces the major challenge of designing an appropriate methodology that embeds context into evaluation settings. In this article, we present a unified summary of a wide range of major and recent advances in contextual information retrieval evaluation that leverages diverse context dimensions and uses different principles, methodologies, and levels of measurement. More specifically, this survey article aims to fill two main gaps in the literature: First, it provides a critical summary and comparison of existing contextual information retrieval evaluation methodologies and metrics according to a simple stratification model; second, it points out the impact of context dynamicity and data privacy on the evaluation design. Finally, we recommend promising research directions for future investigations.