In recent years, a great deal of research effort and funding has been devoted to Semantic Web services (SWS). This has resulted in numerous competing approaches that use semantic annotations to automate the mediation, choreography, and discovery of Web services. However, despite a wealth of theoretical work, too little effort has so far been spent on the comparative experimental evaluation of these approaches, which hinders both scientific progress and industrial adoption. An established evaluation methodology and standard benchmarks that allow different frameworks to be compared are thus needed for the further advancement of the field. To this end, a criteria model for SWS evaluation is presented, and the existing approaches to SWS evaluation are comprehensively analyzed. Their shortcomings are discussed in order to identify the fundamental issues of SWS evaluation. Based on this discussion, a research agenda toward agreed-upon evaluation methodologies is proposed.