“…IR performance evaluation involves test collections, sampling, topics (queries, tasks) formation, and relevance evaluation, and as a general topic, this area has been widely studied (Corcoglioniti, Dragoni, Rospocher, & Aprosio, 2016; Cormack & Lynam, 2006; Hu, Huang, & Hu, 2012; Järvelin & Kekäläinen, 2002; Koopman, Bruza, Sitbon, & Lawley, 2011; Liu, An, & Huang, 2015; Tamine, Chouquet, & Palmer, 2015; Waitelonis, Exeler, & Sack, 2015; Yilmaz, Kanoulas, & Aslam, 2008). In this article, we study relevance evaluation, and particularly, novelty and diversity evaluation in biomedical IR.…”