“…Metadata such as page viewership statistics is helpful for ranking webpages [Nie et al, 2019]. However, when search engines are not available, such as in the SCIVER shared task, the majority of effort goes into exploring similarity metrics that are used as a proxy to determine the documents' relevance to a claim. TF-IDF similarity is a common baseline [Wadden et al, 2020, Malon, 2018], and BM25 [Robertson et al, 1994] has been shown to be effective [Pradeep et al, 2020].…”

Dataset | Size | Domain | Scale; evidence
PolitiFact [Vlachos and Riedel, 2014] | 106 claims | Politics | Very small; metadata and evidence of various forms
Emergent [Ferreira and Vlachos, 2016] | 300 claims | News | Very small; 2,595 associated documents
LIAR | 12,836 claims | Politics | Medium; metadata
Snopes [Popat et al, 2017] | 4,956 claims | Snopes website | Medium; 30 Google retrieved documents for each claim
FEVER [Thorne et al, 2018a] | 185,445 claims | Wikipedia | Big; associated Wikipedia evidence
LIAR-PLUS [Alhindi et al, 2018] | 12,836 claims | Politics | Medium; automatically extracted justifications
Perspectrum [Chen et al, 2019b] | 907 claims | Debates | Small; evidence and perspectives
UKP Snopes [Hanselowski et al, 2019] | 6,422 claims | Snopes website | Medium; associated evidence
MultiFC [Augenstein et al, 2019] | 34,918 claims | Fact-checking websites | Medium; metadata and 10 Google retrieved webpages for each claim
Scifact [Wadden et al, 2020] | 1,409 claims | Scientific papers | Small; associated documents
PolitiHop [Ostrowski et al, 2020] | 500 claims | Politics | Very small; evidence chains for multi-hop reasoning
WikiFactCheck-English [Sathe et al, 2020] | 124,821 claims | Wikipedia | Big; context and evidence
Climate-FEVER [Diggelmann et al, 2021] | 1,535 claims | Climate | Medium; 7,675 claim-evidence pairs with climate-related claims verified against Wikipedia evidence
COVID-Fact [Saakyan et al, 2021] | 4,086 claims | COVID-19 | Medium; 1,296 supported claims from r/COVID19 subreddit and 2,790 automatically generated refuted claims
Vitamin-C [Schuster et al, 2021] | 488,904 pairs | Wikipedia | Big; contrastive evidence from Wikipedia edits
FEVEROUS [Aly et al, 2021] | 87,026 claims | Wikipedia | Biggest; evidence collected from both structured and unstructured information on whole Wikipedia
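To make the two similarity metrics named in the passage concrete, the following is a minimal sketch of TF-IDF cosine similarity and Okapi BM25 scoring between a claim and a small set of candidate documents. It is not the implementation used by any of the cited systems. The function names (`tokenize`, `tfidf_scores`, `bm25_scores`), the whitespace tokenizer, the smoothed IDF variants, and the default parameters `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def document_frequencies(doc_tokens):
    """Count, for each term, how many documents contain it."""
    return Counter(term for tokens in doc_tokens for term in set(tokens))


def tfidf_scores(claim, docs):
    """Cosine similarity between TF-IDF vectors of the claim and each document."""
    if not docs:
        return []
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    df = document_frequencies(doc_tokens)

    def idf(term):
        # Smoothed IDF (assumed variant) so unseen terms do not divide by zero.
        return math.log((1 + n) / (1 + df.get(term, 0))) + 1.0

    def vectorize(tokens):
        return {t: count * idf(t) for t, count in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    claim_vec = vectorize(tokenize(claim))
    return [cosine(claim_vec, vectorize(tokens)) for tokens in doc_tokens]


def bm25_scores(claim, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the claim used as a query."""
    if not docs:
        return []
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avg_len = sum(len(tokens) for tokens in doc_tokens) / n
    df = document_frequencies(doc_tokens)
    query_terms = set(tokenize(claim))

    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed); rarer terms weigh more.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    example_docs = [
        "aspirin reduces the risk of heart attack",
        "the stock market fell sharply on monday",
    ]
    example_claim = "aspirin lowers heart attack risk"
    print(tfidf_scores(example_claim, example_docs))
    print(bm25_scores(example_claim, example_docs))
```

In both functions a higher score is taken as a proxy for greater relevance of a document to the claim, so candidates would be sorted by score in descending order and the top few passed on to later stages. A document sharing no terms with the claim scores zero under either metric, which is one reason purely lexical baselines can miss paraphrased evidence.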