“…Most of the datasets based on scientific literature are domain-specific, with a strong emphasis on the biomedical domain (Blake, 2010; Alamri and Stevenson, 2016; Achakulvisut et al., 2019; Mayer et al., 2020). Other domains include education (Kirschner et al., 2015), computer graphics (Lauscher et al., 2018), and computational linguistics (Accuosto and Saggion, 2019). SciARK, as a multidisciplinary dataset, includes abstracts from biomedical, social, environmental, and other domains.…”