“…We need large models that contain representations of vast amounts of knowledge across core and progressively capillary scientific disciplines, with a grounding in world knowledge and commonsense understanding, as well as the ability to continuously acquire and update the model's beliefs as the state of the art evolves. Until now, scientific language models like SciBERT (Beltagy et al., 2019), BioBERT (Lee et al., 2019) or SpaceRoBERTa (Berquand et al., 2021) have tried to address the challenge of domain specificity through additional pre-training on large amounts of scientific documents, leveraging large-scale open-access scientific resources like OpenAIRE, arXiv, Web of Science or Semantic Scholar. However, only knowledge that is statistically prominent in the training data is effectively captured, limiting the usefulness of pre-trained language models as knowledge bases or reasoning engines.…”
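To make the "additional pre-training" strategy mentioned above concrete, the following is a minimal sketch of domain-adaptive (continued) masked-language-model pre-training in the spirit of SciBERT and BioBERT, written against the Hugging Face Transformers and Datasets APIs. The starting checkpoint, corpus file name, and hyperparameters are illustrative assumptions, not details taken from the cited papers.

```python
# Sketch: continue masked-LM pre-training of a general-domain checkpoint
# on an in-domain scientific corpus (domain-adaptive pre-training).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# General-domain starting point (illustrative; SciBERT trains from scratch,
# BioBERT continues from BERT-Base).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical plain-text corpus of scientific abstracts, one document per line.
corpus = load_dataset("text", data_files={"train": "scientific_abstracts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style objective: mask 15% of tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

args = TrainingArguments(
    output_dir="sci-adapted-bert",       # illustrative output location
    per_device_train_batch_size=16,      # illustrative hyperparameters
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```

As the passage notes, a model adapted this way only absorbs regularities that are frequent enough in the pre-training corpus, which is precisely the limitation that motivates treating such models as more than implicit knowledge bases.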