Marcin Kardas scite author profile

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community 1 .Computing has indeed revolutionized how research is conducted, but information overload remains an overwhelming problem (Bornmann and Mutz, 2014). In May 2022, an average of 516 papers per day were submitted to arXiv (arXiv, 2022). Beyond papers, scientific data is also growing much more quickly than our ability to process it (Marx, 2013). As of August 2022, the NCBI GenBank contained 1.49 × 10 12 nucleotide bases (GenBank, 2022). Given the volume of information, it is impossible for a single person to read all the papers in a given field; and it is likewise challenging to organize data on the underlying scientific phenomena.Search engines are the current interface for accessing scientific knowledge following the Licklider paradigm. But they do not organize knowledge directly, and instead point to secondary layers such as Wikipedia,

Facilitating an Entrepreneurial Discovery Process for Smart Specialisation. The Case of Poland

Mieszkowski¹,

2015

J Knowl Econ

The purpose of this paper is to study stakeholder involvement in research and innovation policy in Poland in the context of smart specialisation exercise. The article addresses the questions to what extent initiatives such as the foresight programmes, strategic research and development programmes, and sectoral research programmes facilitate the entrepreneurial discovery process for smart specialisation. The role of different groups of actors varies substantially in terms of their involvement and impact on such initiatives. The foresight and strategic research and development programmes were dominated by representatives of the research community and embody a research-oriented top-down approach. The sectoral research programmes are most closely related to the bottom-up and demand-driven approach in which the leading role is played by entrepreneurs. For this reason, they are more familiar with the conditions of the entrepreneurial discovery process. In this case, the important role is also played by the governmental agency which facilitates those processes.

Energy-Efficient Leader Election Protocols for Single-Hop Radio Networks

Klonowski²,

Pająk³

2013

In this paper we investigate leader election protocols for single-hop radio networks from perspective of energetic complexity. We discuss different models of energy consumption and its relation with time complexity. We also present some results about energy consumption in classic protocols optimal with respect to time complexity-we show that some very basic, intuitive algorithms for simplest models (with known number of stations) do not have to be optimal when energy of stations is restricted. We show that they can be significantly improved by introducing very simple modifications. Our main technical result is however a protocol for solving leader election problem in case of unknown number of stations n, working on expectancy within O(log ǫ n) rounds, with each station transmitting O(1) number of times and no station being awake for more than O(log log log n) rounds.

Approximating the Size of a Radio Network in Beeping Model

Brandes

Klonowski

et al. 2016

On Delta-Method of Moments and Probabilistic Sums

Cichoń

Gołębiewski

et al. 2013

We discuss a general framework for determining asymptotics of the expected value of random variables of the form f (X) in terms of a function f and central moments of the random variable X. This method may be used for approximation of entropy, inverse moments, and some statistics of discrete random variables useful in analysis of some randomized algorithms. Our approach is based on some variant of the Delta Method of Moments. We formulate a general result for an arbitrary distribution and next we show its specic extension to random variables which are sums of identically distributed independent random variables. Our method simplies previous proofs of results of several authors and can be automated to a large extent. We apply our method to the binomial, negative binomial, Poisson and hypergeometric distribution. We extend the class of functions for which our method is applicable to some subclass of exponential functions and double exponential functions for some cases.