We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
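The repeated-sampling result is naturally reported through a pass@k-style metric: generate n ≥ k samples per problem, count the number c that pass the unit tests, and estimate the probability that at least one of k samples is correct. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k)/C(n, k) (the function name pass_at_k is ours):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n total samples per problem of which c passed the unit tests.
    Computes 1 - C(n-c, k) / C(n, k) in a numerically stable product form."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

As a sanity check, `pass_at_k(100, 30, 1)` is exactly 0.3: for k = 1 the estimator reduces to the raw fraction of correct samples.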
We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed, randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard-exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and it occasionally completes the first level.
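A minimal sketch of the RND bonus in PyTorch, assuming flat observation vectors (the paper uses convolutional networks on Atari frames and normalizes observations and intrinsic rewards; the layer sizes here are illustrative):

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Exploration bonus = prediction error against a fixed random network."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        def make_net() -> nn.Sequential:
            return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
        self.target = make_net()      # fixed, randomly initialized
        self.predictor = make_net()   # trained to match the target
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-observation squared error: novel states are poorly predicted,
        # so they earn a large bonus until the predictor catches up.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

rnd = RNDBonus(obs_dim=8)
obs = torch.randn(32, 8)        # a batch of observations
bonus = rnd(obs)
intrinsic_reward = bonus.detach()  # added to the extrinsic reward
loss = bonus.mean()                # minimizing this trains the predictor
loss.backward()
```

Because the predictor improves on frequently visited states, the bonus decays exactly where the agent has already been, pushing exploration toward unfamiliar parts of the game.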
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
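For concreteness, the datasets in question are small binary-operation tables. A minimal sketch, assuming modular addition as the operation (the paper studies a range of operations; the shuffle-and-split setup below is the usual recipe for observing grokking):

```python
import itertools
import random

def modular_addition_split(p: int = 97, train_frac: float = 0.5, seed: int = 0):
    """All p*p triples (a, b, (a + b) % p), shuffled and split train/val.
    The network must generalize to held-out cells of the operation table."""
    pairs = list(itertools.product(range(p), repeat=2))
    random.Random(seed).shuffle(pairs)
    cut = int(train_frac * len(pairs))
    make = lambda ps: [(a, b, (a + b) % p) for a, b in ps]
    return make(pairs[:cut]), make(pairs[cut:])

train, val = modular_addition_split()
print(len(train), len(val))  # 4704 4705 for p = 97
```

Shrinking train_frac reproduces the dataset-size observation above: smaller training sets need far more optimization before validation accuracy jumps from chance to near-perfect.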
We study a class of rational functions characterized by several remarkable properties. The properties that identify this class include simple algebra (their inverses can be expressed in radicals), simple topology (the total space of the minimal Galois covering dominating them has genus 0 or 1), and simple local topology (branching data). Explicit formulae for these functions are obtained, as well as their classification up to several natural equivalence relations.
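As an illustration we add (not taken from the paper), the simplest functions exhibiting these properties are the power and Chebyshev maps, whose inverses are manifestly radical:

```latex
% Power map: cyclic monodromy; the covering z -> z^n is already Galois
% with total space of genus 0, and the inverse is a single radical.
\[
  f(z) = z^n, \qquad f^{-1}(w) = w^{1/n}.
\]
% Chebyshev polynomial: dihedral monodromy; T_n is dominated by the
% genus-0 covering z -> (z + z^{-1})/2, because
% T_n((z + z^{-1})/2) = (z^n + z^{-n})/2.
\[
  T_2(z) = 2z^2 - 1, \qquad T_2^{-1}(w) = \sqrt{\tfrac{w+1}{2}}.
\]
```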
A classic result of Ritt describes polynomials invertible in radicals: they are compositions of power polynomials, Chebyshev polynomials, and polynomials of degree at most 4. In this paper we prove that a polynomial invertible in radicals and solutions of equations of degree at most k is a composition of power polynomials, Chebyshev polynomials, polynomials of degree at most k and, if k ≤ 14, certain polynomials with exceptional monodromy groups. A description of these exceptional polynomials is given. The proofs rely on the classification of monodromy groups of primitive polynomials obtained by Müller, which builds on group-theoretical results of Feit, and on previous work by many authors on primitive polynomials with exceptional monodromy groups.
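A worked example we add for illustration (the classical radical case): inverting a composition from Ritt's list reduces to inverting each factor in turn, one radical per factor.

```latex
% Take f = T_2 composed with the power polynomial z^3:
\[
  f(z) = T_2(z^3) = 2z^6 - 1.
\]
% Solving f(z) = w peels off one compositional factor at a time:
\[
  f(z) = w
  \;\Longleftrightarrow\;
  z^3 = T_2^{-1}(w) = \sqrt{\tfrac{w+1}{2}}
  \;\Longleftrightarrow\;
  z = \left(\tfrac{w+1}{2}\right)^{1/6}.
\]
% In the paper's generalization, factors of degree at most k are inverted
% using solutions of equations of degree at most k instead of radicals alone.
```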