2022
DOI: 10.48550/arxiv.2204.06894
Preprint

To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set?

Abstract: Deep Learning (DL) models have been widely used to support code completion. These models, once properly trained, can take as input an incomplete code component (e.g., an incomplete function) and predict the missing tokens to finalize it. GitHub Copilot is an example of code recommender built by training a DL model on millions of open source repositories: The source code of these repositories acts as training data, allowing the model to learn "how to program". The usage of such a code is usually regulated by Fr…

Cited by 2 publications (2 citation statements)
References 36 publications

“…It is also possible that some repositories that were used by Codex during training (and from which Codex could technically produce verbatim content) may have been deleted or made private between the time Codex was fine-tuned and our analysis. However, we consider this possibility very remote Ciniselli et al (2022). In addition, our definition of novelty mostly relied on the exercises being novel in the sense that they are not direct copies of existing exercises.…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…One of the key features that defines an IDE is code completion [1,9,18,41,43]. Code completion speeds up the programming process by automatically suggesting code that the developer is about to write, while also helping them avoid possible typos.…”
Section: Introduction (mentioning)
confidence: 99%