2019
DOI: 10.48550/arxiv.1908.09804
Preprint

Neural Code Search Evaluation Dataset

Abstract: There has been increasing interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.
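To make the benchmark protocol concrete, the sketch below scores a retrieval model against query–snippet pairs of this kind. It is a minimal sketch, not the paper's tooling: the file name, the JSON field names, and the `search` interface are all hypothetical stand-ins.

```python
import json

def top_k_accuracy(search, pairs_path="evaluation_pairs.json", k=10):
    """Fraction of queries whose reference snippet appears in the top-k results.

    `search(query)` is a hypothetical model interface returning a ranked list
    of code snippets; the JSON layout assumed here ({"query", "snippet"} per
    record) is an illustration, not the dataset's actual schema.
    """
    with open(pairs_path) as f:
        pairs = json.load(f)

    hits = sum(
        1 for pair in pairs
        if pair["snippet"] in search(pair["query"])[:k]  # exact-match judgment
    )
    return hits / len(pairs)
```

In practice an exact-match relevance judgment is too strict for code; the citing work excerpted below substitutes an automated similarity judgment for this step.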

Cited by 12 publications (17 citation statements) | References 4 publications (11 reference statements)
“…In particular, Aroma has proved effective in identifying similarities between partial code snippets, e.g., those obtained from STACKOVERFLOW. Similar to other contributions [32], [33], we use Aroma to define a metric for the similarity between the answers in our evaluation set. This metric is intended to mimic the manual assessment of the correctness of search results in an automatic and reproducible way [33], without relying on human judgment, which, given the size of our dataset, would be infeasible.…”
Section: Mean Reciprocal Rank (MRR)
Citation type: mentioning (confidence: 99%)
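For reference, the mean reciprocal rank this section of the citing paper reports can be computed as below. The `threshold` value, and the idea of judging a result correct when its similarity score (e.g., the Aroma-based metric the excerpt mentions) clears that threshold, are illustrative assumptions.

```python
def mean_reciprocal_rank(ranked_scores, threshold=0.5):
    """MRR over queries: average of 1/rank of the first correct result.

    `ranked_scores[q]` is the list of similarity scores (e.g., Aroma
    similarity) for query q's results, best-ranked first. A result is
    judged correct when its score reaches `threshold` (illustrative value).
    """
    total = 0.0
    for scores in ranked_scores:
        for rank, score in enumerate(scores, start=1):
            if score >= threshold:
                total += 1.0 / rank
                break  # only the first correct result counts
    return total / len(ranked_scores)
```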
“…Recently, an interesting direction in software engineering is the use of machine/deep learning for different tasks that improve software development, such as code search (e.g., [2,24,31,39]), clone detection (e.g., [7,18,19,64,67]), program repair (e.g., [10,45,60,66]), and document (such as API and questions/answers/tags) recommendation (e.g., [22,25,26,55,63,65,69,70,76]).…”
Section: Machine/Deep Learning on Software Engineering
Citation type: mentioning (confidence: 99%)
“…All the code snippets are embedded into a high-dimensional vector space by our approach. A variety of applications, such as code search (e.g., [24,31,39]), summarization (e.g., [30,32,33,62]), retrieval (e.g., [1,9,71]), and API recommendation (e.g., [25,26]), can benefit from the code embeddings used in our study.…”
Section: The Problem and Our Solution
Citation type: mentioning (confidence: 99%)
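As a sketch of the retrieval setting this excerpt describes: once queries and snippets live in one embedding space, search reduces to nearest-neighbor ranking. The encoder producing the vectors is assumed, not specified here.

```python
import numpy as np

def rank_by_cosine(query_vec, snippet_vecs):
    """Rank snippet indices by cosine similarity to the query embedding.

    `query_vec` (d,) and `snippet_vecs` (n, d) are assumed to come from a
    learned encoder that embeds queries and code into the same vector space.
    """
    q = query_vec / np.linalg.norm(query_vec)
    s = snippet_vecs / np.linalg.norm(snippet_vecs, axis=1, keepdims=True)
    sims = s @ q              # with unit vectors, dot product = cosine similarity
    return np.argsort(-sims)  # best match first
```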
“…In summary, our contributions to the field of code-to-code recommendation in this paper are four-fold: [we analyze] [17] and the Neural Code Search evaluation dataset [28] and find that the snippet lengths are heavily skewed, following a power-law distribution, with the vast majority of the snippets being short, and a long tail of longer snippets. We argue that code-to-code recommendation engines, to return concise and useful snippets, should implement techniques to counteract the bias caused by this skewness.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
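The length skew this excerpt reports is straightforward to inspect. A minimal sketch, assuming per-snippet token counts are available, bins the lengths logarithmically, since a power-law distribution appears roughly linear on a log-log plot.

```python
import numpy as np

def loglog_length_histogram(snippet_lengths, bins=50):
    """Log-log histogram of snippet lengths; a power law looks ~linear.

    `snippet_lengths` is assumed to be token (or line) counts per snippet.
    Returns (log10 bin centers, log10 counts) for non-empty bins.
    """
    lengths = np.asarray(snippet_lengths, dtype=float)
    edges = np.logspace(0, np.log10(lengths.max()), bins)
    counts, edges = np.histogram(lengths, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
    nonzero = counts > 0
    return np.log10(centers[nonzero]), np.log10(counts[nonzero])
```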