Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019)
DOI: 10.1145/3338906.3340458

When deep learning met code search

Abstract: There have been multiple recent proposals on using deep neural networks for code search using natural language. Common across these proposals is the idea of embedding code and natural language queries into real vectors and then using vector distance to approximate semantic correlation between code and the query. Multiple approaches exist for learning these embeddings [15,19,24,26], including unsupervised techniques, which rely only on a corpus of code examples, and supervised techniques, which use an aligned …
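Common to all of these proposals, retrieval reduces to nearest-neighbor search in the shared embedding space. The sketch below is a minimal illustration of that step, assuming hypothetical query and code vectors produced by some learned embedding model; it ranks snippets by cosine similarity, one common choice of vector distance.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def search(query_vec, code_vecs, snippets, k=5):
        """Return the top-k snippets ranked by embedding similarity to the query."""
        scored = [(cosine_similarity(query_vec, v), s) for v, s in zip(code_vecs, snippets)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]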

Cited by 183 publications (162 citation statements)
References 27 publications
“…Several of the latest code search techniques that find code given a natural language query rely on machine learning techniques (e.g., NCS [10], DeepCS [8], UNIF [38], MMAN [39], TBCAA [40], and CoaCor [41]). NCS proposes an enhanced word embedding for a natural language query [10].…”
Section: Code Search Systems
confidence: 99%
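As a rough sketch of the unsupervised, NCS-style direction described above, a query (or code fragment) embedding can be formed by combining per-word vectors learned from a code corpus alone. The TF-IDF-style weighting and the word_vectors/idf lookup tables below are illustrative assumptions, not the exact formulation of NCS [10].

    import numpy as np

    def embed_bag_of_words(text: str, word_vectors: dict, idf: dict, dim: int = 100) -> np.ndarray:
        """Weighted average of per-word embeddings (illustrative sketch only)."""
        vec = np.zeros(dim)
        total = 0.0
        for token in text.lower().split():
            if token in word_vectors:
                w = idf.get(token, 1.0)   # rarer words get more weight
                vec += w * word_vectors[token]
                total += w
        return vec / total if total > 0 else vec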
“…This unified representation bridges the lexical gap between queries and source code, resulting in relevant code fragments that do not necessarily contain query words. UNIF [38] is an extension of NCS that adds supervision to modify the embeddings during training, with the overall effect of improving performance for code search. MMAN [39] is a Multi-Modal Attention Network for semantic source code retrieval.…”
Section: Code Search Systems
confidence: 99%
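To make the supervised extension concrete, here is a minimal UNIF-style sketch under assumed details (layer sizes, loss, and names are illustrative, not taken from the paper): code tokens are embedded and pooled with a learned attention vector, and the embeddings are adjusted on aligned query/code pairs via a margin loss, which is what lets query and code vectors drift toward each other during training.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UnifStyleEncoder(nn.Module):
        """Attention-weighted pooling over token embeddings (illustrative sketch)."""
        def __init__(self, vocab_size: int, dim: int = 128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
            self.attn = nn.Parameter(torch.randn(dim))  # learned attention vector

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            e = self.embed(token_ids)                      # (batch, seq, dim)
            scores = torch.softmax(e @ self.attn, dim=1)   # (batch, seq); padding handling omitted
            return (scores.unsqueeze(-1) * e).sum(dim=1)   # (batch, dim)

    def margin_loss(q_vec, pos_code, neg_code, margin: float = 0.05):
        """Push aligned query/code pairs closer than mismatched ones by a margin."""
        pos = F.cosine_similarity(q_vec, pos_code)
        neg = F.cosine_similarity(q_vec, neg_code)
        return torch.clamp(margin - pos + neg, min=0).mean()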
“…We chose to sample 50 classes, as this covers approximately 28% of the components available in Scikit-Learn and balanced coverage against the need for detailed manual annotation. For each query, we retrieved the top 10 API components based on: 1) our BM25 metric, 2) cosine similarity using averaged pre-trained neural embeddings (which have been shown to be effective for the related task of code search [6]), and 3) a uniform random metric. We used (2) to compare the use of BM25 with another unsupervised approach to semantic similarity.…”
Section: RQ2: Functionally Related API Components
confidence: 99%
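The two non-random retrieval metrics in this excerpt can be approximated as follows. This sketch assumes the third-party rank_bm25 package for BM25 and precomputed averaged embeddings; it is not the cited work's exact setup.

    import numpy as np
    from rank_bm25 import BM25Okapi  # third-party BM25 implementation (assumed here)

    def top_k_bm25(query: str, docs: list, k: int = 10) -> list:
        """Rank documents by BM25 score against the query."""
        tokenized = [d.lower().split() for d in docs]
        bm25 = BM25Okapi(tokenized)
        return bm25.get_top_n(query.lower().split(), docs, n=k)

    def top_k_embedding(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list, k: int = 10) -> list:
        """Rank documents by cosine similarity of averaged embeddings."""
        norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
        sims = doc_vecs @ query_vec / norms
        order = np.argsort(-sims)[:k]
        return [docs[i] for i in order]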
“…To compare AL and AMS, we consider the weak specification of Scikit-Learn components {LogisticRegression, LinearSVC, StandardScaler} (names abbreviated for brevity) and run experiments on our 9 datasets. We use 5-fold CV, pair pipelines between CV folds in order to appropriately perform comparisons after removing pipelines that don't satisfy the weak specification, and then compute wins on the paired pipelines.…”
Section: RQ4: Performance of Strong Specifications
confidence: 99%
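The excerpt does not spell out how wins are counted, so the following is only one plausible reading: after pipelines are paired across CV folds and filtered by the weak specification, each pair contributes a win to whichever system scored higher.

    def count_wins(scores_a: list, scores_b: list):
        """Compare two systems' scores on pipelines paired across CV folds.

        Returns (wins for A, wins for B, ties). Assumes scores_a[i] and
        scores_b[i] refer to the same paired pipeline.
        """
        wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
        wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
        ties = len(scores_a) - wins_a - wins_b
        return wins_a, wins_b, ties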