Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019)
DOI: 10.1145/3338906.3340458

When deep learning met code search

Abstract: There have been multiple recent proposals on using deep neural networks for code search using natural language. Common across these proposals is the idea of embedding code and natural language queries into real vectors and then using vector distance to approximate semantic correlation between code and the query. Multiple approaches exist for learning these embeddings [15,19,24,26], including unsupervised techniques, which rely only on a corpus of code examples, and supervised techniques, which use an aligned …
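Common to all of these proposals, retrieval reduces to nearest-neighbor search in the shared embedding space. The sketch below is a minimal illustration of that step, assuming hypothetical query and code vectors produced by some learned embedding model; it ranks snippets by cosine similarity, one common choice of vector distance.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def search(query_vec, code_vecs, snippets, k=5):
        """Return the top-k snippets ranked by embedding similarity to the query."""
        scored = [(cosine_similarity(query_vec, v), s) for v, s in zip(code_vecs, snippets)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]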

Cited by 183 publications (162 citation statements)
References 27 publications
“…Several of the latest code search techniques that find code given a natural language query rely on machine learning techniques (e.g., NCS [10], DeepCS [8], UNIF [38], MMAN [39], TBCAA [40], and CoaCor [41]). NCS proposes an enhanced word embedding for a natural language query [10].…”
Section: Code Search Systems
confidence: 99%
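As a rough sketch of the unsupervised, NCS-style direction described above, a query (or code fragment) embedding can be formed by combining per-word vectors learned from a code corpus alone. The TF-IDF-style weighting and the word_vectors/idf lookup tables below are illustrative assumptions, not the exact formulation of NCS [10].

    import numpy as np

    def embed_bag_of_words(text: str, word_vectors: dict, idf: dict, dim: int = 100) -> np.ndarray:
        """Weighted average of per-word embeddings (illustrative sketch only)."""
        vec = np.zeros(dim)
        total = 0.0
        for token in text.lower().split():
            if token in word_vectors:
                w = idf.get(token, 1.0)   # rarer words get more weight
                vec += w * word_vectors[token]
                total += w
        return vec / total if total > 0 else vec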
“…This unified representation bridges the lexical gap between queries and source code, resulting in relevant code fragments that do not necessarily contain query words. UNIF [38] is an extension of NCS that adds supervision to modify the embeddings during training, with the overall effect of improving performance for code search. MMAN [39] is a Multi-Modal Attention Network for semantic source code retrieval.…”
Section: Code Search Systems
confidence: 99%
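To make the supervised extension concrete, here is a minimal UNIF-style sketch under assumed details (layer sizes, loss, and names are illustrative, not taken from the paper): code tokens are embedded and pooled with a learned attention vector, and the embeddings are adjusted on aligned query/code pairs via a margin loss, which is what lets query and code vectors drift toward each other during training.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UnifStyleEncoder(nn.Module):
        """Attention-weighted pooling over token embeddings (illustrative sketch)."""
        def __init__(self, vocab_size: int, dim: int = 128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
            self.attn = nn.Parameter(torch.randn(dim))  # learned attention vector

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            e = self.embed(token_ids)                      # (batch, seq, dim)
            scores = torch.softmax(e @ self.attn, dim=1)   # (batch, seq); padding handling omitted
            return (scores.unsqueeze(-1) * e).sum(dim=1)   # (batch, dim)

    def margin_loss(q_vec, pos_code, neg_code, margin: float = 0.05):
        """Push aligned query/code pairs closer than mismatched ones by a margin."""
        pos = F.cosine_similarity(q_vec, pos_code)
        neg = F.cosine_similarity(q_vec, neg_code)
        return torch.clamp(margin - pos + neg, min=0).mean()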
“…We chose to sample 50 classes, as this covers approximately 28% of the components available in Scikit-Learn and balanced coverage against the need for detailed manual annotation. For each query, we retrieved the top 10 API components based on: 1) our BM25 metric, 2) cosine similarity using averaged pre-trained neural embeddings (which have been shown to be effective for the related task of code search [6]), and 3) a uniform random metric. We used (2) to compare the use of BM25 with another unsupervised approach to semantic similarity.…”
Section: RQ2: Functionally Related API Components
confidence: 99%
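The two non-random retrieval metrics in this excerpt can be approximated as follows. This sketch assumes the third-party rank_bm25 package for BM25 and precomputed averaged embeddings; it is not the cited work's exact setup.

    import numpy as np
    from rank_bm25 import BM25Okapi  # third-party BM25 implementation (assumed here)

    def top_k_bm25(query: str, docs: list, k: int = 10) -> list:
        """Rank documents by BM25 score against the query."""
        tokenized = [d.lower().split() for d in docs]
        bm25 = BM25Okapi(tokenized)
        return bm25.get_top_n(query.lower().split(), docs, n=k)

    def top_k_embedding(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list, k: int = 10) -> list:
        """Rank documents by cosine similarity of averaged embeddings."""
        norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
        sims = doc_vecs @ query_vec / norms
        order = np.argsort(-sims)[:k]
        return [docs[i] for i in order]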
“…To compare AL and AMS, we consider the weak specification of Scikit-Learn components {LogisticRegression, LinearSVC, StandardScaler} (names abbreviated for brevity) and run experiments on our 9 datasets. We use 5-fold CV, pair pipelines between CV folds in order to appropriately perform comparisons after removing pipelines that don't satisfy the weak specification, and then compute wins on the paired pipelines.…”
Section: RQ4: Performance of Strong Specifications
confidence: 99%
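The excerpt does not spell out how wins are counted, so the following is only one plausible reading: after pipelines are paired across CV folds and filtered by the weak specification, each pair contributes a win to whichever system scored higher.

    def count_wins(scores_a: list, scores_b: list):
        """Compare two systems' scores on pipelines paired across CV folds.

        Returns (wins for A, wins for B, ties). Assumes scores_a[i] and
        scores_b[i] refer to the same paired pipeline.
        """
        wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
        wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
        ties = len(scores_a) - wins_a - wins_b
        return wins_a, wins_b, ties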