When Deep Learning Met Code Search

Cambronero, José; Li, Hongyu; Kim, Seohyun; Sen, Koushik; Chandra, Satish

doi:10.48550/arxiv.1905.03813

Cited by 5 publications

(12 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We use FastText [27], a widely adopted [16,28,22,12] word embedding technique in software engineering. We employ FastText over the contents file (generated in Step 5) to construct a skip-gram model with the following parameters: vector size=100 (as recommended in Fasttext tutorial 10 ), epoch = 20 (higher than the default of five epochs in Fasttext tutorial to ensure possibly more effectiveness), minimum size = 2, maximum size = 5 (empirically improved result, with one unit less than the tutorial, which mentioned that other languages could have other values), and finally, left the other parameters with default values. We adopt the skip-gram model over the CBOW model because it has been observed to be more efficient with subword information [27].…”

Section: ) Generate Contents' Filesmentioning

confidence: 99%

“…A number of studies on Information Retrieval leverages the crowd knowledge to help developers in software development [46,47,33,22,10,13,48,15,11]. Some works [47,33] employ traditional IR techniques such as TF-IDF to recommend relevant discussions from Stack Overflow for a given context.…”

Section: Related Workmentioning

confidence: 99%

“…Notwithstanding, they need to continuously reformulate the queries in order to improve the retrieved solutions [5,6]. Furthermore, other related work that aims at tackling the semantic gap also presents only the source code for the task [7,8,9,10,11,12] without their accompanying explanations, which are often very useful for reusing any such available source code. AnswerBot has the inverse limitation, it produces only textual answers where it could have included code snippets to provide a more understandable answer for the input query [13].…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Silva¹,

Rahman²,

Dantas³

et al. 2021

Preprint

View full text Add to dashboard Cite

Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. Semantic gap may increase for similar bag of words due to antonyms and negation. Moreover, documents retrieved by search engines might not contain solutions containing both code examples and their explanations. So, we propose CRAR (Crowd Answer Recommender) to circumvent those issues aiming at improving retrieval of relevant answers from Stack Overflow containing not only the expected code examples for the given task but also their explanations. Given a programming task, we investigate the effectiveness of combining information retrieval techniques along with a set of features to enhance the ranking of important threads (i.e., the units containing questions along with their answers) for the given task and then selects relevant answers contained in those threads, including semantic features, like word embeddings and sentence embeddings, for instance, a Convolutional Neural Network (CNN). CRAR also leverages social aspects of Stack Overflow discussions like popularity to select relevant answers for the tasks. Our experimental evaluation shows that the combination of the different features performs better than each one individually. We also compare the retrieval performance with the state-of-art CROK-AGE (Crowd Knowledge Answer Generator), which is also a system aimed at retrieving relevant answers from Stack Overflow. We show that CRAR outperforms CROKAGE in Mean Reciprocal Rank and Mean Recall with small and medium effect sizes, respectively.

show abstract

Section: ) Generate Contents' Filesmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Silva¹,

Rahman²,

Dantas³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…In addition to the code2vec and VarMisuse tasks that we address in this paper, we believe that adversarial examples can be applied to neural code search [15,30,44] -a developer can attract users to a specific library or an open-source project by introducing code that will be disproportionately highly ranked by a neural code search model. Defending against Adversarial Examples Pruthi et al [40] proposed an approach that is related to our defense.…”

Section: Adversarial Examples In Programsmentioning

confidence: 99%

“…Neural models of code have achieved state-of-the-art performance on various tasks such as prediction of variable names and types [1,5,11,42], code summarization [2,3,20], code generation [4,13,35], code search [15,30,44], and bug finding [39,43,46].…”

Section: Introductionmentioning

confidence: 99%

Adversarial Examples for Models of Code

Yefet¹,

Alon²,

Yahav³

2019

Preprint

View full text Add to dashboard Cite

Neural models of code have shown impressive performance for tasks such as predicting method names and identifying certain kinds of bugs. In this paper, we show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code with adversarial examples. The main idea is to force a given trained model to make an incorrect prediction as specified by the adversary by introducing small perturbations that do not change the program's semantics. To find such perturbations, we present a new technique for Discrete Adversarial Manipulation of Programs (DAMP). DAMP works by deriving the desired prediction with respect to the model's inputs while holding the model weights constant, and following the gradients to slightly modify the code.To defend a model against such attacks, we propose placing a defensive model (Anti-DAMP) in front of it. Anti-DAMP detects unlikely mutations and masks them before feeding the input to the downstream model.We show that our DAMP attack is effective across three neural architectures: code2vec, GGNN, and GNN-FiLM, in both Java and C#. We show that DAMP has up to 89% success rate in changing a prediction to the adversary's choice ("targeted attack"), and a success rate of up to 94% in changing a given prediction to any incorrect prediction ("non-targeted attack"). By using Anti-DAMP, the success rate of the attack drops drastically for both targeted and non-targeted attacks, with a minor penalty of 2% relative degradation in accuracy while not performing under attack.

show abstract

Neural Code Search Evaluation Dataset

Li¹,

Kim²,

Chandra³

2019

Preprint

Self Cite

View full text Add to dashboard Cite

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.

show abstract

When Deep Learning Met Code Search

Cited by 5 publications

References 0 publications

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Adversarial Examples for Models of Code

Neural Code Search Evaluation Dataset

Contact Info

Product

Resources

About