Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.391

Disentangled Code Representation Learning for Multiple Programming Languages

Abstract: Developing effective distributed representations of source code is fundamental yet challenging for many software engineering tasks such as code clone detection, code search, code translation and transformation. However, current code embedding approaches, which represent the semantics and syntax of code in a mixed way, are less interpretable, and the resulting embeddings cannot easily generalize across programming languages. In this paper, we propose a disentangled code representation learning approach to separa…

Cited by 10 publications (6 citation statements). References 29 publications.
“…Instead, DYPRO [75] and LIGER [76] learn program representations through dynamic execution, from a mixture of symbolic and concrete execution traces. Zhang et al. [77] addressed code representation in a multi-language setting, creating an embedding that separates the semantics of source code from its context. FLOW2VEC [78] is an embedding approach that preserves interprocedural program dependence by approximating high-order proximity.…”
Section: Deep Learning For Code Intelligence
confidence: 99%
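The "high-order proximity" that FLOW2VEC [78] is said to approximate can be illustrated with a toy computation: powers of a graph's adjacency matrix count paths of each length, and a decayed sum over orders yields a proximity score between nodes that are only indirectly connected. The graph, the decay weight, and the truncation order below are made up for illustration; this is not Flow2Vec's actual construction.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Adjacency matrix of a 4-node chain 0 -> 1 -> 2 -> 3,
# standing in for a small value-flow graph.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

# The k-th power of A counts paths of length k; a decayed sum over
# orders gives a (truncated) high-order proximity matrix.
beta = 0.5  # decay per order (arbitrary choice)
prox = [[0.0] * 4 for _ in range(4)]
P = [row[:] for row in A]
for k in range(1, 4):
    for i in range(4):
        for j in range(4):
            prox[i][j] += (beta ** k) * P[i][j]
    P = mat_mul(P, A)

# Nodes 0 and 3 share no edge, yet the length-3 path gives them
# a nonzero proximity of beta**3.
print(prox[0][3])  # 0.125
```

Embedding methods in this family then factorize such a proximity matrix into low-dimensional node vectors, so that indirect (interprocedural) dependence survives in the vector space.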
“…For this example, incorporating the caller/callee information may help boost effectiveness. Beyond that, more advanced static-analysis-based code embedding approaches (e.g., using data flows and control flows) [55], [65] would be beneficial for capturing code-change semantics and thus deserve exploration in future studies.…”
Section: Future Directions For Comment Update
confidence: 99%
“…Wan et al. 45 discussed the effectiveness of CodeBERT and GraphCodeBERT, respectively, for preprocessing in code characterization. Zhang et al. 46 separated the semantics from the syntax of source code, obtaining better interpretability and generality. Sui et al. 47 proposed the Flow2Vec model, which precisely preserves program dependencies (also known as value flows) by embedding the control flow and alias-aware data flow of programs into a low-dimensional vector space through approximating high-order proximity.…”
Section: Code Embedding
confidence: 99%
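The core idea the citing papers attribute to this work, separating a code snippet's semantics from its syntax so the two can be recombined across languages, can be sketched in a few lines. Everything below is a purely illustrative stand-in: the dimensions, the random linear "encoder" heads, and the feature vectors are invented, not the paper's model.

```python
import random

random.seed(0)

D_IN, D_SEM, D_SYN = 6, 4, 3  # hypothetical feature/embedding sizes


def make_weights(rows, cols):
    """Random weight matrix as a list of rows (a stand-in for learned parameters)."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a weight matrix to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Two heads over one shared input: toy counterparts of the semantic
# and syntactic encoders a disentangled representation would use.
w_sem = make_weights(D_SEM, D_IN)
w_syn = make_weights(D_SYN, D_IN)


def encode(code_vec):
    return linear(w_sem, code_vec), linear(w_syn, code_vec)


java_vec = [random.gauss(0, 1) for _ in range(D_IN)]    # pretend Java snippet features
python_vec = [random.gauss(0, 1) for _ in range(D_IN)]  # pretend Python snippet features

sem_java, _ = encode(java_vec)
_, syn_py = encode(python_vec)

# Recombine Java semantics with Python syntax: the kind of
# cross-language composition a disentangled space is meant to allow.
recombined = sem_java + syn_py
print(len(recombined))  # 7
```

In a real model the two heads would be trained with objectives that force the semantic part to be language-invariant and push language-specific structure into the syntactic part; the sketch only shows the shape of the decomposition.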