2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)
DOI: 10.1109/saner.2018.8330220
A deep neural network language model with contexts for source code

Cited by 35 publications (42 citation statements) · References 43 publications
“…Second, for n-gram baselines, because the next sequence is suggested by predicting the next token one at a time, the accuracy of next-sequence suggestion is affected by the compounding effect of the accuracy of a single next-token suggestion. The highest top-1 accuracy of an n-gram LM for next-code-token suggestion is about 0.5 [17]. Therefore, for predicting a next code sequence containing 6 tokens (on average), the maximum top-1 accuracy is 0.5^6 ≈ 1.6%.…”
Section: Results (citation type: mentioning)
confidence: 97%
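
As a quick check on the arithmetic in the quoted statement, here is a minimal Python sketch. The 0.5 per-token accuracy and the 6-token average length come from the quote itself; the independence assumption (per-token accuracies multiply) is the citing authors' own.

```python
# Upper bound on whole-sequence top-1 accuracy when each token is
# predicted one at a time with the same top-1 accuracy (independence
# assumption taken from the quoted statement).

def sequence_accuracy(per_token_accuracy: float, sequence_length: int) -> float:
    return per_token_accuracy ** sequence_length

if __name__ == "__main__":
    # ~0.5 per-token top-1 accuracy [17], 6 tokens on average.
    print(f"{sequence_accuracy(0.5, 6):.4%}")  # 1.5625%, i.e. ~1.6%
```
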
“…4). These tokens are used to initiate a set of code sequences (lines 9-12) or are concatenated with the current concretized code sequences to create new ones (lines 14-17). The process recursively continues until the end of the template.…”
Section: Concretizing Statement Templates and Ranking Code Candidates (citation type: mentioning)
confidence: 99%
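
The loop the quote describes can be sketched as follows. Here `suggest` is a hypothetical stand-in for the model's ranked token suggestions for one template slot, and the ranking/pruning details of the citing paper's actual approach are omitted.

```python
from typing import Callable, List

Sequence = List[str]

def concretize(template: List[str],
               suggest: Callable[[Sequence, str], List[str]]) -> List[Sequence]:
    # Start from a single empty sequence; each template slot either
    # initiates new sequences (empty prefix) or extends the current
    # partial ones, mirroring the loop structure described in the quote.
    sequences: List[Sequence] = [[]]
    for slot in template:
        extended: List[Sequence] = []
        for prefix in sequences:
            for token in suggest(prefix, slot):
                extended.append(prefix + [token])
        sequences = extended   # continue with the new partial sequences
    return sequences

if __name__ == "__main__":
    # Toy suggester returning two fixed candidates per slot.
    dummy = lambda prefix, slot: [f"{slot.lower()}_a", f"{slot.lower()}_b"]
    for seq in concretize(["TYPE", "NAME"], dummy):
        print(" ".join(seq))
```
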
“…A major limitation of their work is that they treat source code as simple tokens of text and ignore the contextual, syntactic, and structural dependencies. The most similar work to ours is DNN [34]; however, it differs in several important ways. They apply deep neural networks to source code modeling with a fixed-size context, which can only suggest the next code token, whereas our work can generate a whole sequence of source code and considers a variable-size context.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
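
To make the fixed-size-context limitation concrete, here is an illustrative feedforward next-token scorer. The architecture details and all sizes below are assumptions for illustration only, not taken from [34]; the point is simply that a fixed input window yields one next-token ranking rather than a variable-length sequence.

```python
import numpy as np

# Illustrative fixed-context next-token model: it only ever sees the
# last `context_size` tokens, so it ranks a single next token rather
# than generating a variable-length code sequence.
# All sizes here are assumed for illustration, not taken from [34].

rng = np.random.default_rng(0)
vocab_size, embed_dim, context_size, hidden = 1000, 32, 4, 64

E = rng.normal(size=(vocab_size, embed_dim))              # token embeddings
W1 = rng.normal(size=(context_size * embed_dim, hidden))  # hidden layer
W2 = rng.normal(size=(hidden, vocab_size))                # output layer

def next_token_logits(context_ids: list) -> np.ndarray:
    """Score every vocabulary token given exactly `context_size` ids."""
    assert len(context_ids) == context_size               # fixed window
    x = E[context_ids].reshape(-1)                        # concatenate embeddings
    h = np.tanh(x @ W1)
    return h @ W2

print(int(np.argmax(next_token_logits([1, 42, 7, 99]))))  # top-ranked next token
```
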
“…For the training and testing of the proposed method, this work used the dataset introduced in [5,6]. The dataset comprises ten Java projects (ant, cassandra, db4o, jgit, poi, batik, antlr, itext, jts, maven).…”
Section: Dataset (citation type: mentioning)
confidence: 99%