Code comments are a key software component for program comprehension and software maintainability. High-quality code and comments are also urgently needed by the data-driven models widely used in tasks such as code summarization. Many existing approaches for assessing comment quality rely on machine-learning-based classification algorithms or on heuristic rules. These approaches struggle to capture the complicated features of text data and are often limited in accuracy, efficiency, and generalization ability. In this paper, we formulate the quality assessment of code comments as a classification problem solved with a multi-input neural network. We encode the two inputs, the code and the comments, into vectors using an attention-based Bi-LSTM model and a weighted GloVe model, respectively, and concatenate the code vectors and the comment vectors as the input to a multilayer perceptron classifier for comment quality assessment. Experimental results show that our approach generally outperforms the previous technique on both our labeled dataset and the public dataset, with F1-scores of 96.91% and 91.90%, respectively. Even when the training set and the testing set come from distinct sources, our approach still achieves reasonable performance, which demonstrates its generalization ability.
INDEX TERMS Code comment, source code, multi-input neural network, text classification.
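The fusion step described above (attention-pooled code vector, weighted-average comment vector, concatenation, MLP classifier) can be sketched with plain numpy. This is a minimal forward-pass illustration, not the paper's trained model: the random matrices stand in for learned Bi-LSTM hidden states, GloVe embeddings, and MLP weights, and the token weights are a hypothetical TF-IDF-style weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden, query):
    # hidden: (T, H) per-token Bi-LSTM outputs; query: (H,) attention vector
    scores = softmax(hidden @ query)          # one weight per token
    return scores @ hidden                    # weighted sum -> (H,)

def weighted_glove(embeds, weights):
    # embeds: (T, D) GloVe vectors; weights: e.g. TF-IDF-style token weights
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * embeds).sum(axis=0) / w.sum()

def mlp_classify(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer
    return softmax(h @ W2 + b2)               # class probabilities

# Toy dimensions: H Bi-LSTM hidden size, D GloVe dimension.
H, D = 8, 6
code_hidden = rng.normal(size=(5, H))         # stand-in for Bi-LSTM states
comment_emb = rng.normal(size=(4, D))         # stand-in for GloVe vectors

code_vec = attention_pool(code_hidden, rng.normal(size=H))
comment_vec = weighted_glove(comment_emb, [0.5, 1.2, 0.8, 2.0])
features = np.concatenate([code_vec, comment_vec])   # multi-input fusion

probs = mlp_classify(features,
                     rng.normal(size=(H + D, 16)), np.zeros(16),
                     rng.normal(size=(16, 2)), np.zeros(2))
```

The key design point is that the two encoders are modality-specific (sequential model for code, lexical averaging for comments) and only meet at the concatenation layer.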
Representation learning has shown impressive results for a multitude of tasks in software engineering. However, most research still focuses on a single problem. As a result, the learned representations cannot be applied to other problems and lack generalizability and interpretability. In this paper, we propose a multi-task learning approach for representation learning across multiple downstream tasks of software engineering. From the perspective of generalization, we build a shared sequence encoder with a pretrained BERT for the token sequence and a structure encoder with a Tree-LSTM for the abstract syntax tree of the code. From the perspective of interpretability, we integrate an attention mechanism to focus on different representations and set learnable parameters to adjust the relationships between tasks. We also present early results for our model. The learning-process analysis shows that our model achieves a significant improvement over strong baselines. CCS CONCEPTS • Computing methodologies → Artificial intelligence; • Software and its engineering → Software organization and properties.
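The two ideas in this abstract, attention over the shared representations and learnable inter-task weights, can be sketched as follows. This is an assumption-laden illustration: the random vectors stand in for the BERT sequence encoding and the Tree-LSTM AST encoding, and the task-weighting formula shown is uncertainty-style weighting, one common way to realize learnable task parameters, not necessarily the exact scheme the paper uses.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(seq_vec, tree_vec, query):
    # Attend over the two shared representations (token sequence vs. AST)
    reps = np.stack([seq_vec, tree_vec])      # (2, D)
    weights = softmax(reps @ query)           # (2,) attention weights
    return weights @ reps                     # fused representation (D,)

def weighted_multitask_loss(task_losses, log_vars):
    # Uncertainty-style weighting with learnable s_i = log(sigma_i^2):
    # total = sum_i exp(-s_i) * L_i + s_i  (s_i is optimized with the model)
    losses = np.asarray(task_losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-s) * losses + s))

rng = np.random.default_rng(0)
D = 8
seq_vec = rng.normal(size=D)    # stand-in for the BERT sequence encoding
tree_vec = rng.normal(size=D)   # stand-in for the Tree-LSTM AST encoding
fused = attention_fuse(seq_vec, tree_vec, rng.normal(size=D))

# With all log-variances at 0 the tasks are weighted equally.
total = weighted_multitask_loss([1.0, 2.0], [0.0, 0.0])
```

The interpretability argument follows from inspecting the learned quantities: the attention weights reveal which representation a task relies on, and the per-task parameters reveal how the tasks trade off against each other.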
As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert a parameter-efficient adapter structure, and fine-tune only the adapter. Updating only 0.6% of the overall parameters compared to full-model fine-tuning for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. Multilingual fine-tuning with 200 samples per programming language approaches the results of fine-tuning with the entire dataset on code summarization. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
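The adapter idea, freezing the backbone and training only a small inserted module, is commonly realized as a bottleneck layer with a residual connection. The sketch below illustrates that pattern with numpy; the dimensions, the zero-initialized up-projection (which makes the adapter an identity map at the start of training), and the parameter counts are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

class BottleneckAdapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.b_down = np.zeros(d_bottleneck)
        self.W_up = np.zeros((d_bottleneck, d_model))  # zero init: the
        self.b_up = np.zeros(d_model)                  # adapter starts as identity

    def __call__(self, h):
        z = np.maximum(0.0, h @ self.W_down + self.b_down)  # bottleneck
        return h + z @ self.W_up + self.b_up                # residual path

    def num_params(self):
        return sum(p.size for p in
                   (self.W_down, self.b_down, self.W_up, self.b_up))

rng = np.random.default_rng(0)
adapter = BottleneckAdapter(d_model=768, d_bottleneck=24, rng=rng)
h = rng.normal(size=(4, 768))   # stand-in frozen-layer hidden states
out = adapter(h)                # equals h exactly at initialization

# Illustrative trainable fraction: hypothetical 12 adapters over a
# hypothetical 125M-parameter frozen backbone.
frozen_backbone = 125_000_000
fraction = adapter.num_params() * 12 / frozen_backbone
```

Because gradients flow only into the adapter while the backbone stays fixed, each language gets its own tiny set of weights and the shared pre-trained knowledge cannot be overwritten, which is the mechanism behind the anti-forgetting result.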