Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension 2022
DOI: 10.1145/3524610.3527909
Semantic similarity metrics for evaluating source code summarization

Cited by 37 publications (17 citation statements)
References 22 publications
“…We compare the predictions to reference code summaries from the repository. We use three metrics for this comparison: METEOR [7], USE [51], and BLEU [43]. While BLEU has traditionally been the most popular metric, it has fallen under controversy in SE literature on code summarization: [50] show evidence strongly favoring METEOR over BLEU for metrics based on word overlap, while [51] show similar evidence favoring USE as a semantic similarity metric over BLEU.…”
Section: Methods (mentioning)
Confidence: 99%
“…We use three metrics for this comparison: METEOR [7], USE [51], and BLEU [43]. While BLEU has traditionally been the most popular metric, it has fallen under controversy in SE literature on code summarization: [50] show evidence strongly favoring METEOR over BLEU for metrics based on word overlap, while [51] show similar evidence favoring USE as a semantic similarity metric over BLEU. Therefore, we use METEOR and USE as primary metrics for evaluation, but still report BLEU to conform with past practice.…”
Section: Methods (mentioning)
Confidence: 99%
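
The excerpts above treat METEOR and BLEU as word-overlap metrics computed between a generated summary and a reference summary. As a rough illustration of what those scores look like in practice, the sketch below computes sentence-level BLEU and METEOR with NLTK; the example summaries, whitespace tokenization, and smoothing choice are assumptions for demonstration, not the configuration used in the cited studies.

```python
# Minimal sketch of word-overlap metrics (BLEU, METEOR) for a single
# generated summary vs. a reference summary, using NLTK.
# Note: METEOR's synonym matching may require nltk.download("wordnet").
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

reference = "returns the index of the first matching element"  # hypothetical reference
candidate = "return index of first element that matches"        # hypothetical model output

ref_tokens = reference.split()
cand_tokens = candidate.split()

# Sentence-level BLEU-4 with smoothing: short summaries often share no
# 4-grams with the reference, so unsmoothed BLEU collapses to zero.
bleu = sentence_bleu([ref_tokens], cand_tokens,
                     smoothing_function=SmoothingFunction().method2)

# METEOR aligns unigrams with stemming and synonym matching, one reason
# it tends to track human judgment better than BLEU on short summaries.
meteor = meteor_score([ref_tokens], cand_tokens)

print(f"BLEU:   {bleu:.3f}")
print(f"METEOR: {meteor:.3f}")
```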
“…Recent studies [52] have shown that BLEU does not correlate well with human judgement of source code comments. Roy et al [58] and Haque et al [59] have proposed METEOR and USE+c as alternatives that better correlate with human evaluation. METEOR [60] was introduced in 2005 to address the concerns of using BLEU [57] or ROUGE [61].…”
Section: Metrics (mentioning)
Confidence: 99%
“…USE+c [59] is a new evaluation metric proposed for source code summarization. It differs from BLEU and METEOR because it does not focus on n-gram matching.…”
Section: Metrics (mentioning)
Confidence: 99%
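
In contrast to the n-gram metrics, USE-based scoring compares sentence embeddings rather than surface word overlap. The sketch below shows one way to approximate that idea with the Universal Sentence Encoder from TensorFlow Hub and cosine similarity; the module URL, the example summaries, and the scoring setup are assumptions and do not reproduce the exact USE+c procedure of Haque et al.

```python
# Illustrative sketch of embedding-based similarity in the spirit of the
# USE metric: embed both summaries with the Universal Sentence Encoder
# and take the cosine similarity of the resulting vectors.
import numpy as np
import tensorflow_hub as hub

# Assumed TF Hub module; not necessarily the version used in the cited work.
encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

reference = "returns the index of the first matching element"  # hypothetical reference
candidate = "return index of first element that matches"        # hypothetical model output

# The encoder maps each sentence to a fixed-length (512-d) vector.
ref_vec, cand_vec = encoder([reference, candidate]).numpy()

cosine = np.dot(ref_vec, cand_vec) / (
    np.linalg.norm(ref_vec) * np.linalg.norm(cand_vec)
)
print(f"USE cosine similarity: {cosine:.3f}")
```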