Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension 2022
DOI: 10.1145/3524610.3527909
Semantic similarity metrics for evaluating source code summarization

Cited by 37 publications (17 citation statements)
References 22 publications
“…We compare the predictions to reference code summaries from the repository. We use three metrics for this comparison: METEOR [7], USE [51], and BLEU [43]. While BLEU has traditionally been the most popular metric, it has fallen under controversy in SE literature on code summarization: [50] show evidence strongly favoring METEOR over BLEU for metrics based on word overlap, while [51] show similar evidence favoring USE as a semantic similarity metric over BLEU.…”
Section: Methods (mentioning)
Confidence: 99%
“…We use three metrics for this comparison: METEOR [7], USE [51], and BLEU [43]. While BLEU has traditionally been the most popular metric, it has fallen under controversy in SE literature on code summarization: [50] show evidence strongly favoring METEOR over BLEU for metrics based on word overlap, while [51] show similar evidence favoring USE as a semantic similarity metric over BLEU. Therefore, we use METEOR and USE as primary metrics for evaluation, but still report BLEU to conform with past practice.…”
Section: Methods (mentioning)
Confidence: 99%
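
The excerpts above treat METEOR and BLEU as word-overlap metrics computed between a generated summary and a reference summary. As a rough illustration of what those scores look like in practice, the sketch below computes sentence-level BLEU and METEOR with NLTK; the example summaries, whitespace tokenization, and smoothing choice are assumptions for demonstration, not the configuration used in the cited studies.

```python
# Minimal sketch of word-overlap metrics (BLEU, METEOR) for a single
# generated summary vs. a reference summary, using NLTK.
# Note: METEOR's synonym matching may require nltk.download("wordnet").
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

reference = "returns the index of the first matching element"  # hypothetical reference
candidate = "return index of first element that matches"        # hypothetical model output

ref_tokens = reference.split()
cand_tokens = candidate.split()

# Sentence-level BLEU-4 with smoothing: short summaries often share no
# 4-grams with the reference, so unsmoothed BLEU collapses to zero.
bleu = sentence_bleu([ref_tokens], cand_tokens,
                     smoothing_function=SmoothingFunction().method2)

# METEOR aligns unigrams with stemming and synonym matching, one reason
# it tends to track human judgment better than BLEU on short summaries.
meteor = meteor_score([ref_tokens], cand_tokens)

print(f"BLEU:   {bleu:.3f}")
print(f"METEOR: {meteor:.3f}")
```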
“…Recent studies [52] have shown that BLEU does not correlate well with human judgement of source code comments. Roy et al [58] and Haque et al [59] have proposed METEOR and USE+c as alternatives that better correlate with human evaluation. METEOR [60] was introduced in 2005 to address the concerns of using BLEU [57] or ROUGE [61].…”
Section: Metrics (mentioning)
Confidence: 99%
“…USE+c [59] is a new evaluation metric proposed for source code summarization. It differs from BLEU and METEOR because it does not focus on n-gram matching.…”
Section: Metrics (mentioning)
Confidence: 99%
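
In contrast to the n-gram metrics, USE-based scoring compares sentence embeddings rather than surface word overlap. The sketch below shows one way to approximate that idea with the Universal Sentence Encoder from TensorFlow Hub and cosine similarity; the module URL, the example summaries, and the scoring setup are assumptions and do not reproduce the exact USE+c procedure of Haque et al.

```python
# Illustrative sketch of embedding-based similarity in the spirit of the
# USE metric: embed both summaries with the Universal Sentence Encoder
# and take the cosine similarity of the resulting vectors.
import numpy as np
import tensorflow_hub as hub

# Assumed TF Hub module; not necessarily the version used in the cited work.
encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

reference = "returns the index of the first matching element"  # hypothetical reference
candidate = "return index of first element that matches"        # hypothetical model output

# The encoder maps each sentence to a fixed-length (512-d) vector.
ref_vec, cand_vec = encoder([reference, candidate]).numpy()

cosine = np.dot(ref_vec, cand_vec) / (
    np.linalg.norm(ref_vec) * np.linalg.norm(cand_vec)
)
print(f"USE cosine similarity: {cosine:.3f}")
```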