Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)
DOI: 10.1145/3468264.3468588
Reassessing automatic evaluation metrics for code summarization tasks

Cited by 50 publications (39 citation statements). References 47 publications.
“…We also emphasize that even the minor improvements provided here by multilingual training (which is broadly compatible with a range of settings) constitute a relevant and potentially widely useful result. Roy et al [58] have previously noted that small gains in BLEU-4 may not be perceptible to humans as increased text quality; nevertheless, we note that natural language translation (which is now widely used) attained high performance levels based on decades of incremental progress; this result and others below provide evidence that multilingual training could be an important step in the progress towards more useful automated tools. Finally, we note that BLEU-4 gains are higher for low-resource languages (e.g., 17.7% for Ruby) and lower for high-resource languages (e.g., 2.5% for Python), as expected.…”
Section: Code Summarization (supporting)
confidence: 45%
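The statement above turns on how small a BLEU-4 gain actually is at the level of individual summaries. As an illustration only (this is a simplified stdlib sketch with add-one smoothing, not the implementation used by Roy et al [58] or by the citing paper; the function name `sentence_bleu4` is hypothetical), a sentence-level BLEU-4 can be computed as the geometric mean of smoothed 1- to 4-gram precisions times a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 with add-one smoothing on n-gram precisions.

    `reference` and `hypothesis` are token lists. Illustrative sketch only;
    real toolkits differ in smoothing and clipping details.
    """
    if not hypothesis:
        return 0.0
    log_prec_sum = 0.0
    for n in range(1, 5):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each hypothesis n-gram is credited at most as
        # many times as it occurs in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(log_prec_sum / 4)
```

For example, a six-token summary that differs from the reference in a single word still scores around 0.8 here, which gives a concrete sense of how a few-percent corpus-level BLEU-4 shift maps to per-summary differences readers may not notice.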
“…We fine-tune with an NVIDIA TITAN RTX, while Feng et al [18] use an NVIDIA Tesla V100. (2) We use a pairwise two-sample statistical test (as described in [58]; it is more precise than simply comparing test-set summary statistics) to gauge differences. This requires a performance measurement for each test sample, which the repository did not include.…”
Section: Code Summarization (mentioning)
confidence: 99%
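The pairwise test described above needs one score per model per test sample, with the two models' scores paired on the same sample. As one concrete instance (a sketch only; [58] does not necessarily use this exact procedure, and `paired_permutation_test` is a hypothetical name), a paired sign-flipping permutation test can be run on the per-sample score differences:

```python
import random

def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Paired two-sample permutation test on per-sample metric scores.

    `scores_a[i]` and `scores_b[i]` are the two models' scores on the same
    test sample; this pairing is what makes the test sharper than comparing
    aggregate summary statistics. Returns a two-sided p-value for the null
    hypothesis that both models have the same expected score.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # under the null, the sign of each per-sample difference is
        # exchangeable, so flip each sign with probability 1/2
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(resampled) / len(diffs) >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_resamples + 1)
```

Identical score lists yield a p-value of 1.0, while a large, consistent per-sample gap yields a very small p-value; the same pairing idea underlies the Wilcoxon signed-rank and paired t-tests.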