2024
DOI: 10.4218/etrij.2023-0357

Framework for evaluating code generation ability of large language models

Sangyeop Yeo,
Yu‐Seung Ma,
Sang Cheol Kim
et al.

Abstract: Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-rate@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic, handling the repetitive work involved in generating prompts, conducting inferences, and executing the generated codes…
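The abstract names pass-rate@n but does not spell out its computation here. Below is a minimal sketch of one plausible reading: each of n generated candidates is scored by the fraction of test cases it passes, and the best per-candidate pass rate is reported. The helper names (pass_rate, pass_rate_at_n) and the best-of-n aggregation are illustrative assumptions, not the paper's exact definition.

```python
# Hypothetical sketch of a pass-rate@n style metric (assumed form, not
# the paper's exact formula): score each candidate by the fraction of
# test cases it passes, then take the best score among n candidates.
from typing import Callable, Sequence, Tuple, List

def pass_rate(candidate: Callable, tests: Sequence[Tuple[tuple, object]]) -> float:
    """Fraction of (args, expected) test cases the candidate passes."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that test case
    return passed / len(tests)

def pass_rate_at_n(candidates: List[Callable], tests: Sequence[Tuple[tuple, object]]) -> float:
    """Best per-test-case pass rate among the n generated candidates."""
    return max(pass_rate(c, tests) for c in candidates)

# Usage: two generated candidates for an addition problem.
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
candidates = [lambda a, b: a + b, lambda a, b: a * b]
print(pass_rate_at_n(candidates, tests))  # 1.0: the first candidate passes all tests
```

Unlike a binary pass@k, which counts a candidate only if it passes every test, a pass-rate style score gives partial credit (here, the multiplying candidate would score 1/3 on its own), which matches the abstract's claim of capturing "the granularity of accuracy according to the pass rate of test cases."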

Cited by 1 publication (1 citation statement)
References 17 publications (22 reference statements)

“…The sixth paper in this special issue [6], "Framework for evaluating code generation ability of large language models" by Yeo and others, introduces a systematic framework for evaluating the code generation capabilities of large language models and presents the derivation of a new metric called pass-rate@n, which captures granular accuracy levels by considering test pass rates. The experimental results demonstrate the effectiveness of the evaluation framework, which can be integrated with real-world coding platforms.…”