cs.AI updates on arXiv.org
RepoMasterEval: A Benchmark for Evaluating Code Completion Models in Real-World Scenarios

This paper proposes RepoMasterEval, a benchmark for evaluating code completion models built from real-world code repositories. By masking code snippets and applying mutation testing, it assesses how code completion models perform in actual development.

arXiv:2408.03519v2 Announce Type: replace-cross Abstract: With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks focus mainly on code completion at the function and class level, providing text descriptions to prompt the model. By contrast, such descriptive prompts are commonly unavailable in real development, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make existing evaluation benchmarks poorly aligned with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models constructed from real-world repositories. Each benchmark datum is generated by masking a code snippet (ground truth) from a source code file that has existing test suites. To improve the test accuracy of model-generated code, we employ mutation testing to measure the effectiveness of the test cases, and we manually craft new test cases for test suites with low mutation scores. Our empirical evaluation of 10 state-of-the-art models shows that test augmentation is critical to improving the accuracy of the benchmark, and that RepoMasterEval is able to report variance in model performance in real-world scenarios. The deployment of RepoMasterEval also revealed that the benchmark is useful for giving accurate feedback during model training, and that its score correlates highly with the model's performance in practice.
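
Since the abstract only sketches the construction pipeline, a minimal Python sketch may help illustrate the two ideas it describes: masking a ground-truth snippet out of a repository file to form a benchmark datum, and using mutation testing to estimate whether the existing test suite is strong enough to judge model-generated code. This is not the authors' implementation; all names (BenchmarkDatum, mask_snippet, mutation_score, the placeholder token, the single toy mutation operator) are hypothetical assumptions for illustration.

import ast
import copy
import dataclasses
import subprocess
from pathlib import Path


@dataclasses.dataclass
class BenchmarkDatum:
    """One evaluation item: a file with a masked span plus its ground truth."""
    repo_path: Path
    file_path: Path      # path of the source file, relative to repo_path
    masked_source: str   # source with the target span replaced by a placeholder
    ground_truth: str    # the original snippet that was masked out
    test_command: str    # how to run the repository's existing test suite


def mask_snippet(source: str, start_line: int, end_line: int,
                 placeholder: str = "<MASK>") -> tuple[str, str]:
    """Replace lines [start_line, end_line] (1-based, inclusive) with a placeholder."""
    lines = source.splitlines(keepends=True)
    ground_truth = "".join(lines[start_line - 1:end_line])
    masked = ("".join(lines[:start_line - 1]) + placeholder + "\n"
              + "".join(lines[end_line:]))
    return masked, ground_truth


def tests_pass(repo_path: Path, test_command: str) -> bool:
    """Run the repository's test suite and report whether it passes."""
    result = subprocess.run(test_command, shell=True, cwd=repo_path,
                            capture_output=True)
    return result.returncode == 0


class FlipComparisons(ast.NodeTransformer):
    """A toy mutation operator: swap == and != inside the target snippet."""
    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        new_ops = []
        for op in node.ops:
            if isinstance(op, ast.Eq):
                new_ops.append(ast.NotEq())
            elif isinstance(op, ast.NotEq):
                new_ops.append(ast.Eq())
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def mutation_score(datum: BenchmarkDatum) -> float:
    """Fraction of (toy) mutants of the ground truth that the tests kill.

    A low score suggests the existing suite is too weak to judge completions
    for this snippet, so new test cases should be written (as the paper does).
    """
    original_file = datum.repo_path / datum.file_path
    original_source = original_file.read_text()

    tree = ast.parse(datum.ground_truth)
    mutated_snippet = ast.unparse(FlipComparisons().visit(copy.deepcopy(tree)))
    if mutated_snippet.strip() == datum.ground_truth.strip():
        return 0.0  # no applicable mutation; treat the suite as untested here

    mutants = [mutated_snippet]  # a real harness would apply many operators
    killed = 0
    for mutant in mutants:
        # Temporarily install the mutant in place of the masked span.
        original_file.write_text(datum.masked_source.replace("<MASK>", mutant))
        if not tests_pass(datum.repo_path, datum.test_command):
            killed += 1
    original_file.write_text(original_source)  # restore the repository
    return killed / len(mutants)

In an evaluation loop, the same replace-and-run-tests step would be applied to each model's completion instead of a mutant, which is how a repository-level benchmark like this can score completions without a natural-language prompt.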
