cs.AI updates on arXiv.org 07月23日
Dr. Boot: Bootstrapping Program Synthesis Language Models to Perform Repairing
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文探讨了程序合成语言模型在数据需求和合成过程上存在的问题,提出了一种新的自举算法,并展示了其在代码修复方面的优越性,同时指出了训练数据中存在的问题。

arXiv:2507.15889v1 Announce Type: cross Abstract: Language models for program synthesis are usually trained and evaluated on programming competition datasets (MBPP, APPS). However, these datasets are limited in size and quality, while these language models are extremely data hungry. Additionally, the language models have a misaligned program synthesis process compared to humans. While humans iteratively develop code with the help of a compiler, most program synthesis models currently produce code in one go. To solve these issues, we introduce a bootstrapping algorithm for program synthesis, that supports teaching models how to repair. We show that bootstrapping consistently outperforms regular fine-tuning. Compared to other work, our bootstrapped model performs on par with fine-tuned models that are 68\% larger. Notably, bootstrapping with repairing also improves non-repairing performance compared to regular bootstrapping during inference. However, on our models, repairing during inference is likely inferior to simply sampling the same number of solutions. Furthermore, we find that there are issues with the example test cases in the training portion of the APPS dataset that are valuable to the community, as many repairing and reinforcement learning methods rely on them.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

程序合成 语言模型 自举算法 代码修复 数据问题
相关文章