MarkTechPost@AI January 18
Salesforce AI Research Proposes PerfCodeGen: A Training-Free Framework that Enhances the Performance of LLM-Generated Code with Execution Feedback

PerfCodeGen, proposed by Salesforce AI, is a framework for improving the runtime efficiency of code generated by large language models (LLMs). It iteratively self-refines code using execution feedback, with no additional training required. PerfCodeGen works in two phases: it first ensures functional correctness via unit tests, then analyzes runtime metrics and optimizes the most time-consuming test cases. The framework integrates with existing LLM workflows, generating multiple candidate solutions with nucleus sampling and improving them through a feedback loop, ultimately producing code that is both correct and efficient. On benchmarks such as HumanEval, MBPP, and APPS, PerfCodeGen significantly improved runtime efficiency and correctness, even surpassing some human-written reference code.

✅ PerfCodeGen is a training-free framework that iteratively refines LLM-generated code using execution feedback, improving runtime efficiency without requiring large amounts of training data.

⚙️ The framework operates in two phases: it first ensures functional correctness through unit tests, then analyzes runtime metrics to address performance bottlenecks, targeting the most time-consuming test cases.

🚀 On the HumanEval, MBPP, and APPS benchmarks, PerfCodeGen improved both correctness and runtime efficiency, even outperforming some human-written ground-truth solutions.

💡 The framework integrates seamlessly with existing LLM workflows, generating multiple candidate solutions via nucleus sampling and refining them through a feedback mechanism, making it applicable across a range of LLMs and application domains.

Large Language Models (LLMs) have become essential tools in software development, offering capabilities such as generating code snippets, automating unit tests, and debugging. However, these models often fall short in producing code that is not only functionally correct but also efficient in runtime. Overlooking runtime efficiency can lead to software that performs poorly, increases operational costs, and impacts user experience. This issue is particularly pronounced for less experienced developers, who may rely on AI-suggested code without fully understanding its implications. Salesforce Research addresses these challenges with PerfCodeGen, a framework that aims to improve both the correctness and performance of LLM-generated code.

Salesforce AI’s PerfCodeGen is a training-free framework designed to enhance the runtime efficiency of LLM-generated code. It achieves this by using execution feedback in an iterative self-refinement process. Unlike approaches requiring fine-tuning with extensive training data, PerfCodeGen employs a feedback loop that evaluates and refines code based on runtime metrics during test execution. The framework operates in two key phases: refining correctness and optimizing performance. Initially, it ensures the generated code meets functional requirements by addressing issues identified in unit tests. Once correctness is established, the framework focuses on runtime efficiency, optimizing the code by targeting and refining the most resource-intensive test cases. This iterative process results in solutions that are both correct and efficient.
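The two-phase loop described above can be sketched in a few lines of Python. This is a minimal illustration under assumed interfaces, not the paper's implementation: plain Python functions stand in for LLM-generated candidate programs, and correctness filtering plus a wall-clock runtime metric stand in for the framework's execution-feedback signals.

```python
import time

def passes_tests(candidate, tests):
    """Phase 1 signal: functional correctness over the full unit-test suite."""
    return all(candidate(x) == expected for x, expected in tests)

def runtime_cost(candidate, tests, repeats=50):
    """Phase 2 signal: total wall-clock time of the candidate over the tests."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x, _expected in tests:
            candidate(x)
    return time.perf_counter() - start

def select_best(candidates, tests):
    """Keep only functionally correct candidates, then return the fastest."""
    correct = [c for c in candidates if passes_tests(c, tests)]
    if not correct:
        return None  # in the real framework, failure feedback drives refinement
    return min(correct, key=lambda c: runtime_cost(c, tests))

# Three candidate implementations of "sum of integers 0..n-1":
slow = lambda n: sum(i for i in range(n))   # linear time
fast = lambda n: n * (n - 1) // 2           # constant time (closed form)
buggy = lambda n: n * (n + 1) // 2          # wrong formula, fails the tests

tests = [(0, 0), (10, 45), (1000, 499500)]
best = select_best([buggy, slow, fast], tests)
print(best(10))  # 45
```

In the actual framework the failing-test and runtime feedback would be fed back into the LLM prompt to generate improved candidates, rather than simply selecting among a fixed pool as here.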

Technical Insights and Benefits

PerfCodeGen integrates with existing LLM workflows and begins by generating multiple candidate solutions using nucleus sampling. In the first phase, these candidates are assessed for correctness through unit tests. Feedback from failed tests is used to refine the solutions. Once functional correctness is ensured, the framework moves to the second phase, analyzing runtime metrics to identify bottlenecks. This information is then used to optimize the code further, focusing on the most time-consuming test cases.
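The bottleneck-identification step of the second phase can be illustrated with a short sketch. The function name and API below are assumptions for illustration only: it times each unit test individually and reports the most expensive one, which is the kind of information that would be quoted back to the LLM in a refinement prompt.

```python
import time

def slowest_test(candidate, tests, repeats=20):
    """Return (input, elapsed_seconds) for the test case dominating runtime."""
    costs = []
    for x, _expected in tests:
        start = time.perf_counter()
        for _ in range(repeats):
            candidate(x)
        costs.append((x, time.perf_counter() - start))
    return max(costs, key=lambda pair: pair[1])

def contains_duplicate(items):
    """Candidate under test: a quadratic-time duplicate check."""
    return any(items.count(v) > 1 for v in items)

tests = [([1, 2, 3], False),
         ([5, 5], True),
         (list(range(2000)) + [0], True)]  # the expensive case

worst_input, _elapsed = slowest_test(contains_duplicate, tests)
print(len(worst_input))  # 2001: the large input dominates the runtime
```

Flagging the large input as the hotspot tells the model where optimization effort pays off, e.g. prompting it toward a set-based linear-time rewrite.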

This two-phase process increases the likelihood of producing optimally efficient programs. PerfCodeGen’s methodology mirrors human debugging and optimization practices, making it both effective and intuitive. Additionally, the framework’s reliance on feedback rather than retraining allows it to scale across various LLMs and application domains. It has shown consistent improvements in runtime efficiency and correctness across models such as Phi-3-mini, Llama 3, and GPT-4.

PerfCodeGen has been tested on benchmarks such as HumanEval, MBPP, and APPS, demonstrating its effectiveness:

- Runtime Efficiency: On HumanEval, GPT-4's optimization rate (%Opt) increased from 24.54% to 28.83% with PerfCodeGen, with similar improvements observed across other models.
- Correctness Improvement: On MBPP, GPT-3.5's correctness rate (%Correct) rose from 66.38% to 73.36% with a single sample (Best@1).
- Outperforming Ground Truth: PerfCodeGen enabled LLMs to generate more efficient solutions than the ground truth in approximately 55% of HumanEval tasks and 67% of MBPP tasks.
- Scalability: Open models such as Phi-3-mini and Mixtral achieved performance comparable to closed models like GPT-3.5 and GPT-4.

These results highlight PerfCodeGen's ability to balance correctness and runtime efficiency effectively, making it a valuable addition to LLM-driven code generation workflows.

Conclusion:

PerfCodeGen offers a practical solution to a key limitation of current LLMs: their focus on correctness at the expense of runtime efficiency. By incorporating execution feedback into an iterative refinement process, PerfCodeGen enables the generation of code that is both correct and efficient. This approach enhances the usability of LLMs in software development, providing developers with tools to produce higher-quality code without extensive retraining. The framework’s success across diverse benchmarks demonstrates its potential as a step forward in creating efficient, reliable, and accessible AI-driven programming solutions.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


