MarkTechPost@AI August 14
ByteDance Unveils ToolTrain: A New Tool-Integrated Reinforcement Learning (RL) Framework that Redefines Repo Deep Search

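The article introduces ToolTrain, a new tool-integrated training framework designed to strengthen the multi-hop reasoning of large language models (LLMs) during software issue localization. The framework introduces the RepoSearcher agent and combines SFT (supervised fine-tuning) with RL (reinforcement learning), so that LLMs can use retrieval tools more effectively to locate function or class definitions. The researchers build their evaluation dataset from real GitHub issues verified by professional developers and run comparisons against several state-of-the-art models and frameworks. The results show that ToolTrain-trained models achieve leading performance on code localization among models of similar size and even surpass some larger commercial models on specific metrics, demonstrating the framework's potential for optimizing tool use and improving reasoning efficiency.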

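🎯 **ToolTrain improves LLM code localization**: The framework focuses on strengthening the multi-hop reasoning of large language models (LLMs) during software issue localization. By integrating tool-use training, it enables LLMs to identify more precisely the code locations that need to be modified.

🔍 **RepoSearcher agent and two-stage training**: ToolTrain introduces RepoSearcher, a lightweight agent equipped with simple retrieval tools that locate function or class definitions by name. Training follows a two-stage process: rejection-sampled SFT first, then tool-integrated RL, so that the model learns to use tools strategically, avoids unproductive exploration, and focuses on promising code paths.

📊 **Rigorous evaluation on real data**: The researchers evaluate on SWE-Bench-Verified, a benchmark derived from real GitHub issues and human-verified golden patches. The dataset provides precise ground-truth answers for code localization, and RepoSearcher's performance is measured with metrics including Recall@k, MAP, and MRR.

🚀 **Strong performance from ToolTrain**: Experiments show that ToolTrain-trained models (such as RepoSearcher with ToolTrain-32B) reach state-of-the-art results on code localization and even surpass larger commercial models such as Claude-3.7-Sonnet on specific metrics. The 7B-parameter model also outperforms other frameworks that use 32B models, a marked improvement in the tool-calling ability of small models.

💡 **Better tool use and reasoning efficiency**: By combining SFT and RL, ToolTrain enables models to navigate code repositories more effectively and perform precise multi-hop reasoning. This reduces redundant exploration, makes issue localization more efficient, and offers an efficient solution for complex software challenges, with particular promise for improving tool use in small models.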

Issue localization involves identifying the exact code locations that must be modified to fix a software problem, a process that often demands significant manual effort from developers, especially in large repositories. Because the task is complex and time-intensive, automating it has become a key research focus. LLM-based agents let language models use various tools to explore a repository dynamically. However, these models struggle with Repo Deep Search, a sequential navigation task that requires multi-step reasoning and effective tool usage. Current LLMs often make incorrect tool calls or fail to maintain a coherent reasoning chain during exploration.

Existing work includes fault localization and agentic training. In fault localization, methods like DeepFL and DeepRL4FL use deep neural networks and CNNs to identify faulty code by analyzing test coverage, data dependencies, and static code representations. More recent advancements include LLM-based approaches, such as Agentless, that narrow down code locations. However, LLMs often lack the capabilities needed for complex reasoning and tool usage in repository exploration. To address this, agentic training methods, such as SWE-Gym and SEAlign, fine-tune LLMs on high-quality trajectories. Another approach, LocAgent, constructs the ground truth for issue localization from the functions modified by golden patches from GitHub.

Researchers from Peking University, ByteDance, and Beijing Institute of Technology have proposed ToolTrain, a tool-integrated training framework to enhance the multi-hop reasoning capabilities of LLMs during issue localization. ToolTrain introduces RepoSearcher, a lightweight agent equipped with simple retrieval tools that enable LLMs to locate function or class definitions by name. To help the LLMs use these tools for multi-hop reasoning, the researchers construct labeled data from open-source repositories and follow a two-stage process: rejection-sampled SFT and tool-integrated RL. This approach ensures the model learns to use tools strategically, avoiding redundant explorations while focusing on promising code paths.
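To make the idea of a name-based retrieval tool concrete, here is a minimal Python sketch of a lookup that finds function or class definitions by name. It is an illustration only, not RepoSearcher's actual tool interface: the names `find_definitions` and `search_repo`, the in-memory `files` mapping, and the returned tuple layout are all assumptions made for this example, and it handles Python source only.

```python
import ast


def find_definitions(source: str, name: str) -> list[tuple[str, int, int]]:
    """Return (kind, start_line, end_line) for each function or class called `name`.

    Hypothetical helper for illustration; not part of ToolTrain or RepoSearcher.
    """
    tree = ast.parse(source)
    matches = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            matches.append(("function", node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef) and node.name == name:
            matches.append(("class", node.lineno, node.end_lineno))
    # Sort by starting line so results are stable regardless of traversal order.
    return sorted(matches, key=lambda m: m[1])


def search_repo(files: dict[str, str], name: str) -> list[tuple[str, str, int, int]]:
    """Search a {path: source} mapping (an assumed stand-in for a repository)."""
    results = []
    for path in sorted(files):
        try:
            found = find_definitions(files[path], name)
        except SyntaxError:
            # Skip files that do not parse as Python instead of failing the whole search.
            continue
        for kind, start, end in found:
            results.append((path, kind, start, end))
    return results
```

An agent doing multi-hop reasoning would call a tool like this repeatedly: look up a symbol mentioned in the issue, read the returned definition, then look up the symbols that definition references.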

The researchers construct their evaluation dataset from SWE-Bench-Verified, a benchmark derived from real GitHub issues and manually verified by professional developers. This dataset provides ground-truth answers for issue localization by identifying the functions and files modified in golden patches. RepoSearcher's performance is measured with Recall@k, MAP, MRR, nDCG@k, and %Resolved. ToolTrain is applied to two models, Qwen-7B and Qwen-32B, which are then compared against four state-of-the-art frameworks: Agentless, OrcaLoca, CoSIL, and LocAgent. These baselines represent diverse design philosophies, allowing a thorough evaluation of how well ToolTrain supports precise and strategic code exploration.
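The ranking metrics named above can be sketched for a single issue as follows. This is a minimal Python sketch using the standard textbook definitions with binary relevance; the paper's exact formulations may differ, and the function names and the assumption that `ranked` contains no duplicates are choices made for this example. MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all issues.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth locations that appear in the top k predictions."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

For example, with predictions `["f1", "f2", "f3", "f4"]` and ground truth `{"f2", "f4"}`, Recall@2 is 0.5 (one of two modified functions is in the top two) and the reciprocal rank is 0.5 (the first hit is at rank 2).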

RepoSearcher with ToolTrain achieves state-of-the-art performance among models of similar size and even outperforms larger commercial models on specific metrics. For instance, RepoSearcher with ToolTrain-32B achieves a function-level Recall@5 score of 68.55, surpassing Claude-3.7-Sonnet (66.38). The 7B-parameter model outperforms other frameworks that use 32B models, showing that ToolTrain strengthens the tool-calling capabilities of smaller models. In issue resolution, RepoSearcher with ToolTrain-7B achieves a Recall@5 of 62.38 and a resolution rate of 14.00, the best among 7B models. However, disparities arise when different patch generation models are used, as seen in the resolution rates of 14.00 (ToolTrain-7B) versus 31.60 (ToolTrain-32B), despite similar localization results.

In conclusion, the researchers introduced ToolTrain to improve LLM-based issue localization. By combining SFT with RL, ToolTrain equips agents like RepoSearcher to navigate code repositories effectively and perform precise multi-hop reasoning. Evaluated on real-world benchmarks, ToolTrain-trained models achieve state-of-the-art performance among similarly sized models and even outperform larger commercial models like Claude-3.7-Sonnet on specific metrics. This shows that ToolTrain can optimize tool usage and reasoning in smaller models, reducing redundant exploration and improving efficiency. The study highlights ToolTrain's potential to transform issue localization and provide an efficient solution for complex software challenges.



The post ByteDance Unveils ToolTrain: A New Tool-Integrated Reinforcement Learning (RL) Framework that Redefines Repo Deep Search appeared first on MarkTechPost.
