MarkTechPost@AI, October 9
AgentFlow: A Novel Trainable Agent Framework

AgentFlow is a novel trainable agent framework that decomposes complex tasks across four core modules: a planner, an executor, a verifier, and a generator, coordinated through an explicit memory and a toolset. Its training method, Flow-GRPO, turns long-horizon, sparse-reward reinforcement learning into single-turn updates by broadcasting the trajectory-level reward signal to every turn and combining it with token-level PPO-style updates and KL regularization, enabling efficient optimization of the planner. Across ten benchmarks, AgentFlow with a 7B model fine-tuned via Flow-GRPO achieves substantial gains on search, agentic reasoning, math, and science tasks, surpassing strong baselines and even GPT-4o.

🗂️ **Modular agent architecture**: AgentFlow designs the agent as four separate but cooperating modules, a planner, an executor, a verifier, and a generator, supported by an explicit memory and a toolset. This modular design keeps training focused: only the planner is iteratively optimized, while the other modules can serve as fixed engines, improving the system's flexibility and maintainability.

🚀 **Flow-GRPO training method**: The framework introduces Flow-GRPO (Flow-based Group Refined Policy Optimization), a reinforcement learning method designed for long-horizon, sparse-reward optimization. It broadcasts the final trajectory-level reward signal to every turn so that local planning steps stay aligned with overall task success, and it uses a token-level PPO-style objective with group-normalized advantages to stabilize training and reduce variance.

📈 **Significant performance gains**: Across ten benchmarks spanning knowledge-intensive search, agentic reasoning, math, and science, AgentFlow delivers strong results. With a 7B-parameter model trained via Flow-GRPO, it achieves average improvements of 14.9% on search, 14.0% on agentic reasoning, 14.5% on math, and 4.1% on science, and the research team reports that the 7B system surpasses GPT-4o on the evaluated suite.

🛠️ **More reliable tool use**: Beyond overall task completion, AgentFlow also improves tool use. The study shows the framework substantially reduces tool-calling errors, for example by up to 28.4% on the GAIA benchmark, and tool-calling accuracy and planning quality both trend upward with more turns and larger model scale.

TL;DR: AgentFlow is a trainable agent framework with four modules—Planner, Executor, Verifier, Generator—coordinated by an explicit memory and toolset. The planner is optimized in the loop with a new on-policy method, Flow-GRPO, which broadcasts a trajectory-level outcome reward to every turn and applies token-level PPO-style updates with KL regularization and group-normalized advantages. On ten benchmarks, a 7B backbone tuned with Flow-GRPO reports +14.9% (search), +14.0% (agentic), +14.5% (math), and +4.1% (science) over strong baselines.

What is AgentFlow?

AgentFlow formalizes multi-turn, tool-integrated reasoning as a Markov Decision Process (MDP). At each turn, the Planner proposes a sub-goal and selects a tool plus context; the Executor calls the tool; the Verifier signals whether to continue; the Generator emits the final answer on termination. A structured, evolving memory records states, tool calls, and verification signals, constraining context growth and making trajectories auditable. Only the planner is trained; the other modules can be fixed engines.
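The loop described above can be summarized in a few lines of Python. This is a minimal sketch of the four-module control flow with an explicit memory; module interfaces, names, and the memory format are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of an AgentFlow-style loop. Interfaces are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Explicit, evolving record of states, tool calls, and verification signals."""
    records: list = field(default_factory=list)

    def add(self, **entry):
        self.records.append(entry)

def run_agentflow(task, planner, executor, verifier, generator, tools, max_turns=10):
    memory = Memory()
    for turn in range(max_turns):
        # Planner (the only trained module): propose a sub-goal, pick a tool and its context.
        sub_goal, tool_name, tool_input = planner(task, memory)
        # Executor: call the selected tool.
        observation = executor(tools[tool_name], tool_input)
        memory.add(turn=turn, sub_goal=sub_goal, tool=tool_name,
                   tool_input=tool_input, observation=observation)
        # Verifier: decide whether enough evidence has been gathered to stop.
        if verifier(task, memory):
            break
    # Generator: emit the final answer from the accumulated memory.
    return generator(task, memory)
```

Because only `planner` is optimized, the executor, verifier, and generator can remain frozen engines, which keeps the RL problem focused on planning decisions.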

The public implementation showcases a modular toolkit (e.g., base_generator, python_coder, google_search, wikipedia_search, web_search) and ships quick-start scripts for inference, training, and benchmarking. The repository is MIT-licensed.
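For illustration only, the named tools could be exposed to the planner through a plain registry like the one below; the actual repository wraps tools in its own classes and signatures, so treat these stand-in implementations as hypothetical.

```python
# Illustrative registry mapping toolkit names mentioned above to callables.
import contextlib
import io

def python_coder(code: str) -> str:
    """Hypothetical stand-in: run a short Python snippet and return its stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # no sandboxing; for illustration only
    return buf.getvalue()

def web_search(query: str) -> str:
    """Hypothetical stand-in: a real implementation would call a search API."""
    return f"[search results for: {query}]"

tools = {
    "python_coder": python_coder,
    "web_search": web_search,
    # base_generator, google_search, wikipedia_search would be registered similarly.
}
```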

https://arxiv.org/pdf/2510.05592

Training method: Flow-GRPO

Flow-GRPO (Flow-based Group Refined Policy Optimization) converts long-horizon, sparse-reward optimization into tractable single-turn updates. The final trajectory-level outcome reward is broadcast to every turn, so each local planning step is credited against overall task success, and the planner is updated with a token-level PPO-style clipped objective, group-normalized advantages to reduce variance, and KL regularization to keep updates stable:

https://arxiv.org/pdf/2510.05592
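The sketch below translates that description into code: one outcome reward per trajectory, group-normalized advantages broadcast to every turn/token, a PPO-style clipped token objective, and a KL penalty toward a reference policy. Tensor shapes, the KL estimator, and hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
# Flow-GRPO-style update sketch (hypothetical shapes; padding masks omitted).
import torch

def flow_grpo_loss(logp_new, logp_old, logp_ref, rewards, eps=0.2, kl_coef=0.01):
    """
    logp_new, logp_old, logp_ref: [G, T] per-token log-probs of the planner's
        actions under the current, behavior, and reference policies
        (G = trajectories in the group, T = tokens).
    rewards: [G] final outcome reward of each trajectory.
    """
    # Group-normalized advantage: one scalar per trajectory ...
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # [G]
    # ... broadcast to every turn/token of that trajectory.
    adv = adv.unsqueeze(1).expand_as(logp_new)                   # [G, T]

    # Token-level PPO-style clipped objective.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()

    # Simple KL estimate toward the reference policy for regularization.
    kl = (logp_new - logp_ref).mean()
    return policy_loss + kl_coef * kl
```

In practice the per-token log-probs would come from the 7B planner model, and each group would consist of multiple rollouts of the same task, so the normalization compares trajectories that attempted the same problem.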

Understanding the results and benchmarks

Benchmarks. The research team evaluates four task types: knowledge-intensive search (Bamboogle, 2Wiki, HotpotQA, Musique), agentic reasoning (GAIA textual split), math (AIME-24, AMC-23, Game of 24), and science (GPQA, MedQA). GAIA is a tooling-oriented benchmark for general assistants; the textual split excludes multimodal requirements.

Main numbers (7B backbone after Flow-GRPO). Average gains over strong baselines: +14.9% (search), +14.0% (agentic), +14.5% (math), +4.1% (science). The research team states that its 7B system surpasses GPT-4o on the reported suite. The project page also reports training effects such as improved planning quality, reduced tool-calling errors (up to 28.4% on GAIA), and positive trends with larger turn budgets and model scale.

Ablations. Online Flow-GRPO improves performance by +17.2% vs. a frozen-planner baseline, while offline supervised fine-tuning of the planner degrades performance by −19.0% on their composite metric.

https://arxiv.org/pdf/2510.05592

Key Takeaways

- AgentFlow separates an agent into four modules (planner, executor, verifier, generator) coordinated by an explicit memory and toolset; only the planner is trained.
- Flow-GRPO turns long-horizon, sparse-reward RL into single-turn updates by broadcasting the trajectory-level reward to every turn, with token-level PPO-style updates, group-normalized advantages, and KL regularization.
- With a 7B backbone, reported average gains are +14.9% (search), +14.0% (agentic), +14.5% (math), and +4.1% (science), and the team states the system surpasses GPT-4o on this suite.
- Tool-calling errors drop by up to 28.4% on GAIA, and planning quality and tool reliability improve with larger turn budgets and model scale.

Editorial Comments

AgentFlow formalizes tool-using agents into four modules (planner, executor, verifier, generator) and trains only the planner in-loop via Flow-GRPO, which broadcasts a single trajectory-level reward to every turn with token-level PPO-style updates and KL control. Reported results on ten benchmarks show average gains of +14.9% (search), +14.0% (agentic/GAIA textual split), +14.5% (math), and +4.1% (science); the research team additionally states that the 7B system surpasses GPT-4o on this suite. Implementation, tools, and quick-start scripts are MIT-licensed in the GitHub repo.


Check out the Technical Paper, GitHub Page, and Project Page.

The post Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning (RL) for Modular, Tool-Using AI Agents appeared first on MarkTechPost.

