VentureBeat October 9, 07:36
Researchers propose ReasoningBank, a framework that gives LLM agents memory and lets them keep improving

Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework called ReasoningBank that lets large language model (LLM) agents organize their experience into a memory bank and keep improving on complex tasks. The framework distills "generalizable reasoning strategies" from the agent's successful and failed attempts and draws on them at inference time to avoid repeating mistakes and make better decisions. The researchers show that, combined with test-time scaling techniques, ReasoningBank significantly improves the performance and efficiency of LLM agents, outperforming classic memory mechanisms on web browsing and software engineering benchmarks and offering a practical path toward more adaptive, reliable enterprise AI agents.

💡 The core of ReasoningBank is distilling "generalizable reasoning strategies" from an agent's successes and failures and storing them in structured form so they can be reused on similar future tasks. Unlike earlier memory mechanisms that keep raw interaction logs or store only successful examples, ReasoningBank also draws lessons from failures, providing actionable, generalizable guidance. This fundamentally changes how agents operate: instead of handling each task statically and in isolation, they recall and adapt proven strategies from past experience.

🧠 The framework uses an LLM-as-a-judge to label tasks as successes or failures, with no human annotation needed. For example, when an agent fails to find a target product because its search was too broad, ReasoningBank distills strategies such as "optimize the search query" or "narrow the product set with category filtering." These strategies greatly raise the success rate on similar future tasks, avoiding wasteful trial and error, cutting operational costs, and improving the user experience.

🚀 Combining ReasoningBank with test-time scaling, in particular the proposed Memory-aware Test-Time Scaling (MaTTS), boosts agent performance further. MaTTS generates multiple reasoning paths in parallel or sequentially, then compares and refines them to identify consistent reasoning patterns. This "memory-driven experience scaling" forms a virtuous cycle: existing memories steer the agent toward better solutions, while the diverse experiences produced by scaling help it create higher-quality memories, together driving the continuous evolution of the agent's capabilities.

📈 On benchmarks such as WebArena and SWE-Bench-Verified, ReasoningBank showed clear advantages, outperforming memory-free agents and agents with trajectory- or workflow-based memory across different LLMs. It not only raised overall success rates but also generalized better to harder cross-domain tasks while reducing the interaction steps needed to complete them. This gives enterprises a practical, cost-effective way to build AI agents that learn from experience and adapt, and points toward agents with stronger compositional intelligence that can autonomously manage complex workflows.

Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them get better at complex tasks over time.

The framework, called ReasoningBank, distills “generalizable reasoning strategies” from an agent’s successful and failed attempts to solve problems. The agent then uses this memory during inference to avoid repeating past mistakes and make better decisions as it faces new problems. The researchers show that when combined with test-time scaling techniques, where an agent makes multiple attempts at a problem, ReasoningBank significantly improves the performance and efficiency of LLM agents.

Their findings show that ReasoningBank consistently outperforms classic memory mechanisms across web browsing and software engineering benchmarks, offering a practical path toward building more adaptive and reliable AI agents for enterprise applications.

The challenge of LLM agent memory

As LLM agents are deployed in applications that run for long periods, they encounter a continuous stream of tasks. One of the key limitations of current LLM agents is their failure to learn from this accumulated experience. By approaching each task in isolation, they inevitably repeat past mistakes, discard valuable insights from related problems, and fail to develop skills that would make them more capable over time.

The solution to this limitation is to give agents some kind of memory. Previous efforts to give agents memory have focused on storing past interactions for reuse by organizing information in various forms from plain text to structured graphs. However, these approaches often fall short. Many use raw interaction logs or only store successful task examples. This means they can't distill higher-level, transferable reasoning patterns and, crucially, they don’t extract and use the valuable information from the agent’s failures. As the researchers note in their paper, “existing memory designs often remain limited to passive record-keeping rather than providing actionable, generalizable guidance for future decisions.”

How ReasoningBank works

ReasoningBank is a memory framework designed to overcome these limitations. Its central idea is to distill useful strategies and reasoning hints from past experiences into structured memory items that can be stored and reused.
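As a rough sketch of what such a structured memory item might look like (all field names here are assumptions for illustration, not the paper's exact schema):

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    """One distilled strategy; field names are illustrative."""
    title: str          # short name for the strategy
    description: str    # one-line summary of when it applies
    content: str        # actionable guidance distilled from the trajectory
    from_success: bool  # whether it came from a successful attempt

item = MemoryItem(
    title="Refine broad search queries",
    description="Narrow product searches before browsing results",
    content=("When a search returns thousands of items, add category "
             "filters or more specific keywords before inspecting results."),
    from_success=False,
)
```

Storing the lesson as structured text rather than a raw trajectory is what lets it transfer to new tasks that share the underlying pattern.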

According to Jun Yan, a Research Scientist at Google and co-author of the paper, this marks a fundamental shift in how agents operate. "Traditional agents operate statically—each task is processed in isolation," Yan explained. "ReasoningBank changes this by turning every task experience (successful or failed) into structured, reusable reasoning memory. As a result, the agent doesn’t start from scratch with each customer; it recalls and adapts proven strategies from similar past cases."

The framework processes both successful and failed experiences and turns them into a collection of useful strategies and preventive lessons. The agent judges success and failure through LLM-as-a-judge schemes to obviate the need for human labeling.
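A minimal sketch of such an LLM-as-a-judge check, assuming the model is any text-in/text-out callable (the prompt wording and SUCCESS/FAILURE labels are illustrative, not the paper's actual grader):

```python
def judge_success(task, trajectory, llm):
    """Ask a model whether the trajectory completed the task.

    `llm` is any callable that takes a prompt string and returns text,
    so no human labeling is needed to sort successes from failures.
    """
    prompt = (
        "You are grading an agent's attempt.\n"
        f"Task: {task}\n"
        f"Trajectory:\n{trajectory}\n"
        "Answer with exactly SUCCESS or FAILURE."
    )
    verdict = llm(prompt).strip().upper()
    return verdict.startswith("SUCCESS")

# Stub model standing in for a real LLM call:
label = judge_success(
    "buy Sony headphones",
    "searched, filtered by category, added to cart, checked out",
    lambda prompt: "SUCCESS",
)
```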

Yan provides a practical example of this process in action. An agent tasked with finding Sony headphones might fail because its broad search query returns over 4,000 irrelevant products. "ReasoningBank will first try to figure out why this approach failed," Yan said. "It will then distill strategies such as ‘optimize search query’ and ‘confine products with category filtering.’ Those strategies will be extremely useful to get future similar tasks successfully done."

The process operates in a closed loop. When an agent faces a new task, it uses an embedding-based search to retrieve relevant memories from ReasoningBank to guide its actions. These memories are inserted into the agent’s system prompt, providing context for its decision-making. Once the task is completed, the framework creates new memory items to extract insights from successes and failures. This new knowledge is then analyzed, distilled, and merged into the ReasoningBank, allowing the agent to continuously evolve and improve its capabilities.
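The retrieval-and-prompting half of this loop can be sketched as follows, with toy two-dimensional vectors standing in for a real embedding model (the data and helper names are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, bank, k=2):
    """Return the k memory items most similar to the task embedding."""
    ranked = sorted(bank, key=lambda m: cosine(query_vec, m["vec"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(task, memories):
    """Inject retrieved strategies into the agent's system prompt."""
    hints = "\n".join(f"- {m['text']}" for m in memories)
    return f"Relevant strategies from past tasks:\n{hints}\n\nTask: {task}"

bank = [
    {"text": "Narrow search queries with category filters", "vec": [0.9, 0.1]},
    {"text": "Verify form fields before submitting", "vec": [0.1, 0.9]},
]
top = retrieve([0.8, 0.2], bank, k=1)
prompt = build_prompt("Find Sony headphones under $100", top)
```

After the task completes, the other half of the loop would distill new items from the trajectory and merge them back into `bank`.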

Supercharging memory with scaling

The researchers found a powerful synergy between memory and test-time scaling. Classic test-time scaling involves generating multiple independent answers to the same question, but the researchers argue that this “vanilla form is suboptimal because it does not leverage inherent contrastive signal that arises from redundant exploration on the same problem.”

To address this, they propose Memory-aware Test-Time Scaling (MaTTS), which integrates scaling with ReasoningBank. MaTTS comes in two forms. In “parallel scaling,” the system generates multiple trajectories for the same query, then compares and contrasts them to identify consistent reasoning patterns. In “sequential scaling,” the agent iteratively refines its reasoning within a single attempt, with the intermediate notes and corrections also serving as valuable memory signals.
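Parallel scaling could be sketched like this, with stub functions standing in for the LLM rollout and distillation calls (the majority-vote consistency rule below is an assumption, a simple stand-in for the paper's compare-and-contrast step):

```python
from collections import Counter

def parallel_matts(task, rollout, distill, n=5, min_support=3):
    """Parallel-scaling sketch: run several rollouts on the same task,
    distill a strategy from each, and keep only strategies that recur
    across trajectories as 'consistent reasoning patterns'."""
    trajectories = [rollout(task, seed=i) for i in range(n)]
    strategies = [distill(t) for t in trajectories]
    counts = Counter(strategies)
    return [s for s, c in counts.items() if c >= min_support]

# Toy stand-ins for the LLM calls: 3 of 5 rollouts converge on the
# category-filter strategy, so only it survives the consistency check.
def rollout(task, seed):
    return "filter-by-category" if seed % 2 == 0 else "broad-search"

def distill(trajectory):
    return trajectory

consistent = parallel_matts("find headphones", rollout, distill)
```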

This creates a virtuous cycle: the existing memory in ReasoningBank steers the agent toward more promising solutions, while the diverse experiences generated through scaling enable the agent to create higher-quality memories to store in ReasoningBank. 

“This positive feedback loop positions memory-driven experience scaling as a new scaling dimension for agents,” the researchers write.

ReasoningBank in action

The researchers tested their framework on WebArena (web browsing) and SWE-Bench-Verified (software engineering) benchmarks, using models like Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet. They compared ReasoningBank against baselines including memory-free agents and agents using trajectory-based or workflow-based memory frameworks.

The results show that ReasoningBank consistently outperforms these baselines across all datasets and LLM backbones. On WebArena, it improved the overall success rate by up to 8.3 percentage points compared to a memory-free agent. It also generalized better on more difficult, cross-domain tasks, while reducing the number of interaction steps needed to complete tasks. When combined with MaTTS, both parallel and sequential scaling further boosted performance, consistently outperforming standard test-time scaling.

This efficiency gain has a direct impact on operational costs. Yan points to a case where a memory-free agent took eight trial-and-error steps just to find the right product filter on a website. "Those trial and error costs could be avoided by leveraging relevant insights from ReasoningBank," he noted. "In this case, we save almost twice the operational costs," which also improves the user experience by resolving issues faster.

For enterprises, ReasoningBank can help develop cost-effective agents that can learn from experience and adapt over time in complex workflows and areas like software development, customer support, and data analysis. As the paper concludes, “Our findings suggest a practical pathway toward building adaptive and lifelong-learning agents.”

Yan confirmed that their findings point toward a future of truly compositional intelligence. For example, a coding agent could learn discrete skills like API integration and database management from separate tasks. "Over time, these modular skills... become building blocks the agent can flexibly recombine to solve more complex tasks," he said, suggesting a future where agents can autonomously assemble their knowledge to manage entire workflows with minimal human oversight.

