philschmid RSS feed, October 13, 00:19
The Evolution of AI Agent Architecture: From Shallow to Deep Agents

This article traces the evolution of AI agent architectures, from early "shallow agents" (Agent 1.0) to "deep agents" (Agent 2.0). Shallow agents rely on the LLM's context window for state management; they work for simple tasks but tend to lose context, fall into loops, or hallucinate on multi-step, long-running tasks. Deep agents decouple planning from execution and add persistent memory, hierarchical delegation (sub-agents), and careful context engineering, giving them far stronger complex problem-solving abilities: they can plan explicitly, decompose tasks, and manage state, and thereby handle tasks that take days to complete.

💡 **Limitations of shallow agents (Agent 1.0)**: Most of today's AI agents use a "shallow" architecture, relying entirely on the LLM's context window (conversation history) as their state. This is simple and suits transactional tasks like "check the weather," but when a task needs dozens of steps over several days, these agents fail from context overflow, loss of the goal, or the lack of a recovery mechanism, making complex long-running tasks intractable.

🚀 **The four core pillars of deep agents (Agent 2.0)**: Deep agents build a stronger architecture by decoupling planning from execution and introducing external memory. The core pillars are: 1. **Explicit planning**: use tools to create and maintain an explicit task plan (e.g. a Markdown document), reviewed and updated after every step; 2. **Hierarchical delegation**: an "orchestrator → sub-agent" pattern assigns complex tasks to specialized sub-agents; 3. **Persistent memory**: use the filesystem or a vector database as the source of truth instead of relying solely on the context window, shifting from "remembering everything" to "knowing where to find information"; 4. **Extreme context engineering**: provide detailed instructions that define when to plan, sub-agent invocation protocols, tool usage, file naming conventions, and more, in order to elicit Agent 2.0 behavior.

🧠 **From reactive responses to proactive architecture**: The move from shallow to deep agents is not merely about connecting an LLM to more tools; it is a shift from a reactive loop to a proactive architecture. By implementing explicit planning, hierarchical delegation, and persistent memory, deep agents control their context, and through it their complexity, unlocking the ability to solve problems that take hours or even days rather than seconds.

For the past year, building an AI agent usually meant one thing: setting up a while loop that takes a user prompt, sends it to an LLM, parses a tool call, executes the tool, sends the result back, and repeats. This is what we call a Shallow Agent, or Agent 1.0.

This architecture is fantastically simple and works well for transactional tasks like "What's the weather in Tokyo and what should I wear?". But when asked to perform a task that requires 50 steps over three days, shallow agents invariably get distracted, lose context, enter infinite loops, or hallucinate, because the task requires too many steps for a single context window.

We are seeing an architectural shift towards Deep Agents or Agents 2.0. These systems do not just react in a loop. They combine agentic patterns to plan, manage a persistent memory/state, and delegate work to specialized sub-agents to solve multi-step, complex problems.

Agents 1.0: The Limits of the "Shallow" Loop

To understand where we are going, we must understand where we are. Most agents today are "shallow": they rely entirely on the LLM's context window (conversation history) as their state.

1. User Prompt: "Find the price of Apple stock and tell me if it's a good buy."
2. LLM Reason: "I need to use a search tool."
3. Tool Call: search("AAPL stock price")
4. Observation: The tool returns data.
5. LLM Answer: Generates a response based on the observation, or calls another tool.
6. Repeat: Loop until done.
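In code, this loop is only a few lines. Here is a minimal sketch with a stubbed model and search tool standing in for a real LLM API and tool backend; `fake_llm`, `search`, and the message format are illustrative placeholders, not any real framework's API:

```python
# Minimal shallow-agent loop: the conversation history is the only state.
def search(query: str) -> str:
    # Stand-in for a real search tool.
    return f"Results for '{query}': AAPL trading at $258.10"

def fake_llm(history: list) -> dict:
    # Stand-in for a real LLM call. A real model would decide between
    # a tool call and a final answer based on the full history.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "search", "args": {"query": "AAPL stock price"}}
    return {"answer": "AAPL is at $258.10; whether it's a good buy depends on your horizon."}

def shallow_agent(prompt: str) -> str:
    history = [{"role": "user", "content": prompt}]
    while True:
        step = fake_llm(history)
        if "answer" in step:                     # LLM Answer: done
            return step["answer"]
        result = search(**step["args"])          # Tool Call + Observation
        history.append({"role": "tool", "content": result})

print(shallow_agent("Find the price of Apple stock and tell me if it's a good buy."))
```

Note that everything the agent knows lives in `history`; once that list no longer fits in the context window, the architecture has no fallback.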

This architecture is stateless and ephemeral. The agent's entire "brain" is within the context window. When a task becomes complex, e.g. "Research 10 competitors, analyze their pricing models, build a comparison spreadsheet, and write a strategic summary" it will fail due to:

- Context Overflow: The history fills up with tool outputs (HTML, messy data), pushing instructions out of the context window.
- Loss of Goal: Amidst the noise of intermediate steps, the agent forgets the original objective.
- No Recovery Mechanism: If it goes down a rabbit hole, it rarely has the foresight to stop, backtrack, and try a new approach.

Shallow agents are great at tasks that take 5-15 steps. They are terrible at tasks that take 500.

The Architecture of Agents 2.0 (Deep Agents)

Deep Agents decouple planning from execution and manage memory external to the context window. The architecture consists of four pillars.

Pillar 1: Explicit Planning

Shallow agents plan implicitly via chain-of-thought ("I should do X, then Y"). Deep agents use tools to create and maintain an explicit plan, such as a to-do list in a markdown document.

Between every step, the agent reviews and updates this plan, marking steps as pending, in_progress, or completed, or adding notes. If a step fails, it doesn't retry blindly; it updates the plan to accommodate the failure. This keeps the agent focused on the high-level task.
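A minimal sketch of such a planning tool, assuming the plan lives in a `plan.md` file with a per-step status; the file name, statuses, and helper functions are hypothetical, not any specific framework's API:

```python
# Illustrative plan tool: the agent keeps an explicit to-do list in a
# markdown file and updates step statuses between actions.
from pathlib import Path

PLAN_FILE = Path("plan.md")

def write_plan(steps: list) -> None:
    # Render each step as a markdown checkbox plus its status.
    lines = [f"- [{'x' if s['status'] == 'completed' else ' '}] {s['task']} ({s['status']})"
             for s in steps]
    PLAN_FILE.write_text("\n".join(lines))

def update_step(steps: list, task: str, status: str, note: str = None) -> None:
    # Mark a step and optionally attach a note (e.g. why it failed).
    for s in steps:
        if s["task"].startswith(task):
            s["status"] = status
            if note:
                s["task"] += f" (note: {note})"
    write_plan(steps)  # re-persist the plan after every change

steps = [{"task": "Research competitors", "status": "pending"},
         {"task": "Write summary", "status": "pending"}]
write_plan(steps)
update_step(steps, "Research competitors", "in_progress")
```

Because the plan lives outside the context window, it survives even if intermediate tool outputs push earlier turns out of the history.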

Pillar 2: Hierarchical Delegation (Sub-Agents)

Complex tasks require specialization. A shallow agent tries to be a jack-of-all-trades in one prompt. Deep agents use an Orchestrator → Sub-Agent pattern.

The Orchestrator delegates tasks to sub-agents, each with a clean context. A sub-agent (e.g., a "Researcher," a "Coder," a "Writer") performs its own tool-call loops (searching, erroring, retrying), compiles its result, and returns only the synthesized answer to the Orchestrator.
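Sketched in code, with stub functions standing in for full agent loops; the roles and return format here are illustrative, not a real framework API:

```python
# Sketch of the Orchestrator -> Sub-Agent pattern: each sub-agent runs with
# a fresh, task-local context and returns only a synthesized result.
def run_subagent(role: str, task: str) -> str:
    # Stand-in for a full tool-call loop (search, retry, etc.).
    context = [f"You are a {role}.", task]  # clean context, unused in this stub
    # Only the final synthesis crosses back to the orchestrator,
    # never the raw intermediate tool outputs.
    return f"[{role}] synthesized result for: {task}"

def orchestrator(goal: str) -> str:
    research = run_subagent("Researcher", f"Gather sources on: {goal}")
    draft = run_subagent("Writer", f"Draft a summary from: {research}")
    return draft

print(orchestrator("quantum computing"))
```

The key design choice is the return value: the Researcher's messy search history never enters the Orchestrator's context, only its synthesis does.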

Pillar 3: Persistent Memory

To prevent context window overflow, Deep Agents use external memory, such as a filesystem or vector database, as their source of truth. Systems like Claude Code and Manus give agents read/write access to it. An agent writes intermediate results (code, draft text, raw data) to files; subsequent agents reference file paths or queries to retrieve only what is necessary. This shifts the paradigm from "remembering everything" to "knowing where to find information."
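A minimal illustration of filesystem-backed memory, where only file paths, not raw data, flow through the context; the `agent_workspace` directory and helper functions are hypothetical:

```python
# Illustrative filesystem memory: agents persist intermediate artifacts to
# disk and later retrieve them by path, instead of carrying raw data in context.
from pathlib import Path

WORKSPACE = Path("agent_workspace")
WORKSPACE.mkdir(exist_ok=True)

def write_file(name: str, content: str) -> str:
    path = WORKSPACE / name
    path.write_text(content)
    return str(path)  # only the path enters the context window

def read_file(name: str) -> str:
    return (WORKSPACE / name).read_text()

# A research sub-agent stores raw findings; a writer sub-agent later
# fetches only the file it needs.
ref = write_file("competitor_pricing.md",
                 "Competitor A: $10/mo\nCompetitor B: $12/mo")
summary_input = read_file("competitor_pricing.md")
```

A vector database plays the same role for unstructured recall: store everything externally, retrieve by query, and keep the context window reserved for reasoning.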

Pillar 4: Extreme Context Engineering

Smarter models do not require less prompting; they require better context. You cannot get Agent 2.0 behavior with a prompt that says, "You are a helpful AI." Deep Agents rely on highly detailed instructions, sometimes thousands of tokens long. These define:

- When to stop and plan before acting.
- Protocols for when to spawn a sub-agent vs. doing the work themselves.
- Tool definitions and examples of how and when to use them.
- Standards for file naming and directory structures.
- Strict formats for human-in-the-loop collaboration.
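An illustrative excerpt of what such instructions might look like, paraphrased for this post and not taken from any real system prompt:

```
## Planning
Before any multi-step task, write a plan to `plan.md` and update it after every step.

## Delegation
Spawn a sub-agent when a task needs more than ~5 tool calls or a distinct skill;
otherwise do the work yourself. Sub-agents return a synthesis, never raw tool output.

## Files
Write all intermediate artifacts to the workspace directory; use lowercase,
hyphenated file names so later steps can find them.

## Human-in-the-loop
When blocked, stop and ask a single, specific question instead of guessing.
```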

Visualizing a Deep Agent Flow

How do these pillars come together? Let's look at a sequence diagram for a Deep Agent handling a complex request: "Research Quantum Computing and write a summary to a file."

![sequence](https://www.philschmid.de/static/blog/agents-2.0-deep-agents/sequence.png)

Conclusion

Moving from Shallow Agents to Deep Agents (Agent 1.0 to Agent 2.0) isn't just about connecting an LLM to more tools. It is a shift from reactive loops to proactive architecture. It is about better engineering around the model.

Implementing explicit planning, hierarchical delegation via sub-agents, and persistent memory allows us to control the context, and by controlling the context, we control the complexity, unlocking the ability to solve problems that take hours or days, not just seconds.

Acknowledgements

This overview was created with the help of deep and manual research. The term “Deep Agents” was notably popularized by the LangChain team to describe this architectural evolution.


Thanks for reading! If you have any questions or feedback, please let me know on Twitter or LinkedIn.
