Anthropic Engineering, September 30
Claude Agent SDK: Empowering developers to build general-purpose AI agents

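Anthropic has released the Claude Agent SDK. The SDK was originally called the Claude Code SDK and grew out of Claude Code, a tool built to support developer productivity. It has since moved beyond coding alone and can be used for deep research, video creation, note-taking, and many other non-coding applications. At its core, the Claude Agent SDK gives Claude access to a computer so that it can work with files, run commands, and debug code. By combining tools, Bash scripts, code generation, and MCP (the Model Context Protocol), developers can build finance, personal assistant, customer support, deep research, and other kinds of agents. The SDK emphasizes the agent loop of "gather context -> take action -> verify work -> repeat" and provides verification mechanisms such as defined rules, visual feedback, and LLM-as-a-judge, helping users build AI agents that are more reliable and easier to iterate on.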

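💡 **The evolution and generality of the Claude Agent SDK**: The original Claude Code SDK has been renamed the Claude Agent SDK, marking its shift from a coding-only solution to a broader platform for building general-purpose AI agents. It gives Claude access to a computer so that it can find and edit files, run code, and debug, which in turn supports a wide range of non-coding tasks such as deep research, video production, and note organization.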

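💻 **Core mechanisms and tools for building agents**: The Claude Agent SDK provides a set of core mechanisms for building agents. These include giving Claude access to the user's computer (via the terminal) so that it can run Bash commands and edit and create files. The SDK also supports custom tools, Bash scripts, code generation, and MCP (the Model Context Protocol). MCP offers standardized integrations with external services such as Slack, GitHub, and Google Drive, which simplifies how agents connect to outside systems and exchange data with them.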

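🔄 **The agent loop and verification strategies**: The SDK emphasizes the agent loop of "gather context -> take action -> verify work -> repeat". To keep agents reliable, it describes several verification mechanisms: clearly defined rules (such as code linting and email address validation), visual feedback (such as checking screenshots of generated UI), and using another LLM as a judge. These mechanisms help an agent correct itself and improve as it iterates.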

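🚀 **A range of agent use cases**: The Claude Agent SDK lets developers build many kinds of agents. Examples include finance agents that understand an investment portfolio and offer advice, personal assistant agents that handle travel bookings and calendar management, customer support agents that handle complex customer requests, and research agents that carry out deep research across large document collections and produce reports. At its core, the SDK provides the low-level capabilities needed to build automated workflows.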

Last year, we shared lessons in building effective agents alongside our customers. Since then, we've released Claude Code, an agentic coding solution that we originally built to support developer productivity at Anthropic.

Over the past several months, Claude Code has become far more than a coding tool. At Anthropic, we’ve been using it for deep research, video creation, and note-taking, among countless other non-coding applications.

In other words, the agent harness that powers Claude Code (the Claude Code SDK) can power many other types of agents, too. To reflect this broader vision, we're renaming the Claude Code SDK to the Claude Agent SDK.

In this post, we'll explain why we built the Claude Agent SDK, show how to build your own agents with it, and share the best practices that have emerged from our team’s own deployments.

Giving Claude a computer

The key design principle behind Claude Code is that Claude needs the same tools that programmers use every day. It needs to be able to find appropriate files in a codebase, write and edit files, lint the code, run it, debug, edit, and sometimes take these actions iteratively until the code succeeds.

We found that by giving Claude access to the user’s computer (via the terminal), it had what it needed to write code like programmers do.

But this has also made Claude in Claude Code effective at non-coding tasks. By giving it tools to run bash commands, edit files, create files and search files, Claude can read CSV files, search the web, build visualizations, interpret metrics, and do all sorts of other digital work – in short, create general-purpose agents with a computer.

Creating new types of agents

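We believe giving Claude a computer unlocks the ability to build agents that are more effective than before. For example, with our SDK, developers can build:

- Finance agents that understand a portfolio and offer advice on it.
- Personal assistant agents that handle travel bookings and manage calendars.
- Customer support agents that handle complex customer requests.
- Research agents that carry out deep research across large document collections and produce reports.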

And much more. At its core, the SDK gives you the primitives to build agents for whatever workflow you're trying to automate.

Building your agent loop

In Claude Code, Claude often operates in a specific feedback loop: gather context -> take action -> verify work -> repeat.

This offers a useful way to think about other agents, and the capabilities they should be given. To illustrate this, we’ll walk through the example of how we might build an email agent in the Claude Agent SDK.
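
The loop described above can be sketched in a few lines of Python. This is an illustrative skeleton only, not the Claude Agent SDK's API: `run_agent_loop`, `gather`, `act`, and `verify` are hypothetical names, and the three callables stand in for whatever the model and its tools actually do.

```python
from typing import Callable, List, Tuple


def run_agent_loop(
    task: str,
    gather: Callable[[str, List[str]], str],
    act: Callable[[str, str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 5,
) -> Tuple[str, int]:
    """Run gather -> act -> verify until verification passes or the budget runs out.

    All names here are hypothetical. Returns the last output and the
    number of iterations that were used.
    """
    feedback: List[str] = []  # verification failures carried into the next round
    output = ""
    for iteration in range(1, max_iterations + 1):
        context = gather(task, feedback)  # 1. gather context
        output = act(task, context)       # 2. take action
        ok, message = verify(output)      # 3. verify work
        if ok:
            return output, iteration
        feedback.append(message)          # 4. repeat, now with the feedback
    return output, max_iterations
```

The iteration cap is a design choice in this sketch: it keeps an agent that never passes verification from looping forever.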

Gather context

When developing an agent, you want to give it more than just a prompt: it needs to be able to fetch and update its own context. Here’s how features in the SDK can help.

Agentic search and the file system

The file system represents information that could be pulled into the model's context.

When Claude encounters large files, like logs or user-uploaded files, it decides how to load them into its context using bash commands like grep and tail. In essence, the folder and file structure of an agent becomes a form of context engineering.

Our email agent might store previous conversations in a folder called ‘Conversations’. This would allow it to search these previous conversations for context when asked about them.
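
As a rough illustration of what this kind of search amounts to, here is a minimal Python sketch of grep-like and tail-like helpers over such a folder. The function names and the assumption that conversations are stored as `.txt` files are ours, not part of the SDK; in practice Claude would more likely call `grep` and `tail` directly through bash.

```python
from collections import deque
from pathlib import Path
from typing import List, Tuple


def grep_folder(folder: Path, needle: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for each matching line, like `grep -rin`.

    Assumption: conversations are stored as .txt files under `folder`.
    """
    hits = []
    for path in sorted(folder.rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


def tail(path: Path, count: int = 10) -> List[str]:
    """Return the last `count` lines of a file, like `tail -n`.

    The deque keeps only `count` lines in memory, so large logs are fine.
    """
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Only the matching lines, rather than whole files, would then be pulled into the model's context.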

Semantic search

Semantic search is usually faster than agentic search, but less accurate, more difficult to maintain, and less transparent. It involves ‘chunking’ the relevant context, embedding these chunks as vectors, and then searching for concepts by querying those vectors. Given its limitations, we suggest starting with agentic search, and only adding semantic search if you need faster results or more variations.
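
To make the chunk, embed, and query steps concrete, here is a toy Python sketch. It makes one large simplifying assumption: `embed` is a word-count vector standing in for a real embedding model, so it only captures word overlap rather than meaning. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, words_per_chunk: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(chunks: List[str], query: str, top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the `top_k` chunks most similar to the query, best first."""
    query_vector = embed(query)
    scored = [(cosine(embed(chunk), query_vector), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Even in this toy form, the maintenance cost mentioned above is visible: the chunks and their vectors have to be rebuilt whenever the underlying documents change.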

Subagents

Claude Agent SDK supports subagents by default. Subagents are useful for two main reasons. First, they enable parallelization: you can spin up multiple subagents to work on different tasks simultaneously. Second, they help manage context: subagents use their own isolated context windows, and only send relevant information back to the orchestrator, rather than their full context. This makes them ideal for tasks that require sifting through large amounts of information where most of it won't be useful.

When designing our email agent, we might give it a 'search subagent' capability. The email agent could then spin off multiple search subagents in parallel—each running different queries against your email history—and have them return only the relevant excerpts rather than full email threads.
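
The fan-out pattern can be sketched with plain Python threads. This is only an analogy for how subagents behave, under the assumption that each subagent is a function with its own local state; `search_subagent` and `orchestrate` are hypothetical names, and real subagents would each run a model with an isolated context window.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def search_subagent(query: str, threads: Dict[str, str], excerpt_chars: int = 80) -> List[str]:
    """Hypothetical subagent: scans every thread but returns only short excerpts."""
    excerpts = []
    for subject, body in threads.items():
        position = body.lower().find(query.lower())
        if position != -1:
            start = max(0, position - excerpt_chars // 2)
            excerpts.append(f"{subject}: {body[start:start + excerpt_chars].strip()}")
    return excerpts


def orchestrate(queries: List[str], threads: Dict[str, str]) -> Dict[str, List[str]]:
    """Run one subagent per query in parallel and keep only what they return."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        results = pool.map(lambda query: search_subagent(query, threads), queries)
        return dict(zip(queries, results))
```

The orchestrator never sees the full threads, only the excerpts, which is the context-saving property described above.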

Compaction

When agents are running for long periods of time, context maintenance becomes critical. The Claude Agent SDK’s compact feature automatically summarizes previous messages when the context limit approaches, so your agent won’t run out of context. This is built on Claude Code’s compact slash command.
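
The general idea behind compaction can be illustrated as follows. This sketch is not how the SDK implements it: the token estimate, the 80% threshold, the number of recent messages kept, and the `summarize` callable (which would be a model call in practice) are all assumptions made for the example.

```python
from typing import Callable, List


def estimate_tokens(message: str) -> int:
    """Crude assumption for illustration: about one token per four characters."""
    return max(1, len(message) // 4)


def compact(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    context_limit: int,
    threshold: float = 0.8,
    keep_recent: int = 2,
) -> List[str]:
    """Replace older messages with one summary once usage nears the limit."""
    used = sum(estimate_tokens(message) for message in messages)
    if used < threshold * context_limit or len(messages) <= keep_recent:
        return messages  # plenty of room left, or nothing old enough to fold away
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent
```

Keeping the most recent messages verbatim is a design choice in this sketch, so that the agent does not lose the details of what it was just doing.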

Take action

Once you’ve gathered context, you’ll want to give your agent flexible ways of taking action.

Tools

Tools are the primary building blocks of execution for your agent. Tools are prominent in Claude's context window, making them the primary actions Claude will consider when deciding how to complete a task. This means you should be conscious about how you design your tools to maximize context efficiency. You can see more best practices in our blog post, Writing effective tools for agents – with agents.

As such, your tools should be primary actions you want your agent to take. Learn how to make custom tools in the Claude Agent SDK.

For our email agent, we might define tools like “fetchInbox” or “searchEmails” as the agent’s primary, most frequent actions.
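
To show the shape of such tools without relying on any particular API, here is a small, self-contained Python sketch of a tool registry. It does not use the Claude Agent SDK's custom tool interface; the `Tool` class, `call_tool`, and the in-memory `INBOX` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str  # keep it short: this text takes up space in the model's context
    handler: Callable[..., Any]


INBOX = [  # hypothetical sample data standing in for a real mailbox
    {"sender": "ana@example.com", "subject": "Invoice for March"},
    {"sender": "raj@example.com", "subject": "Team lunch"},
]


def fetch_inbox(limit: int = 10) -> List[Dict[str, str]]:
    return INBOX[:limit]


def search_emails(query: str) -> List[Dict[str, str]]:
    return [mail for mail in INBOX if query.lower() in mail["subject"].lower()]


TOOLS = {
    tool.name: tool
    for tool in (
        Tool("fetchInbox", "Return the most recent emails.", fetch_inbox),
        Tool("searchEmails", "Find emails whose subject contains a query.", search_emails),
    )
}


def call_tool(name: str, **arguments: Any) -> Any:
    """Dispatch a tool call requested by the model to its handler."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name].handler(**arguments)
```

Keeping the registry to a handful of tools with one-line descriptions reflects the point about context efficiency: every tool definition competes for the same context window.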

Bash & scripts

Bash is useful as a general-purpose tool to allow the agent to do flexible work using a computer.

In our email agent, the user might have important information stored in their attachments. Claude could write code to download the PDF, convert it to text, and search across it to find useful information.
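
A script of the kind Claude might write for the convert and search steps could look like the sketch below. It assumes the attachment has already been saved locally (the download step is omitted) and that the `pdftotext` command-line tool from Poppler is installed; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def pdf_to_text(pdf_path: Path) -> str:
    """Convert a PDF to text with the `pdftotext` CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,  # raise if the conversion fails
    )
    return result.stdout


def search_attachment(
    pdf_path: Path,
    term: str,
    convert: Callable[[Path], str] = pdf_to_text,
) -> List[str]:
    """Return the lines of the converted attachment that mention `term`."""
    text = convert(pdf_path)
    return [line.strip() for line in text.splitlines() if term.lower() in line.lower()]
```

The `convert` parameter is there so the search logic can be exercised with a different converter, or with canned text, when `pdftotext` is not available.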

Code generation

The Claude Agent SDK excels at code generation—and for good reason. Code is precise, composable, and infinitely reusable, making it an ideal output for agents that need to perform complex operations reliably.

When building agents, consider: which tasks would benefit from being expressed as code? Often, the answer unlocks significant capabilities.

For example, our recent launch of file creation in Claude.AI relies entirely on code generation. Claude writes Python scripts to create Excel spreadsheets, PowerPoint presentations, and Word documents, ensuring consistent formatting and complex functionality that would be difficult to achieve any other way.

In our email agent, we might want to allow users to create rules for inbound emails. To achieve this, we could write code that runs whenever a new email arrives.
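
For illustration, a generated rule might look something like this Python sketch. The `Email` class, the `on_email_received` hook name, and the specific rules are hypothetical; how such a handler is actually wired to inbound mail depends on the email system in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Email:
    sender: str
    subject: str
    labels: List[str] = field(default_factory=list)


def on_email_received(email: Email) -> Email:
    """Hypothetical user rule, meant to run once for every inbound email."""
    # Rule 1: anything from the billing domain is filed under "finance".
    if email.sender.lower().endswith("@billing.example.com"):
        email.labels.append("finance")
    # Rule 2: subjects that mention "urgent" are flagged as priority.
    if "urgent" in email.subject.lower():
        email.labels.append("priority")
    return email
```

Because the rule is ordinary code, it behaves the same way on every email and can be read, tested, and edited by the user.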

MCPs

The Model Context Protocol (MCP) provides standardized integrations to external services, handling authentication and API calls automatically. This means you can connect your agent to tools like Slack, GitHub, Google Drive, or Asana without writing custom integration code or managing OAuth flows yourself.

For our email agent, we might want to search Slack messages to understand team context, or check Asana tasks to see if someone has already been assigned to handle a customer request. With MCP servers, these integrations work out of the box—your agent can simply call tools like search_slack_messages or get_asana_tasks and the MCP handles the rest.

The growing MCP ecosystem means you can quickly add new capabilities to your agents as pre-built integrations become available, letting you focus on agent behavior.

Verify your work

The Claude Agent SDK finishes the agentic loop by evaluating its work. Agents that can check and improve their own output are fundamentally more reliable: they catch mistakes before they compound, self-correct when they drift, and get better as they iterate.

The key is giving Claude concrete ways to evaluate its work. Here are three approaches we've found effective:

Defining rules

The best form of feedback is providing clearly defined rules for an output, then explaining which rules failed and why.

Code linting is an excellent form of rules-based feedback. The more in-depth the feedback, the better. For instance, it is usually better to generate TypeScript and lint it than to generate plain JavaScript, because TypeScript gives you additional layers of feedback.

When generating an email, you may want Claude to check that the email address is valid (if not, throw an error) and that the user has sent an email to them before (if not, throw a warning).
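
Such a rule could be expressed as a small check that returns explicit errors and warnings for the agent to act on. In this Python sketch, the names are hypothetical, the address pattern is deliberately simple, and we assume the warning is meant for recipients the user has not emailed before.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Deliberately simple pattern for illustration; it is not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class CheckResult:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def check_recipient(address: str, previous_recipients: Set[str]) -> CheckResult:
    """Check a draft's recipient; `previous_recipients` is assumed to be lowercase."""
    result = CheckResult()
    if not EMAIL_PATTERN.match(address):
        result.errors.append(f"'{address}' is not a valid email address")
    elif address.lower() not in previous_recipients:
        # Assumption: a first-time recipient deserves a warning, not an error.
        result.warnings.append(f"no earlier email to '{address}' was found; confirm the recipient")
    return result
```

Returning the reason alongside the failure matters here: the message tells the agent which rule failed and why, so it can correct the draft on the next pass.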

Visual feedback

When using an agent to complete visual tasks, like UI generation or testing, visual feedback (in the form of screenshots or renders) can be helpful. For example, if sending an email with HTML formatting, you could screenshot the generated email and provide it back to the model for visual verification and iterative refinement. The model would then check whether the visual output matches what was requested.

Using an MCP server like Playwright, you can automate this visual feedback loop—taking screenshots of rendered HTML, capturing different viewport sizes, and even testing interactive elements—all within your agent's workflow.

Visual feedback from a large language model (LLM) can provide helpful guidance to your agent.

LLM as a judge

You can also have another language model “judge” the output of your agent based on fuzzy rules. This is generally not a very robust method, and can have heavy latency tradeoffs, but for applications where any boost in performance is worth the cost, it can be helpful.

Our email agent might have a separate subagent judge the tone of its drafts, to see if they fit well with the user’s previous messages.
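
A tone judge of this kind can be sketched as a prompt plus a strict parse of the reply. In the Python below, `ask_model` is a hypothetical callable that sends a prompt to whichever model acts as the judge and returns its text; the prompt wording and the PASS/FAIL convention are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def build_judge_prompt(draft: str, previous_messages: List[str]) -> str:
    """Build the judge's prompt from the draft and the user's earlier messages."""
    examples = "\n---\n".join(previous_messages)
    return (
        "Here are messages the user wrote earlier:\n"
        f"{examples}\n\n"
        "Does the tone of the following draft match them? "
        "Answer PASS or FAIL on the first line, then give one sentence of reasoning.\n\n"
        f"Draft:\n{draft}"
    )


def judge_tone(
    draft: str,
    previous_messages: List[str],
    ask_model: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply text
) -> Tuple[bool, str]:
    """Return (passed, reasoning); anything other than a leading PASS counts as a failure."""
    reply = ask_model(build_judge_prompt(draft, previous_messages)).strip()
    verdict, _, reasoning = reply.partition("\n")
    return verdict.strip().upper() == "PASS", reasoning.strip()
```

Treating any unparseable reply as a failure is a conservative choice in this sketch, given that the judge itself is not a very robust signal.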

Testing and improving your agent

After you’ve gone through the agent loop a few times, we recommend testing your agent, and ensuring that it’s well-equipped for its tasks. The best way to improve an agent is to look carefully at its output, especially the cases where it fails, and to put yourself in its shoes: does it have the right tools for the job?
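
One lightweight way to keep the failing cases in view is a small test harness like the Python sketch below. The `evaluate` function, the test-case format, and the substring check are assumptions for illustration; real evaluations are usually richer than a substring match.

```python
from typing import Callable, Dict, List


def evaluate(agent: Callable[[str], str], cases: List[Dict[str, str]]) -> Dict[str, object]:
    """Run the agent on each case and collect the ones whose output misses the expectation."""
    failures = []
    for case in cases:
        output = agent(case["input"])
        if case["expect"].lower() not in output.lower():
            # Keep the full output: reading failures closely is the point of the exercise.
            failures.append({"input": case["input"], "expect": case["expect"], "got": output})
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return {"pass_rate": pass_rate, "failures": failures}
```

The pass rate is a summary only; the list of failures is what you would read through when asking whether the agent had the right tools and context.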

Here are some other questions to ask as you’re evaluating whether or not your agent is well-equipped to do its job:

Getting started

The Claude Agent SDK makes it easier to build autonomous agents by giving Claude access to a computer where it can write files, run commands, and iterate on its work.

With the agent loop in mind (gathering context, taking action, and verifying your work), you can build reliable agents that are easy to deploy and iterate on.

You can get started with the Claude Agent SDK today. For developers who are already building on the SDK, we recommend migrating to the latest version by following this guide.

Acknowledgements

Written by Thariq Shihipar with notes and editing from Molly Vorweck, Suzanne Wang, Alex Isken, Cat Wu, Keir Bradwell, Alexander Bricken & Ashwin Bhat.

