Claude Sonnet 4.5, the World's Best Coding Model

Anthropic today released Claude Sonnet 4.5, claiming it is the best coding model in the world, the strongest at building complex agents and using computers, and substantially improved at reasoning and math. The model pairs especially well with Claude.ai's Code Interpreter, which can clone code directly from GitHub and install packages. The author demonstrated its capabilities in an experiment that added a tree-structured conversation design to the LLM CLI tool, including a database schema enhancement, a utility module, and a full test suite. Pricing is unchanged from the previous Sonnet at $3 per million input tokens and $15 per million output tokens: still cheaper than Claude Opus, but more expensive than GPT-5 and GPT-5-Codex. Sonnet 4.5 produced good results on the author's pelican-drawing benchmark, though not as good as GPT-5-Codex. Anthropic has rolled the model out across multiple platforms, including OpenRouter, Cursor, and GitHub Copilot, and has also shipped a new VS Code extension and the Claude Agent SDK.

🔍 Claude Sonnet 4.5 is Anthropic's latest coding model, claimed to be the best at building complex agents and at using computers, with substantial gains in reasoning and math; Anthropic calls it the best coding model in the world.

💻 The model shines when used with Claude.ai's Code Interpreter, which can clone code directly from GitHub and install packages, greatly speeding up development work.

🧪 The author demonstrated Sonnet 4.5's capabilities in an experiment that added a tree-structured conversation design to the LLM CLI tool, including a database schema enhancement, a utility module, and a complete test suite, showing its strength on complex tasks.

💰 Claude Sonnet 4.5 keeps the same pricing as the previous Sonnet: $3 per million input tokens and $15 per million output tokens, cheaper than Claude Opus but still more expensive than GPT-5 and GPT-5-Codex.

🎨 Sonnet 4.5 draws a recognizable pelican on a bicycle, though not as well as GPT-5-Codex, so there is still room for improvement on this SVG-drawing benchmark.

🚀 Anthropic has rolled Claude Sonnet 4.5 out across multiple platforms, including OpenRouter, Cursor, and GitHub Copilot, and has released a new VS Code extension and the Claude Agent SDK, signaling rapid iteration and broad deployment.

Claude Sonnet 4.5 is probably the “best coding model in the world” (at least for now)

29th September 2025

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims:

Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math.

Anthropic gave me access to a preview version of a “new model” over the weekend which turned out to be Sonnet 4.5. My initial impressions were that it felt like a better model for code than GPT-5-Codex, which has been my preferred coding model since it launched a few weeks ago. This space moves so fast—Gemini 3 is rumored to land soon so who knows how long Sonnet 4.5 will continue to hold the “best coding model” crown.

The pricing is the same as the previous Sonnet: $3/million input tokens and $15/million output tokens. This remains significantly cheaper than Claude Opus—$15/$75—but still quite a bit more than GPT-5 and GPT-5-Codex, both at $1.25/$10.
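To put those rates in perspective, here is a quick back-of-the-envelope comparison in Python. The 10,000 input / 2,000 output token workload is a made-up example; the per-million-token prices are the ones quoted above.

# Hypothetical per-call cost comparison using the published rates above
PRICES = {  # model: (input $/million tokens, output $/million tokens)
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-opus": (15.00, 75.00),
    "gpt-5": (1.25, 10.00),
}

def cost(model, input_tokens=10_000, output_tokens=2_000):
    input_rate, output_rate = PRICES[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

for model in PRICES:
    print(f"{model}: ${cost(model):.4f}")
# claude-sonnet-4.5: $0.0600
# claude-opus: $0.3000
# gpt-5: $0.0325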

It really shines with Claude.ai Code Interpreter

The claude.ai web interface (not yet the Claude iPhone native app) recently added the ability for Claude to write and then directly execute code in a sandboxed server environment, using Python and Node.js. I wrote about that in detail three weeks ago.

Anthropic’s implementation of this code interpreter pattern is more powerful than ChatGPT’s equivalent because it can directly clone code from GitHub and install software packages from NPM and PyPI.

Sonnet 4.5 absolutely shines at using this tool!

Here’s my most ambitious experiment with it. I started like this:

Checkout https://github.com/simonw/llm and run the tests with

pip install -e '.[test]'
pytest

It did exactly as I asked, checking out my LLM repo, installing it along with all of its dependencies, and running the tests.

✅ All 466 tests passed in 167.69 seconds (about 2 minutes 47 seconds)

Then I posed it a wildly ambitious challenge:

The tool stores prompts and responses in a SQLite database. It currently uses a single conversation row and a linear collection of responses within that conversation. Experiment with the idea of adding a parent_response_id column to that responses table and using it to model the responses in a conversation as a tree instead. Use new pytest tests which you execute individually (to save time on the whole test suite) to exercise your design. Write a tree_notes.md file which you first populate with your plan and then use for notes as you go along.

As you can see in the resulting conversation transcript, it did a really impressive job of this! It churned away across dozens of different tool calls, eventually outputting the following summary of its activity:

I’ve successfully experimented with adding tree-structured conversations to the LLM CLI tool. Here’s what was accomplished:

Implementation Complete

1. Database Schema Enhancement

    Added parent_response_id column to the responses table via migration m022_parent_response_id
    Enables tree-structured (branching) conversations instead of just linear sequences
    Fully backward compatible: existing data unaffected

2. Comprehensive Utility Module (tree_utils.py)

    12 helper functions for tree operations:
      Navigation (children, parent, siblings, paths)
      Analysis (depth, descendants, size)
      Queries (roots, leaves, statistics)
      Visualization (ASCII tree display)

3. Complete Test Suite (test_tree_conversations.py)

    16 tests covering all tree operations
    Tests linear chains, branching, multiple roots, forests
    Includes realistic debugging scenario
    All tests pass ✅

4. Integration Verification

    Updated existing migration tests to expect new column
    All 6 migration tests pass
    Total: 22/22 tests passing ✅

📦 Deliverables (7 files in /mnt/user-data/outputs/)

    README.md: Quick start guide
    IMPLEMENTATION_SUMMARY.md: Complete technical overview
    MIGRATION_CODE.md: Database migration details
    TEST_RESULTS.md: Full test coverage report
    tree_notes.md: Design decisions and development notes
    tree_utils.py: Utility functions module
    test_tree_conversations.py: Test suite

🎯 Key Features

    Multiple branches from any conversation point
    Multiple roots per conversation (forest structure)
    Rich analytics (depth, branching factor, tree size)
    ASCII tree visualization
    Cycle detection for safety

🚀 Ready for Integration

The implementation is production-ready with comprehensive testing. Next steps would be:

    Integrate tree_utils.py into LLM package
    Add CLI commands (llm branch, llm tree)
    Update Response.log() to accept parent_response_id

All code is tested, documented, and ready to use!
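The generated code itself isn't reproduced in the post, but the core idea is easy to sketch. Here's a minimal, self-contained illustration of a parent_response_id tree over a simplified responses table. This is my own toy example using plain sqlite3: it is not Claude's migration m022_parent_response_id or its tree_utils.py, and the table is a cut-down stand-in for LLM's real schema.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE responses (
    id TEXT PRIMARY KEY,
    conversation_id TEXT,
    prompt TEXT,
    response TEXT,
    parent_response_id TEXT REFERENCES responses(id)  -- NULL marks a root response
)
""")

# A tiny forest: one root ("r1") with two branches ("r2" and "r3")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
    [
        ("r1", "c1", "first prompt", "first reply", None),
        ("r2", "c1", "follow-up A", "reply A", "r1"),
        ("r3", "c1", "follow-up B", "reply B", "r1"),
    ],
)

def path_to_root(response_id):
    """Walk parent_response_id links from a response back up to its root."""
    path = []
    while response_id is not None:
        row = conn.execute(
            "SELECT id, parent_response_id FROM responses WHERE id = ?",
            (response_id,),
        ).fetchone()
        path.append(row[0])
        response_id = row[1]
    return list(reversed(path))

print(path_to_root("r3"))  # ['r1', 'r3']

A purely linear conversation is just the degenerate case of this tree, which is why such a migration can be backward compatible: existing rows simply keep a NULL parent_response_id.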

Did I mention I kicked off this experiment by entering prompts on my phone?

I told it to Give me a zip file of everything you have done so far—you can explore the contents of the file it made me in this Gist.

Update: I shared a longer version of this experiment where I had the model experiment with a complex message hashing mechanism as well. I’ve pushed the code to a branch, which is best explained by the message_matching_notes.md document written by Claude.

Pelicans

I released llm-anthropic 0.19 adding support for the new model. Then I ran my classic pelican benchmark, first with thinking turned on:

llm -m claude-sonnet-4.5 -o thinking 1 -o thinking_budget 4000 \
  'Generate an SVG of a pelican riding a bicycle'

And then without:

llm -m claude-sonnet-4.5 \
  'Generate an SVG of a pelican riding a bicycle'

These are pretty good—they are recognizably pelicans!—though not quite as good as GPT-5-Codex which is better at drawing bicycles.

I also tried having it describe this image:

llm -m claude-sonnet-4.5 'describe this image' -a https://static.simonwillison.net/static/2024/pelicans.jpg

This image shows a large gathering of pelicans crowded together along what appears to be a waterfront or shoreline. The birds are densely packed, creating a chaotic scene with dozens of brown pelicans visible. Their distinctive long bills and pouches are clearly visible among the mass of feathered bodies. In the background, there’s calm water and a dark tree line or forested area. The lighting suggests this was taken during early morning or late afternoon, giving the scene a golden, warm tone. The pelicans appear to be resting or congregating together, which is typical behavior for these colonial seabirds.

Anthropic are rolling this out everywhere

The release of this model has been very well coordinated. My embargo on talking about it was due to lift at 10am Pacific today, and I got an email from them linking to their blog post at 10am on the dot. It’s also already live on OpenRouter and in Cursor and GitHub Copilot and no doubt a whole bunch of other places as well.

Anthropic also shipped a new Claude Code VS Code extension today, plus a big upgrade to the Claude Code terminal app. Plus they rebranded their confusingly named Claude Code SDK to the Claude Agent SDK instead, emphasizing that it’s a tool for building agents beyond just customizing the existing Claude Code product. That’s available for both TypeScript and Python.
