Claude Sonnet 4.5, the World's Best Coding Model

Anthropic today released Claude Sonnet 4.5, claiming it is the best coding model in the world, the strongest at building complex agents and using computers, and substantially improved at reasoning and math. The model pairs especially well with Claude.ai's Code Interpreter, which can clone code directly from GitHub and install packages. The author demonstrated its capabilities in an experiment that added a tree-structured conversation design to the LLM CLI tool, including a database schema enhancement, a utility module, and a full test suite. Pricing is unchanged from the previous Sonnet at $3 per million input tokens and $15 per million output tokens: still cheaper than Claude Opus, but more expensive than GPT-5 and GPT-5-Codex. Sonnet 4.5 produced good results on the author's pelican-drawing benchmark, though not as good as GPT-5-Codex. Anthropic has rolled the model out across multiple platforms, including OpenRouter, Cursor, and GitHub Copilot, and has also shipped a new VS Code extension and the Claude Agent SDK.

🔍 Claude Sonnet 4.5 is Anthropic's latest coding model, claimed to be the best at building complex agents and at using computers, with substantial gains in reasoning and math; Anthropic calls it the best coding model in the world.

💻 The model shines when used with Claude.ai's Code Interpreter, which can clone code directly from GitHub and install packages, greatly speeding up development work.

🧪 The author demonstrated Sonnet 4.5's capabilities in an experiment that added a tree-structured conversation design to the LLM CLI tool, including a database schema enhancement, a utility module, and a complete test suite, showing its strength on complex tasks.

💰 Claude Sonnet 4.5 keeps the same pricing as the previous Sonnet: $3 per million input tokens and $15 per million output tokens, cheaper than Claude Opus but still more expensive than GPT-5 and GPT-5-Codex.

🎨 Sonnet 4.5 draws a recognizable pelican on a bicycle, though not as well as GPT-5-Codex, so there is still room for improvement on this SVG-drawing benchmark.

🚀 Anthropic has rolled Claude Sonnet 4.5 out across multiple platforms, including OpenRouter, Cursor, and GitHub Copilot, and has released a new VS Code extension and the Claude Agent SDK, signaling rapid iteration and broad deployment.

Claude Sonnet 4.5 is probably the “best coding model in the world” (at least for now)

29th September 2025

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims:

Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math.

Anthropic gave me access to a preview version of a “new model” over the weekend which turned out to be Sonnet 4.5. My initial impressions were that it felt like a better model for code than GPT-5-Codex, which has been my preferred coding model since it launched a few weeks ago. This space moves so fast—Gemini 3 is rumored to land soon so who knows how long Sonnet 4.5 will continue to hold the “best coding model” crown.

The pricing is the same as the previous Sonnet: $3/million input tokens and $15/million output tokens. This remains significantly cheaper than Claude Opus—$15/$75—but still quite a bit more than GPT-5 and GPT-5-Codex, both at $1.25/$10.
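To put those rates in perspective, here is a quick back-of-the-envelope comparison in Python. The 10,000 input / 2,000 output token workload is a made-up example; the per-million-token prices are the ones quoted above.

# Hypothetical per-call cost comparison using the published rates above
PRICES = {  # model: (input $/million tokens, output $/million tokens)
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-opus": (15.00, 75.00),
    "gpt-5": (1.25, 10.00),
}

def cost(model, input_tokens=10_000, output_tokens=2_000):
    input_rate, output_rate = PRICES[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

for model in PRICES:
    print(f"{model}: ${cost(model):.4f}")
# claude-sonnet-4.5: $0.0600
# claude-opus: $0.3000
# gpt-5: $0.0325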

It really shines with Claude.ai Code Interpreter

The claude.ai web interface (not yet the Claude iPhone native app) recently added the ability for Claude to write and then directly execute code in a sandboxed server environment, using Python and Node.js. I wrote about that in detail three weeks ago.

Anthropic’s implementation of this code interpreter pattern is more powerful than ChatGPT’s equivalent because it can directly clone code from GitHub and install software packages from NPM and PyPI.

Sonnet 4.5 absolutely shines at using this tool!

Here’s my most ambitious experiment with it. I started like this:

Checkout https://github.com/simonw/llm and run the tests with

pip install -e '.[test]'
pytest

It did exactly as I asked, checking out my LLM repo, installing it along with all of its dependencies, and running the tests.

✅ All 466 tests passed in 167.69 seconds (about 2 minutes 47 seconds)

Then I posed it a wildly ambitious challenge:

The tool stores prompts and responses in a SQLite database. It currently uses a single conversation row and a linear collection of responses within that conversation. Experiment with the idea of adding a parent_response_id column to that responses table and using it to model the responses in a conversation as a tree instead. Use new pytest tests which you execute individually (to save time on the whole test suite) to exercise your design. Write a tree_notes.md file which you first populate with your plan and then use for notes as you go along.

As you can see in the resulting conversation transcript, it did a really impressive job of this! It churned away across dozens of different tool calls, eventually outputting the following summary of its activity:

I’ve successfully experimented with adding tree-structured conversations to the LLM CLI tool. Here’s what was accomplished:

Implementation Complete

1. Database Schema Enhancement

    Added parent_response_id column to the responses table via migration m022_parent_response_id
    Enables tree-structured (branching) conversations instead of just linear sequences
    Fully backward compatible: existing data unaffected

2. Comprehensive Utility Module (tree_utils.py)

    12 helper functions for tree operations:
      Navigation (children, parent, siblings, paths)
      Analysis (depth, descendants, size)
      Queries (roots, leaves, statistics)
      Visualization (ASCII tree display)

3. Complete Test Suite (test_tree_conversations.py)

    16 tests covering all tree operations
    Tests linear chains, branching, multiple roots, forests
    Includes realistic debugging scenario
    All tests pass ✅

4. Integration Verification

    Updated existing migration tests to expect new column
    All 6 migration tests pass
    Total: 22/22 tests passing ✅

📦 Deliverables (7 files in /mnt/user-data/outputs/)

    README.md: Quick start guide
    IMPLEMENTATION_SUMMARY.md: Complete technical overview
    MIGRATION_CODE.md: Database migration details
    TEST_RESULTS.md: Full test coverage report
    tree_notes.md: Design decisions and development notes
    tree_utils.py: Utility functions module
    test_tree_conversations.py: Test suite

🎯 Key Features

    Multiple branches from any conversation point
    Multiple roots per conversation (forest structure)
    Rich analytics (depth, branching factor, tree size)
    ASCII tree visualization
    Cycle detection for safety

🚀 Ready for Integration

The implementation is production-ready with comprehensive testing. Next steps would be:

    Integrate tree_utils.py into LLM package
    Add CLI commands (llm branch, llm tree)
    Update Response.log() to accept parent_response_id

All code is tested, documented, and ready to use!
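The generated code itself isn't reproduced in the post, but the core idea is easy to sketch. Here's a minimal, self-contained illustration of a parent_response_id tree over a simplified responses table. This is my own toy example using plain sqlite3: it is not Claude's migration m022_parent_response_id or its tree_utils.py, and the table is a cut-down stand-in for LLM's real schema.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE responses (
    id TEXT PRIMARY KEY,
    conversation_id TEXT,
    prompt TEXT,
    response TEXT,
    parent_response_id TEXT REFERENCES responses(id)  -- NULL marks a root response
)
""")

# A tiny forest: one root ("r1") with two branches ("r2" and "r3")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
    [
        ("r1", "c1", "first prompt", "first reply", None),
        ("r2", "c1", "follow-up A", "reply A", "r1"),
        ("r3", "c1", "follow-up B", "reply B", "r1"),
    ],
)

def path_to_root(response_id):
    """Walk parent_response_id links from a response back up to its root."""
    path = []
    while response_id is not None:
        row = conn.execute(
            "SELECT id, parent_response_id FROM responses WHERE id = ?",
            (response_id,),
        ).fetchone()
        path.append(row[0])
        response_id = row[1]
    return list(reversed(path))

print(path_to_root("r3"))  # ['r1', 'r3']

A purely linear conversation is just the degenerate case of this tree, which is why such a migration can be backward compatible: existing rows simply keep a NULL parent_response_id.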

Did I mention I kicked off this experiment by entering prompts on my phone?

I told it to Give me a zip file of everything you have done so far—you can explore the contents of the file it made me in this Gist.

Update: I shared a longer version of this experiment where I had the model experiment with a complex message hashing mechanism as well. I’ve pushed the code to a branch, which is best explained by the message_matching_notes.md document written by Claude.

Pelicans

I released llm-anthropic 0.19 adding support for the new model. Then I ran my classic pelican benchmark, first with thinking turned on:

llm -m claude-sonnet-4.5 -o thinking 1 -o thinking_budget 4000 \
  'Generate an SVG of a pelican riding a bicycle'

And then without:

llm -m claude-sonnet-4.5 \
  'Generate an SVG of a pelican riding a bicycle'

These are pretty good—they are recognizably pelicans!—though not quite as good as GPT-5-Codex which is better at drawing bicycles.

I also tried having it describe this image:

llm -m claude-sonnet-4.5 'describe this image' -a https://static.simonwillison.net/static/2024/pelicans.jpg

This image shows a large gathering of pelicans crowded together along what appears to be a waterfront or shoreline. The birds are densely packed, creating a chaotic scene with dozens of brown pelicans visible. Their distinctive long bills and pouches are clearly visible among the mass of feathered bodies. In the background, there’s calm water and a dark tree line or forested area. The lighting suggests this was taken during early morning or late afternoon, giving the scene a golden, warm tone. The pelicans appear to be resting or congregating together, which is typical behavior for these colonial seabirds.

Anthropic are rolling this out everywhere

The release of this model has been very well coordinated. My embargo on talking about it was due to lift at 10am Pacific today, and I got an email from them linking to their blog post at 10am on the dot. It’s also already live on OpenRouter and in Cursor and GitHub Copilot and no doubt a whole bunch of other places as well.

Anthropic also shipped a new Claude Code VS Code extension today, plus a big upgrade to the Claude Code terminal app. Plus they rebranded their confusingly named Claude Code SDK to the Claude Agent SDK instead, emphasizing that it’s a tool for building agents beyond just customizing the existing Claude Code product. That’s available for both TypeScript and Python.
