VentureBeat, October 3, 20:43
Anthropic releases Claude Sonnet 4.5, challenging OpenAI's GPT-5

 

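Anthropic released its Claude Sonnet 4.5 model on Monday, positioning it as "the best coding model in the world" in a direct challenge to OpenAI's recently released GPT-5. The two AI giants are competing fiercely for dominance in the lucrative enterprise software development market. Anthropic claims its latest model achieves state-of-the-art performance on key coding benchmarks, scoring 77.2% on SWE-bench Verified, while GPT-5 scored comparatively lower. More notably, Anthropic says Claude Sonnet 4.5 can stay focused on complex, multi-step tasks for more than 30 hours, a major leap in AI's ability to handle sustained work. The model also scored 61.4% on OSWorld, a benchmark that evaluates AI models on real-world computer tasks, showing marked progress in computer use. However, Anthropic also faces competitive pressure from OpenAI, which has adopted an aggressive pricing strategy: Claude Opus 4 costs roughly seven times as much as GPT-5 on certain tasks. Even so, Anthropic is holding to its pricing strategy, and Claude Sonnet 4.5's price remains unchanged. In addition, the model makes significant progress in reducing concerning behaviors such as sycophancy, deception, and power-seeking, and Anthropic has released the Claude Agent SDK to give developers powerful infrastructure.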

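🔍 Anthropic's Claude Sonnet 4.5 scored 77.2% on the SWE-bench Verified benchmark, directly challenging OpenAI's GPT-5 and demonstrating state-of-the-art coding performance.

📈 The model can stay focused on complex, multi-step tasks for more than 30 hours, a major leap in AI's ability to handle sustained work that significantly strengthens AI's usefulness in enterprise software development.

🛡️ Claude Sonnet 4.5 makes significant progress in reducing concerning behaviors such as sycophancy, deception, and power-seeking, and ships under AI Safety Level 3 (ASL-3) protections, including classifiers that detect potentially dangerous inputs and outputs, improving security for enterprise deployments.

🚀 The model also scored 61.4% on OSWorld, showing marked progress in computer use: it interacts better with software interfaces, improving AI's practical usefulness.

🔧 Anthropic released the Claude Agent SDK, giving developers powerful infrastructure for building solutions as capable as Claude Code and driving AI innovation in enterprise applications.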

Anthropic launched Claude Sonnet 4.5 on Monday, positioning the artificial intelligence model as "the best coding model in the world" in a direct challenge to OpenAI's recently released GPT-5, as the two AI giants battle for dominance in the lucrative enterprise software development market.

The San Francisco-based startup claims its newest model achieves state-of-the-art performance on critical coding benchmarks, scoring 77.2% on SWE-bench Verified, a rigorous software engineering evaluation, which the company says places it ahead of GPT-5. More remarkably, Anthropic says Claude Sonnet 4.5 can maintain focus on complex, multi-step tasks for more than 30 hours, a dramatic leap in AI's ability to handle sustained work.

"Sonnet 4.5 achieves 77.2% on SWE-bench Verified (82% with parallel test-time compute). It is SOTA," an Anthropic spokesperson told VentureBeat, using industry shorthand for "state of the art." The company also highlighted the model's 50% score on Terminal-bench, another coding benchmark where it claims leadership.

The announcement follows mounting pressure from OpenAI's recent advances and pointed criticism from high-profile figures like Elon Musk, who recently posted on X.com that "winning was never in the set of possible outcomes for Anthropic." When asked about Musk's statement, Anthropic declined to comment.

The release arrives just seven weeks after OpenAI's GPT-5 launch in August, underscoring the breakneck pace of competition in artificial intelligence as companies race to capture enterprise customers increasingly relying on AI for software development. The timing is particularly noteworthy as Anthropic grapples with questions about its heavy dependence on just two major customers.

Anthropic dominates coding market despite customer concentration risks

The competition centers on a market that has emerged as AI's first major profitable use case beyond chatbots. Anthropic commands 42% of the code generation market, double OpenAI's 21% share, according to a Menlo Ventures survey of 150 enterprise technical leaders. That dominance has translated into remarkable financial performance, with the company reaching a $5 billion revenue run rate earlier this year.

However, industry analysis reveals that coding applications Cursor and GitHub Copilot drive approximately $1.4 billion of Anthropic's revenue, creating a potentially dangerous customer concentration that could leave the company vulnerable if either relationship falters.

"Our run-rate revenue has grown significantly, even when you exclude these two customers," the Anthropic spokesperson said, pushing back on concerns about customer concentration. The company provided supportive quotes from both Cursor CEO Michael Truell and GitHub Chief Product Officer Mario Rodriguez praising Claude Sonnet 4.5's performance.

The new model achieves significant advances in computer use capabilities, scoring 61.4% on OSWorld, a benchmark that tests AI models on real-world computer tasks. Just four months ago, Claude Sonnet 4 held the lead at 42.2%, demonstrating rapid improvement in AI's ability to interact with software interfaces.

OpenAI's aggressive pricing strategy threatens Anthropic's premium positioning

Anthropic's announcement comes as the company grapples with competitive pressure from GPT-5's aggressive pricing strategy. Early analysis shows Claude Opus 4 costing roughly seven times more per million tokens than GPT-5 for certain tasks, creating immediate pressure on Anthropic's premium positioning.

The pricing disparity signals a fundamental shift in competitive dynamics that could force enterprise procurement teams to reconsider vendor relationships previously built on performance rather than price. Companies managing exponentially growing AI budgets now face comparable capability at a fraction of the cost.

Yet Anthropic is maintaining its pricing strategy with Claude Sonnet 4.5. "Sonnet 4.5's cost remains the same as Sonnet 4," the spokesperson confirmed, keeping prices at $3 per million input tokens and $15 per million output tokens.
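
To put the quoted prices in concrete terms, here is a minimal sketch of the arithmetic in Python. The function name and the sample workload are hypothetical, chosen only for illustration. The calculation uses just the two list prices cited above and ignores any discounts or other pricing tiers, which the article does not cover.

```python
# Prices quoted in the article for Claude Sonnet 4.5 (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at the quoted list prices.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Assumed sample workload: 2 million input tokens and 500,000 output tokens.
print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # prints $13.50
```

Because output tokens cost five times as much as input tokens at these rates, the 500,000 output tokens in this sample account for more of the bill ($7.50) than the 2 million input tokens ($6.00).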

Claude Sonnet 4.5 delivers 30-hour autonomous work sessions and enhanced security

Beyond performance improvements, Anthropic positions Claude Sonnet 4.5 as its "most aligned frontier model yet," showing significant reductions in concerning behaviors like sycophancy, deception, and power-seeking tendencies. The company has made "considerable progress on defending against prompt injection attacks," a critical security concern for enterprise deployments.

The model is being released under Anthropic's AI Safety Level 3 (ASL-3) protections, which include classifiers designed to detect potentially dangerous inputs and outputs related to chemical, biological, radiological, and nuclear weapons. While these safeguards sometimes flag normal content, Anthropic says it has reduced false positives by a factor of ten since initially describing them.

Perhaps most significantly for developers, Anthropic is releasing the Claude Agent SDK — the same infrastructure that powers its Claude Code product. "We built Claude Code because the tool we needed didn't exist yet," the company said in its announcement. "The Agent SDK gives you the same foundation to build something just as capable for whatever problem you're solving."

International expansion accelerates as $1.5 billion copyright settlement finalizes

The model launch coincides with Anthropic's aggressive international expansion, as the company seeks to diversify beyond its U.S.-concentrated customer base. The startup recently announced plans to triple its international workforce and expand its applied AI team fivefold in 2025, driven by data showing that nearly 80% of Claude usage now comes from outside the United States.

However, the expansion comes amid significant legal costs. Anthropic recently agreed to pay $1.5 billion in a copyright settlement with authors and publishers over allegations the company illegally used their books to train AI models without permission. The settlement, approved by a federal judge last week, requires payments of $3,000 for each publication listed in the case.

Enterprise AI spending doubles as companies prioritize performance over cost

The rapid-fire model releases from both companies reflect the high stakes in enterprise AI adoption. Model API spending has more than doubled to $8.4 billion in just six months, according to Menlo Ventures, as enterprises shift from experimental projects to production deployments.

Customer behavior patterns suggest enterprises consistently prioritize performance over price, upgrading to the newest models within weeks of release regardless of cost. This behavior could work in Anthropic's favor if Claude Sonnet 4.5's performance advantages prove compelling enough to overcome GPT-5's pricing advantage.

However, the dramatic price differential introduced by GPT-5 could overcome typical switching inertia, especially for cost-conscious enterprises facing budget pressures. Industry observers note that model switching costs remain relatively low, yet 66% of enterprises upgrade within their existing provider rather than switch vendors.

For enterprises, the intensifying competition delivers better performance and lower costs through continuously improving capabilities. The rapid pace of model improvements — with new versions launching monthly rather than annually — provides organizations with expanding AI capabilities while vendors compete aggressively for their business.

While the corporate rivalry between Anthropic and OpenAI dominates industry headlines, the real economic impact extends far beyond Silicon Valley boardrooms. The development of AI systems capable of sustained coding work for 30 hours represents a fundamental shift in how software gets built, with implications that extend across every industry relying on technology infrastructure.

These advancing capabilities signal broader workplace transformation ahead. As AI systems demonstrate increasing proficiency at complex, sustained intellectual work, the technology industry's competition for coding supremacy foreshadows similar disruptions across fields requiring analytical thinking, problem-solving, and technical expertise.
