Fortune | FORTUNE, September 30, 01:09
AI Assistant's Autonomous Operation Capability Improves Markedly

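Anthropic's new AI model, Claude Sonnet 4.5, marks a major advance in autonomous operation: it can work independently on a software application development task for as long as 30 hours, far beyond the seven hours an earlier model could sustain. The model performs strongly across benchmarks, particularly in software development, where it reaches state-of-the-art performance and is better at following instructions, identifying code problems, and generating production-grade code. In financial services, Claude Sonnet 4.5 also outperforms earlier models at research, modeling, and forecasting. These gains suggest that Anthropic is positioning its models as enterprise work assistants and is pulling ahead of competitors, especially in coding assistance and task automation.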

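🚀 **Much longer autonomous run time**: Claude Sonnet 4.5 can run autonomously for up to 30 hours, far longer than the seven hours of the previous model, and can complete complex software development tasks with almost no human supervision, a major step forward in AI's capacity for independent work.

💻 **Software development capability reaches a new level**: The model performs strongly on key benchmarks such as SWE-Bench Verified. It is better than its predecessors at following instructions, identifying code defects, and generating production-grade code, and its coding efficiency and quality are industry-leading, making it a capable programming assistant.

📈 **Strong results in financial services**: Claude Sonnet 4.5 also shows clear gains in financial-industry applications, handling industry research, financial model building, and forecasting more effectively than earlier models, which makes it a useful support tool for finance professionals.

💼 **Aimed at enterprise applications and task automation**: Anthropic is positioning its Claude models as assistants for workplace use, with an emphasis on coding support and task automation. Usage data show that about 77% of prompts sent through the API ask the model to perform a task rather than offer advice, which indicates that businesses are using AI to take on actual workloads, not just to support decisions.

💡 **Driving changes in how work gets done**: As AI models like Claude become more capable of autonomous work in complex, time-intensive domains such as software engineering, the effects on businesses and employees could be far-reaching. Autonomous agents may reduce the need for constant human oversight, lower the cost of repetitive workflows, speed up operations, and potentially reshape the future workforce.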

The company said that the model was able to run autonomously for 30 hours, maintaining sustained focus with minimal oversight while building an entire software application. It’s a significant improvement over the company’s previous Opus 4 model, released four months ago, which could operate autonomously for only seven hours.

Anthropic said Claude Sonnet 4.5 also outperformed Opus on key benchmarks and was more effective in meeting customers’ practical business needs. The company said the model was even better at coding than previous frontier models, and state-of-the-art on SWE-Bench Verified, a key benchmark that tests how models perform at software development tasks. Anthropic said that Claude Sonnet 4.5 was better than its predecessors at following instructions, identifying code improvements, and generating more production-ready code. When tested on tasks from the financial services industry, the company said the new model outperformed earlier Claude models in tasks such as researching, building financial models, and forecasting.

Anthropic appears to be pushing further ahead of its competitors in coding assistance and autonomous task completion, positioning its models toward corporate and workplace use. The company’s previous Claude Opus 4.1 model already bested competitors on OpenAI’s new benchmark of professional task completion, GDPval, which tested how models performed compared to human professionals across a range of industries and jobs.

Last week, OpenAI said its GPT-5 model and Anthropic’s Claude Opus 4.1 were “already approaching the quality of work produced by industry experts.”

Dueling usage studies released earlier this month also suggested that Anthropic’s Claude models were emerging as more professionally oriented AI models, especially in comparison to OpenAI’s ChatGPT, which is increasingly being used as a consumer product.

According to Anthropic’s study, most Claude users were turning to the models for workplace or productivity tasks, with mathematical tasks and coding cited as the dominant activities globally on Claude.ai, making up 36% of all use cases.

Business use of Claude leaned heavily toward task automation. According to the study, approximately 77% of prompts that the model receives through its API—the application programming interface that is primarily used by enterprise customers—involve users requesting the system to perform tasks on their behalf, rather than just providing advice or suggestions. These business-focused interactions are also concentrated in coding, which accounts for 44% of API use. A further 5% of API usage was dedicated to developing or evaluating AI systems.

The tasks that business users automate also tend to be the most expensive ones to run. The findings indicate a shift in how businesses approach these tools. Rather than using them mainly for decision support or research, many teams are relying on them to take work off their plates entirely.

If models like Claude become more capable of autonomous work, especially in complex, time-intensive domains like software engineering, the implications for businesses and employees could be significant. Autonomous agents can reduce the need for constant human oversight and lower costs on repetitive workflows, speeding up a company’s operations and potentially reducing the need for headcount.
