MarkTechPost@AI, October 8, 18:40
Gemini 2.5 Computer Use: A New Model for AI Agents to Interact with User Interfaces

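Google AI has introduced Gemini 2.5 Computer Use, an AI model built specifically to plan and execute UI actions in a browser. Through a constrained action API, the model can automate web browsing and UI testing tasks, and it shows notable performance gains on standard benchmarks. It supports a set of predefined UI actions and can be extended with custom functions. Gemini 2.5 Computer Use includes a safety layer that can require human confirmation for high-risk actions. The model is currently available in public preview through Google AI Studio and Vertex AI and is aimed at automated UI testing and web operations.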

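🤖 **Stronger UI interaction for AI agents**: Gemini 2.5 Computer Use is a specialized Gemini 2.5 variant that plans and executes UI actions in a live browser. Its constrained action API lets AI agents interact with user interfaces, considerably widening the range of tasks AI can automate, especially web automation and UI testing.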

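⚙️ **Features and supported actions**: The model supports 13 predefined UI actions, including opening a browser, waiting, navigating, clicking, typing text, and scrolling, and it can be extended with custom functions to support non-browser interfaces. Client code executes each action, captures a fresh screenshot and URL, and feeds them back, looping until the task is complete.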

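🔒 **Safety mechanisms and constraints**: Gemini 2.5 Computer Use is optimized for browsers and includes built-in safety monitoring. It can block prohibited actions or require user confirmation before “high-risk” operations such as payments or sending messages. Mobile scenarios can be supported by extending the model with custom actions.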
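A client that executes the model's function calls has to know which action names it will accept. The minimal Python sketch below, assuming a purely client-side allowlist, models the 13 predefined browser actions plus the custom functions the article mentions for mobile surfaces (open_app, long_press_at, go_home). ActionSpace and its methods are hypothetical names for illustration only; they are not part of the Gemini API or any Google SDK.

```python
from dataclasses import dataclass, field

# The 13 predefined browser actions listed in the article.
PREDEFINED_ACTIONS = frozenset({
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward",
    "search", "navigate", "click_at", "hover_at", "type_text_at",
    "key_combination", "scroll_document", "scroll_at", "drag_and_drop",
})


@dataclass
class ActionSpace:
    """Hypothetical client-side allowlist: predefined actions plus custom ones."""
    custom: set = field(default_factory=set)

    def add_custom(self, name: str) -> None:
        # In this sketch, a custom function may not reuse a predefined name.
        if name in PREDEFINED_ACTIONS:
            raise ValueError(f"{name!r} is already a predefined action")
        self.custom.add(name)

    def allowed(self) -> frozenset:
        return PREDEFINED_ACTIONS | self.custom

    def is_allowed(self, name: str) -> bool:
        # Reject any call the executor does not know how to perform.
        return name in self.allowed()


# Extend the space for a mobile surface with the article's example functions.
mobile = ActionSpace()
for name in ("open_app", "long_press_at", "go_home"):
    mobile.add_custom(name)

print(len(PREDEFINED_ACTIONS), len(mobile.allowed()))  # 13 16
print(mobile.is_allowed("long_press_at"))               # True
```

Checking names before dispatch means an unexpected function call fails loudly instead of reaching the executor.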

Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI introduces Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser via a constrained action API. It’s available in public preview through Google AI Studio and Vertex AI. The model targets web automation and UI testing, with documented, human-judged gains on standard web/mobile control benchmarks and a safety layer that can require human confirmation for risky steps.

What does the model actually ship?

Developers call a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes the action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, and loops until the task ends or a safety rule blocks it. The supported action space consists of 13 predefined UI actions: open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, and drag_and_drop. It can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.

(Figure omitted; source: https://blog.google/technology/google-deepmind/gemini-computer-use-model/)
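The execute-and-observe loop described above can be sketched in Python. This is an illustration under stated assumptions, not Google's client code: FunctionCall, Observation, FakeBrowser, run_agent, and scripted_model are all hypothetical stand-ins. The scripted model replaces a real call to the computer_use tool, and FakeBrowser replaces a real executor such as Playwright or Browserbase, so the loop runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FunctionCall:
    """One action proposed by the model, e.g. click_at with coordinates."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client captures after each step: a screenshot and the URL."""
    screenshot: bytes
    url: str


class FakeBrowser:
    """Offline stand-in for a real executor (e.g. Playwright); records actions."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: list[str] = []

    def execute(self, call: FunctionCall) -> Observation:
        if call.name == "navigate":
            self.url = call.args["url"]
        self.log.append(call.name)
        # A real executor would capture actual screenshot bytes here.
        return Observation(screenshot=b"", url=self.url)


# A "model" maps (goal, latest observation, step index) to the next call,
# or to None when it considers the task finished.
Model = Callable[[str, Observation, int], Optional[FunctionCall]]


def run_agent(goal: str, model: Model, browser: FakeBrowser,
              max_steps: int = 20) -> list[str]:
    obs = Observation(screenshot=b"", url=browser.url)
    for step in range(max_steps):
        call = model(goal, obs, step)
        if call is None:  # the model signals that the task is done
            return browser.log
        obs = browser.execute(call)  # execute, then capture a fresh observation
    raise RuntimeError("step budget exhausted before the task finished")


def scripted_model(goal: str, obs: Observation, step: int) -> Optional[FunctionCall]:
    """Hypothetical fixed plan standing in for the computer_use tool."""
    plan = [
        FunctionCall("navigate", {"url": "https://example.com"}),
        FunctionCall("click_at", {"x": 500, "y": 300}),
        FunctionCall("type_text_at", {"x": 500, "y": 300, "text": goal}),
    ]
    return plan[step] if step < len(plan) else None


browser = FakeBrowser()
print(run_agent("running shoes", scripted_model, browser))
# ['navigate', 'click_at', 'type_text_at']
print(browser.url)  # https://example.com
```

In a real client, each new screenshot and URL would be returned to the model before it proposes the next action, and the loop would also exit when a safety rule blocks a step. The max_steps budget is this sketch's own guard against a task that never converges.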

What are the scope and constraints?

The model is optimized for web browsers. Google states it is not yet optimized for desktop OS-level control; mobile scenarios work by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before “high-stakes” operations (payments, sending messages, accessing sensitive records).
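Google describes the safety monitor as built into the service. The Python sketch below only illustrates the same allow / confirm / block pattern as a client-side gate one might add in front of an executor. The Verdict enum, the keyword lists, and the classify and gate functions are hypothetical and deliberately naive (substring matching on a short description of what the action touches); they are not the model's actual policy.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    BLOCK = "block"


# Hypothetical, naive policy: substring matches on a short description of
# what the action touches (an element label or a URL).
NON_COMMITTING_ACTIONS = frozenset({"wait_5_seconds", "hover_at", "scroll_document", "scroll_at"})
BLOCKED_TERMS = ("delete account",)
HIGH_STAKES_TERMS = ("payment", "checkout", "send message", "medical record")


def classify(action: str, context: str) -> Verdict:
    if action in NON_COMMITTING_ACTIONS:
        return Verdict.ALLOW  # treated as non-committing in this sketch
    text = context.lower()
    if any(term in text for term in BLOCKED_TERMS):
        return Verdict.BLOCK
    if any(term in text for term in HIGH_STAKES_TERMS):
        return Verdict.CONFIRM
    return Verdict.ALLOW


def gate(action: str, context: str, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the proposed action may be executed."""
    verdict = classify(action, context)
    if verdict is Verdict.BLOCK:
        return False
    if verdict is Verdict.CONFIRM:
        # Defer to a human; high-stakes steps are never auto-approved.
        return confirm(f"Agent wants to {action} on '{context}'. Proceed?")
    return True


def deny(prompt: str) -> bool:
    """Simulates a user who declines every confirmation prompt."""
    return False


print(gate("click_at", "Search results link", deny))     # True
print(gate("click_at", "Checkout: place order", deny))   # False
print(gate("scroll_document", "Checkout page", deny))    # True
```

Passing the confirmation step in as a callable keeps the gate testable and lets the same policy drive a terminal prompt, a UI dialog, or an automatic "deny" in unattended runs.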

Measured performance

(Figure omitted; source: https://blog.google/technology/google-deepmind/gemini-computer-use-model/)

Early production signals

Editorial Comments

Gemini 2.5 Computer Use is in public preview via Google AI Studio and Vertex AI; it exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google’s materials and the model card report state-of-the-art results on web/mobile control benchmarks, and Browserbase’s matched harness shows ~65.7% pass@1 on Online-Mind2Web with leading latency under identical constraints. The scope is browser-first with per-step safety/confirmation. These data points justify measured evaluation in UI testing and web ops.
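pass@1 is the fraction of tasks an agent completes on its first attempt. The sketch below uses the commonly used unbiased pass@k estimator, which reduces to successes divided by attempts when k = 1. Whether Browserbase's harness computes the figure exactly this way is an assumption, and the sample outcomes are made up for illustration, not benchmark data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts, c of them successful."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k sample contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over tasks, given (attempts, successes) per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes, one attempt per task: NOT real benchmark data.
results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_pass_at_k(results))  # 0.75
```

With one attempt per task, as here, pass@1 is simply the share of tasks solved; the estimator only matters when several attempts per task are recorded.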



The post Google AI Introduces Gemini 2.5 ‘Computer Use’ (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces appeared first on MarkTechPost.
