Gemini 2.5 Computer Use: A New Model for AI Agents to Interact with User Interfaces

Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI introduces Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser via a constrained action API. It’s available in public preview through Google AI Studio and Vertex AI. The model targets web automation and UI testing, with documented, human-judged gains on standard web/mobile control benchmarks and a safety layer that can require human confirmation for risky steps.

What does the model actually ship?

Developers call a new computer_use tool that returns function calls like click_at, type_text_at, or drag_and_drop. Client code executes the action (e.g., Playwright/Browserbase), captures a fresh screenshot/URL, and loops until the task ends or a safety rule blocks it. The supported action space is 13 predefined UI actions—open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, drag_and_drop—and can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
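The loop described above can be sketched roughly as follows. This is a minimal illustration, not Google's client code: `Action`, `Observation`, `FakeBrowser`, `run_agent_loop`, and `scripted_model` are hypothetical names, the model call is replaced by a scripted stand-in, and a real executor would drive Playwright or Browserbase and return an actual screenshot.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# The 13 predefined action names listed in the article.
SUPPORTED_ACTIONS = {
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward", "search",
    "navigate", "click_at", "hover_at", "type_text_at", "key_combination",
    "scroll_document", "scroll_at", "drag_and_drop",
}


@dataclass
class Action:
    """One function call proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after each step."""
    url: str
    screenshot: bytes


class FakeBrowser:
    """Hypothetical executor; a real one would wrap Playwright or Browserbase."""

    def __init__(self):
        self.url = "about:blank"
        self.log = []

    def execute(self, action: Action) -> Observation:
        if action.name not in SUPPORTED_ACTIONS:
            raise ValueError(f"unsupported action: {action.name}")
        if action.name == "navigate":
            self.url = action.args["url"]
        self.log.append(action.name)
        # Placeholder bytes stand in for a freshly captured screenshot.
        return Observation(url=self.url, screenshot=b"")


def run_agent_loop(
    propose: Callable[[Observation], Optional[Action]],
    browser: FakeBrowser,
    max_steps: int = 20,
) -> Observation:
    """Ask for an action, execute it, feed back the new state, repeat."""
    obs = Observation(url=browser.url, screenshot=b"")
    for _ in range(max_steps):
        action = propose(obs)
        if action is None:  # the stand-in model signals that the task is done
            break
        obs = browser.execute(action)
    return obs


def scripted_model(script):
    """Stand-in for the model: replays a fixed list of actions."""
    steps = iter(script)
    return lambda obs: next(steps, None)


if __name__ == "__main__":
    browser = FakeBrowser()
    plan = [Action("navigate", {"url": "https://example.com"}),
            Action("click_at", {"x": 10, "y": 20})]
    final = run_agent_loop(scripted_model(plan), browser)
    print(final.url, browser.log)
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving loop from running indefinitely when the model never signals completion.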

https://blog.google/technology/google-deepmind/gemini-computer-use-model/

What are the scope and constraints?

The model is optimized for web browsers. Google states it is not yet optimized for desktop OS-level control; mobile scenarios work by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before “high-stakes” operations (payments, sending messages, accessing sensitive records).
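On the client side, the block-or-confirm behavior might be handled with a small gate before each action is executed. The sketch below is an assumption about how such a gate could look: the decision labels (`allow`, `require_confirmation`, `block`) and the `gate_action` helper are hypothetical and are not taken from the API reference.

```python
from typing import Callable

# Hypothetical decision labels; the real API's field names and values may differ.
ALLOW = "allow"
REQUIRE_CONFIRMATION = "require_confirmation"
BLOCK = "block"


def gate_action(decision: str, action_name: str,
                confirm: Callable[[str], bool]) -> bool:
    """Return True only if the client may execute the proposed action."""
    if decision == ALLOW:
        return True
    if decision == REQUIRE_CONFIRMATION:
        # Defer to a human, e.g. a prompt shown before a payment is submitted.
        return confirm(action_name)
    # BLOCK, or any label this sketch does not recognize: fail closed.
    return False


if __name__ == "__main__":
    always_no = lambda name: False
    print(gate_action(ALLOW, "click_at", always_no))
    print(gate_action(REQUIRE_CONFIRMATION, "type_text_at", always_no))
```

Failing closed on unrecognized labels means a change in the safety layer's vocabulary stops the agent rather than silently letting a risky step through.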

Measured performance

Online-Mind2Web (official): 69.0% pass@1

Browserbase matched harness: leads on accuracy and latency across Online-Mind2Web and WebVoyager, at 65.7% (OM2W) and 79.9% (WebVoyager)

Latency/quality trade-off (Google figure): ~70%+ accuracy at ~225 s

AndroidWorld (mobile generalization): 69.7%, using custom mobile actions


Early production signals

Automated UI test repair: rehabilitates >60% of failing test executions

Operational speed: Poke.com reports it is often ~50% faster than the alternatives it considered

Editorial Comments

Gemini 2.5 Computer Use is in public preview via Google AI Studio and Vertex AI; it exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google’s materials and the model card report state-of-the-art results on web/mobile control benchmarks, and Browserbase’s matched harness shows ~65.7% pass@1 on Online-Mind2Web with leading latency under identical constraints. The scope is browser-first with per-step safety/confirmation. These data points justify measured evaluation in UI testing and web ops.


The post Google AI Introduces Gemini 2.5 ‘Computer Use’ (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces appeared first on MarkTechPost.
