https://simonwillison.net/atom/everything October 8, 05:20
Gemini 2.5 model can solve Google’s CAPTCHAs on its own

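Google’s newly released Gemini 2.5 Computer Use model can interact with graphical user interfaces by simulating mouse and keyboard actions. In one demo, the model recognized and solved Google’s own CAPTCHA challenge without being explicitly told to. Its follow-through on the assigned task (finding the most controversial Hacker News post of the day) was underwhelming, but its CAPTCHA handling and precise mouse control were impressive and point to its potential for complex interface interaction.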

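✨ The Gemini 2.5 Computer Use model can operate a graphical user interface, interacting with visible elements through a virtual mouse and keyboard. This is a new step for AI in automated task execution.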

🎯 In one demo, the model recognized and solved Google’s own CAPTCHA without being asked to, an unexpected ability to understand and get past a security mechanism.

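🖱️ The Gemini model showed precise mouse control, accurately clicking target elements on screen. Accurate clicking has been the biggest challenge for earlier models of this kind.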

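📈 Gemini 2.5 fell short on the more complex instruction (finding a specific post and summarizing the debate), but its CAPTCHA solving and precise control suggest strong potential for future human-computer interaction and automation scenarios.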

Gemini 2.5 Computer Use can solve Google’s own CAPTCHAs

7th October 2025

Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI by interacting with visible elements using a virtual mouse and keyboard. I just tried their demo... and watched it solve Google’s own CAPTCHA without me even asking it to.

The official demo is hosted at gemini.browserbase.com, and one of the click-to-try example prompts shown there is the following:

Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.

I activated the demo and Gemini decided to start by navigating to www.google.com in order to search for “hacker news”. But Google served a CAPTCHA challenge, presumably because of a large volume of suspicious traffic from the Browserbase IP range.

The model instantly got to work solving that CAPTCHA.

It went through a few rounds of this, solved all of them and continued on to Google Search, where it ran the search for “hacker news”, navigated to the site and then did an admittedly unimpressive job of solving the original prompt. It looked at just one thread and reported back on what it found there. I was hoping it would consider more than one option to discover the “most controversial post from today”.

The Gemini 2.5 Computer Use Model card (PDF) talks about training the model to “recognize when it is tasked with a high-stakes action” and request user confirmation before proceeding, but doesn’t have anything to say about not solving CAPTCHAs. So I guess this behaviour is the model working as intended!

Something that did impress me, aside from the unprompted CAPTCHA solve against Google’s very own system, was the quality of the mouse usage. I’ve written before about Computer Use models from both Anthropic and OpenAI (OpenAI called their version “Operator”), and by far the biggest challenge for them is accurately clicking the right targets with the mouse.

It would take a formal eval to determine whether Gemini really is best at this, but given the Gemini models’ previous demonstrations of both bounding boxes and image segmentation masks, it doesn’t surprise me that a Gemini model can do a great job of clicking on the right elements in a screenshot of an operating system or browser.
