cs.AI updates on arXiv.org 10月21日 12:18
GUIrilla:突破桌面自动化数据收集难题
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍GUIrilla,一个自动化可扩展框架,通过原生可访问性API系统性地探索应用,解决GUI自动化中的关键数据收集挑战。该框架支持macOS,并构建了大规模数据集GUIrilla-Task,通过实证结果证明其性能优越。

arXiv:2510.16051v1 Announce Type: cross Abstract: Autonomous agents capable of operating complex graphical user interfaces (GUIs) have the potential to transform desktop automation. While recent advances in large language models (LLMs) have significantly improved UI understanding, navigating full-window, multi-application desktop environments remains a major challenge. Data availability is limited by costly manual annotation, closed-source datasets and surface-level synthetic pipelines. We introduce GUIrilla, an automated scalable framework that systematically explores applications via native accessibility APIs to address the critical data collection challenge in GUI automation. Our framework focuses on macOS - an ecosystem with limited representation in current UI datasets - though many of its components are designed for broader cross-platform applicability. GUIrilla organizes discovered interface elements and crawler actions into hierarchical GUI graphs and employs specialized interaction handlers to achieve comprehensive application coverage. Using the application graphs from GUIrilla crawler, we construct and release GUIrilla-Task, a large-scale dataset of 27,171 functionally grounded tasks across 1,108 macOS applications, each annotated with full-desktop and window-level screenshots, accessibility metadata, and semantic action traces. Empirical results show that tuning LLM-based agents on GUIrilla-Task significantly improves performance on downstream UI tasks, outperforming synthetic baselines on the ScreenSpot Pro benchmark while using 97% less data. We also release macapptree, an open-source library for reproducible collection of structured accessibility metadata, along with the full GUIrilla-Task dataset, the manually verified GUIrilla-Gold benchmark, and the framework code to support open research in desktop autonomy.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

桌面自动化 数据收集 GUIrilla macOS LLM
相关文章