MarkTechPost@AI June 25
Google DeepMind Releases Gemini Robotics On-Device: Local AI Model for Real-Time Robotic Dexterity

Google DeepMind has introduced Gemini Robotics On-Device, a compact, local version of its vision-language-action (VLA) model that brings advanced robotic intelligence directly onto the device. The move eliminates the need for continuous cloud connectivity while preserving the flexibility, generality, and high precision associated with the Gemini model family, marking a key step forward for embodied AI. The model runs on a robot's onboard GPU, supports latency-sensitive and bandwidth-constrained settings such as homes, hospitals, and manufacturing floors, and can understand human instructions, perceive multimodal input, and generate motor actions in real time.

🤖️ **Fully local execution:** Gemini Robotics On-Device runs entirely on the robot's onboard GPU, enabling closed-loop control without an internet connection, which allows fast responses in latency-sensitive environments such as homes and manufacturing floors.

👐 **Two-handed dexterity:** The model can execute complex, coordinated bimanual manipulation tasks, thanks to pretraining on the ALOHA dataset and subsequent fine-tuning. Robots can therefore handle more intricate tasks such as folding clothes or assembling components.

⚙️ **Multi-platform compatibility:** Although trained on specific robots, the model generalizes across different platforms, including humanoids and industrial dual-arm manipulators, which broadens its range of applications.

💡 **Few-shot adaptation:** The model can learn new tasks from a small number of demonstrations, dramatically reducing development time and letting robots adapt quickly to new environments and tasks in the real world.

Google DeepMind has unveiled Gemini Robotics On-Device, a compact, local version of its powerful vision-language-action (VLA) model, bringing advanced robotic intelligence directly onto devices. This marks a key step forward in the field of embodied AI by eliminating the need for continuous cloud connectivity while maintaining the flexibility, generality, and high precision associated with the Gemini model family.

Local AI for Real-World Robotic Dexterity

Traditionally, high-capacity VLA models have relied on cloud-based processing due to computational and memory constraints. With Gemini Robotics On-Device, DeepMind introduces an architecture that operates entirely on local GPUs embedded within robots, supporting latency-sensitive and bandwidth-constrained scenarios like homes, hospitals, and manufacturing floors.

The on-device model retains the core strengths of Gemini Robotics: the ability to understand human instructions, perceive multimodal input (visual and textual), and generate real-time motor actions. It is also highly sample-efficient, requiring only 50 to 100 demonstrations to learn new skills, making it practical for real-world deployment across varied settings.
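DeepMind has not published its fine-tuning procedure, so purely as an illustration of the few-shot adaptation idea, here is a minimal behavior-cloning sketch in NumPy: a linear policy is fit to roughly 50 demonstration state-action pairs and then queried on an unseen state. The linear model and all names are assumptions for illustration, not the actual Gemini method.

```python
import numpy as np

def fit_linear_policy(states, actions, reg=1e-3):
    """Fit actions ≈ states @ W via ridge regression (behavior cloning)."""
    d = states.shape[1]
    W = np.linalg.solve(states.T @ states + reg * np.eye(d), states.T @ actions)
    return W

# ~50 demonstration state-action pairs (synthetic stand-in for teleop demos)
rng = np.random.default_rng(0)
states = rng.normal(size=(50, 4))   # e.g. gripper pose features
true_W = rng.normal(size=(4, 2))    # unknown expert mapping
actions = states @ true_W           # expert actions for those states

W = fit_linear_policy(states, actions)
new_state = rng.normal(size=(1, 4))
predicted_action = new_state @ W    # policy applied to an unseen state
```

With only a few dozen demonstrations the fit already recovers the expert mapping closely, which is the point of sample efficiency: a handful of teleoperated examples, not a new large-scale training run.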

Core Features of Gemini Robotics On-Device

- **Fully Local Execution:** The model runs directly on the robot's onboard GPU, enabling closed-loop control without internet dependency.
- **Two-Handed Dexterity:** It can execute complex, coordinated bimanual manipulation tasks, thanks to its pretraining on the ALOHA dataset and subsequent fine-tuning.
- **Multi-Embodiment Compatibility:** Despite being trained on specific robots, the model generalizes across different platforms, including humanoids and industrial dual-arm manipulators.
- **Few-Shot Adaptation:** The model supports rapid learning of novel tasks from a handful of demonstrations, dramatically reducing development time.
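The closed-loop control pattern these features describe (sense, infer locally, act, repeat) can be sketched as a simple loop. The `OnDeviceVLA` class below is a stub standing in for local GPU inference; it is not the real model interface.

```python
class OnDeviceVLA:
    """Stub standing in for a locally running vision-language-action model."""
    def act(self, observation, instruction):
        # Real model: multimodal inference on the onboard GPU.
        # Stub: proportional move toward a fixed target position.
        target = 1.0
        return 0.1 * (target - observation)

def control_loop(policy, instruction, steps=100):
    """Sense -> infer locally -> act, with no network round trip."""
    position = 0.0
    trajectory = []
    for _ in range(steps):
        action = policy.act(position, instruction)  # local inference only
        position += action                          # apply the motor command
        trajectory.append(position)
    return trajectory

trajectory = control_loop(OnDeviceVLA(), "reach the target")
```

Because every iteration stays on-device, the loop's latency is bounded by inference time alone rather than by a cloud round trip, which is what makes closed-loop control in homes or on factory floors feasible.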

Real-World Capabilities and Applications

Dexterous manipulation tasks such as folding clothes, assembling components, or opening jars demand fine-grained motor control and real-time feedback integration. Gemini Robotics On-Device enables these capabilities while reducing communication lag and improving responsiveness. This is particularly critical for edge deployments where connectivity is unreliable or data privacy is a concern.

Potential applications include assistive robots in homes, hospital automation where privacy constraints rule out cloud processing, and flexible manipulation on manufacturing floors with unreliable connectivity.

SDK and MuJoCo Integration for Developers

Alongside the model, DeepMind has released a Gemini Robotics SDK that provides tools for testing, fine-tuning, and integrating the on-device model into custom workflows, including evaluation in the MuJoCo physics simulator.
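The SDK's actual API is documented separately; as a generic illustration of the evaluate-in-simulation workflow it enables, the sketch below rolls a policy out in a tiny hand-written pendulum environment standing in for a physics simulator such as MuJoCo. The environment, the PD controller, and all names are placeholders, not the Gemini interface.

```python
class PendulumStub:
    """Tiny stand-in for a physics-simulator environment (e.g. MuJoCo)."""
    def __init__(self, angle=0.5):
        self.angle, self.velocity = angle, 0.0

    def step(self, torque, dt=0.05):
        # Damped pendulum, semi-implicit Euler integration.
        self.velocity += dt * (-9.8 * self.angle - 0.1 * self.velocity + torque)
        self.angle += dt * self.velocity

def evaluate(policy, env, steps=200):
    """Roll the policy out in simulation and return the final tracking error."""
    for _ in range(steps):
        env.step(policy(env.angle, env.velocity))
    return abs(env.angle)

# A hand-written PD controller standing in for a fine-tuned model checkpoint.
policy = lambda angle, velocity: -5.0 * angle - 2.0 * velocity
error = evaluate(policy, PendulumStub())
```

Running candidate policies through a simulated rollout like this, before touching hardware, is the standard use of a physics simulator in a robotics toolchain.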

The combination of local inference, developer tools, and robust simulation environments positions Gemini Robotics On-Device as a modular, extensible solution for robotics researchers and developers.

Gemini Robotics and the Future of On-Device Embodied AI

The broader Gemini Robotics initiative has focused on unifying perception, reasoning, and action in physical environments. This on-device release bridges the gap between foundational AI research and deployable systems that can function autonomously in the real world.

While large VLA models like Gemini 1.5 have demonstrated impressive generalization across modalities, their inference latency and cloud dependency have limited their applicability in robotics. The on-device version addresses these limitations with optimized compute graphs, model compression, and task-specific architectures tailored for embedded GPUs.
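DeepMind has not detailed its compression recipe, but as a generic illustration of one standard technique in that family, here is a minimal post-training int8 weight-quantization sketch (symmetric, per-tensor scaling), which shrinks weight storage fourfold at a bounded reconstruction error. It is illustrative only, not the actual Gemini pipeline.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(W)
W_hat = dequantize(q, scale)
max_err = np.abs(W - W_hat).max()  # rounding error is bounded by scale / 2
```

Four bytes per float32 weight become one byte per int8 weight, which is the kind of memory and bandwidth saving that makes inference on an embedded GPU practical.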

Broader Implications for Robotics and AI Deployment

By decoupling powerful AI models from the cloud, Gemini Robotics On-Device paves the way for scalable, privacy-preserving robotics. It aligns with a growing trend toward edge AI, where computational workloads are shifted closer to data sources. This not only enhances safety and responsiveness but also ensures that robotic agents can operate in environments with strict latency or privacy requirements.

As DeepMind continues to broaden access to its robotics stack—including opening up its simulation platform and releasing benchmarks—researchers worldwide are now better equipped to experiment, iterate, and build reliable, real-time robotic systems.


Check out the Paper and Technical details. All credit for this research goes to the researchers of this project.

The post Google DeepMind Releases Gemini Robotics On-Device: Local AI Model for Real-Time Robotic Dexterity appeared first on MarkTechPost.
