MarkTechPost@AI, March 14
Google DeepMind’s Gemini Robotics: Unleashing Embodied AI with Zero-Shot Control and Enhanced Spatial Reasoning

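Google DeepMind has launched Gemini Robotics, a suite of models built on Gemini 2.0. It marks a paradigm shift, brings a range of advanced capabilities, opens a new era for robotics, and places an emphasis on safety.

🎈 Gemini Robotics is an advanced VLA model that introduces physical actions as a direct output modality, enabling robots to execute tasks autonomously

💪 It brings several key technical advances, including unmatched generality, intuitive interactivity, advanced dexterity, and versatile adaptability

🌟 Gemini Robotics-ER improves spatial reasoning, allowing robots to execute tasks with high precision and efficiency

🚀 Gemini 2.0 enables zero-shot and few-shot robot control, handling many different tasks with a single model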

Google DeepMind has shattered conventional boundaries in robotics AI with the unveiling of Gemini Robotics, a suite of models built upon the formidable foundation of Gemini 2.0. This isn’t just an incremental upgrade; it’s a paradigm shift, propelling AI from the digital realm into the tangible world with unprecedented “embodied reasoning” capabilities.

Gemini Robotics: Bridging the Gap Between Digital Intelligence and Physical Action

At the heart of this work is Gemini Robotics, an advanced vision-language-action (VLA) model. By adding physical actions as a direct output modality, Gemini Robotics lets robots execute tasks autonomously, with greater understanding and adaptability than earlier approaches. It is complemented by Gemini Robotics-ER (Embodied Reasoning), a specialized model with enhanced spatial understanding that lets roboticists integrate Gemini's reasoning abilities into their existing robotic architectures.
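The article does not say how Gemini Robotics encodes physical actions as output. For reference, one widely used approach in earlier VLA models such as RT-2 is to discretize each continuous action dimension into 256 bins so that actions can be emitted as tokens, just like text. The sketch below shows that swapped-in technique, not Gemini Robotics' own action decoder. The 7-D action layout and its ranges are hypothetical.

```python
import numpy as np

N_BINS = 256  # RT-2 uses 256 bins per action dimension

def discretize(action, low, high, n_bins=N_BINS):
    """Map each continuous action dimension to an integer bin id in [0, n_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    scaled = (action - low) / (high - low)           # each dimension now in [0, 1]
    bins = np.floor(scaled * n_bins).astype(int)
    return np.minimum(bins, n_bins - 1)              # a value exactly at `high` goes to the last bin

def undiscretize(bins, low, high, n_bins=N_BINS):
    """Map bin ids back to the centre of each bin in the original action range."""
    bins = np.asarray(bins, dtype=float)
    return low + (bins + 0.5) / n_bins * (high - low)

# Hypothetical 7-D action: end-effector delta xyz (m), delta roll/pitch/yaw (rad), gripper (0 = open, 1 = closed).
low = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])

action = np.array([0.01, -0.02, 0.0, 0.1, 0.0, -0.1, 1.0])
tokens = discretize(action, low, high)        # integers a model could emit as action tokens
recovered = undiscretize(tokens, low, high)   # what a controller would execute
```

The round trip loses at most half a bin width per dimension. This precision trade-off is why the action ranges should be chosen tightly.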

These models herald a new era of robotics and promise to unlock a wide range of real-world applications. Google DeepMind's partnership with Apptronik to build humanoid robots powered by Gemini 2.0, along with its collaborations with trusted testers, underscores the transformative potential of this technology.

Key Technological Advancements:

Gemini Robotics-ER: Pioneering Spatial Intelligence

Gemini Robotics-ER elevates spatial reasoning, a critical component for effective robotic operation. By enhancing capabilities such as pointing, 3D object detection, and spatial understanding, this model enables robots to perform tasks with heightened precision and efficiency.
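Take "pointing" as a concrete example. A robot still has to turn a 2-D point predicted on an image into a 3-D target it can reach. The minimal sketch below makes several assumptions. It assumes the model returns a point as `[y, x]` normalized to 0 to 1000, the convention used in Google's Gemini spatial-understanding examples; treat that convention as an assumption here. It also assumes an aligned depth image and pinhole camera intrinsics are available. The helper names and intrinsics values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical values below)."""
    fx: float
    fy: float
    cx: float
    cy: float

def point_to_pixel(point_yx, width, height, scale=1000.0):
    """Convert an assumed [y, x] point normalized to 0..scale into pixel coordinates (u, v)."""
    y, x = point_yx
    return x / scale * width, y / scale * height

def deproject(u, v, depth_m, k):
    """Back-project pixel (u, v) with metric depth into a 3-D point in the camera frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return x, y, depth_m

# Hypothetical 640x480 camera and a point the model might return for "the mug handle".
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
u, v = point_to_pixel([250, 750], width=640, height=480)
target = deproject(u, v, depth_m=0.8, k=k)   # depth read from an aligned depth image at (u, v)
```

The result is expressed in the camera frame. A real system would still transform it into the robot's base frame using the camera's extrinsic calibration before planning a grasp.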

Gemini 2.0: Enabling Zero and Few-Shot Robot Control

A defining feature of Gemini 2.0 is its ability to perform zero-shot and few-shot robot control. This removes the need to train on extensive robot action data and allows robots to perform complex tasks "out of the box." Gemini 2.0 handles perception, state estimation, spatial reasoning, planning, and control within a single model, and in doing so it surpasses earlier approaches that chain several separate models together.

Below are the perception and control APIs and the agentic orchestration used during an episode. This system is used for zero-shot control:
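An orchestration loop of this kind can be sketched in a few dozen lines. Everything here is hypothetical. `MockRobot` stands in for a real perception and control stack. `scripted_planner` is a hard-coded stand-in for the model, which in the zero-shot setting would itself decide which API to call next based on the observations returned so far.

```python
import math

class MockRobot:
    """Hypothetical stand-in for a robot exposing perception and control APIs."""

    def __init__(self, objects):
        self.objects = dict(objects)          # object name -> (x, y, z) position in metres
        self.gripper = (0.0, 0.0, 0.3)
        self.holding = None

    # --- perception API ---
    def detect_object(self, name):
        return self.objects.get(name)

    # --- control API ---
    def move_to(self, pos):
        self.gripper = tuple(pos)
        if self.holding is not None:          # a held object moves with the gripper
            self.objects[self.holding] = self.gripper
        return self.gripper

    def close_gripper(self):
        for name, pos in self.objects.items():
            if math.dist(pos, self.gripper) < 0.01:
                self.holding = name
                return name
        return None

    def open_gripper(self):
        released, self.holding = self.holding, None
        return released

def scripted_planner(history):
    """Hard-coded stand-in for the model: picks the next API call given the observations so far."""
    step = len(history)
    last = history[-1][1] if history else None
    if step == 0:
        return "detect_object", ["banana"]
    if step == 1:
        return "move_to", [last]              # go to where the banana was detected
    if step == 2:
        return "close_gripper", []
    if step == 3:
        return "detect_object", ["bowl"]
    if step == 4:
        return "move_to", [last]              # carry the banana to the bowl
    if step == 5:
        return "open_gripper", []
    return None                               # task complete

def run_episode(robot, planner, max_steps=20):
    """Agentic loop: ask the planner for a call, execute it on the robot, feed the result back."""
    history = []
    for _ in range(max_steps):
        call = planner(history)
        if call is None:
            break
        api_name, args = call
        result = getattr(robot, api_name)(*args)
        history.append((call, result))
    return history

robot = MockRobot({"banana": (0.4, 0.1, 0.02), "bowl": (0.2, -0.2, 0.0)})
history = run_episode(robot, scripted_planner)
```

Because each API result is appended to `history`, the planner always decides with the latest observation in hand. That feedback is what separates an agentic loop from replaying a fixed script.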

Commitment to Safety 

Google DeepMind prioritizes safety through a multi-layered approach, addressing concerns from low-level motor control to high-level semantic understanding. The integration of Gemini Robotics-ER with existing safety-critical controllers and the development of mechanisms to prevent unsafe actions underscore this commitment.
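One simple kind of low-level safeguard is a filter that sits between the model's proposed motion targets and the robot controller. The sketch below is a generic illustration, not Google DeepMind's mechanism. It clips a target into a hypothetical workspace box and caps how far the end effector may move in a single step. The limit values and the `filter_target` name are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    """Hypothetical limits: an axis-aligned workspace box (metres) and a maximum step length."""
    x_range: tuple = (-0.5, 0.5)
    y_range: tuple = (-0.5, 0.5)
    z_range: tuple = (0.0, 0.6)
    max_step: float = 0.05

def filter_target(current, target, limits):
    """Clip a proposed end-effector target into the workspace, then cap the distance moved.

    Assumes `current` is already inside the workspace; because the box is convex, every point
    on the segment from `current` to the clipped target stays inside it.
    Returns (safe_target, was_modified).
    """
    bounds = (limits.x_range, limits.y_range, limits.z_range)
    safe = tuple(min(max(t, lo), hi) for t, (lo, hi) in zip(target, bounds))
    dist = math.dist(current, safe)
    if dist > limits.max_step:
        scale = limits.max_step / dist
        safe = tuple(c + (s - c) * scale for c, s in zip(current, safe))
    return safe, safe != tuple(target)

limits = SafetyLimits()
# A target below the (assumed) table surface at z = 0 is clipped, then shortened to one safe step.
safe, modified = filter_target((0.0, 0.0, 0.3), (0.0, 0.0, -0.2), limits)
# A small, in-bounds move passes through unchanged.
ok, ok_modified = filter_target((0.0, 0.0, 0.3), (0.02, 0.0, 0.3), limits)
```

A geometric check like this only covers the low-level end of the stack. Semantic risks such as "do not hand a knife blade-first" require reasoning that a bounding box cannot express.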

The release of the ASIMOV dataset and the framework for generating data-driven “Robot Constitutions” further demonstrates Google DeepMind’s dedication to advancing robotics safety research.

Intelligent robots are getting closer…


Check out the full Gemini Robotics report and Gemini Robotics. All credit for this research goes to the researchers of this project.

The post Google DeepMind’s Gemini Robotics: Unleashing Embodied AI with Zero-Shot Control and Enhanced Spatial Reasoning appeared first on MarkTechPost.
