SureSim: A Framework for Improving Real-World Evaluation of Robot Policies

cs.AI updates on arXiv.org, October 7


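This paper proposes SureSim, a framework that combines large-scale simulation with a small amount of real-world testing to provide reliable inferences about a robot policy's real-world performance. The framework formalizes the problem of combining real and simulation evaluations as a prediction-powered inference problem, and uses non-asymptotic mean estimation algorithms to provide confidence intervals on policy performance.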

arXiv:2510.04354v1 Announce Type: cross Abstract: Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. In practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework to augment large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned $\pi_0$ on a joint distribution of objects and initial conditions, and find that our approach saves over 20-25% of the hardware evaluation effort needed to achieve similar bounds on policy performance.
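The idea of rectifying simulation bias with a few paired trials can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `hoeffding_halfwidth` and `ppi_mean_interval` are hypothetical, outcomes are assumed to lie in [0, 1] (for example, binary success), all trials are assumed to be drawn i.i.d. from the same distribution of objects and initial conditions, and a plain Hoeffding bound is swapped in for whatever non-asymptotic mean estimation algorithms the authors actually use.

```python
import math
from statistics import fmean


def hoeffding_halfwidth(n: int, value_range: float, delta: float) -> float:
    """Two-sided Hoeffding half-width for the mean of n i.i.d. bounded samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def ppi_mean_interval(real_paired, sim_paired, sim_unpaired, delta=0.1):
    """Prediction-powered estimate of mean real-world performance with a CI.

    real_paired, sim_paired: outcomes in [0, 1] on the same test conditions.
    sim_unpaired: simulation-only outcomes in [0, 1] on additional conditions.
    Returns (estimate, lower, upper); the interval holds with prob. >= 1 - delta
    under the i.i.d. assumption stated in the lead-in.
    """
    if len(real_paired) != len(sim_paired):
        raise ValueError("paired lists must have equal length")
    # Rectifier: average real-minus-sim gap; each difference lies in [-1, 1].
    rectifier = fmean(r - s for r, s in zip(real_paired, sim_paired))
    # Large-scale simulation mean, which may be biased relative to reality.
    sim_mean = fmean(sim_unpaired)
    estimate = sim_mean + rectifier
    # Union bound: each of the two terms is allotted delta / 2.
    half = (hoeffding_halfwidth(len(sim_unpaired), 1.0, delta / 2)
            + hoeffding_halfwidth(len(real_paired), 2.0, delta / 2))
    return estimate, max(0.0, estimate - half), min(1.0, estimate + half)


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    real = [1, 0, 1, 1]
    sim = [1, 1, 1, 0]
    sim_only = [1] * 60 + [0] * 40
    print(ppi_mean_interval(real, sim, sim_only))
```

Because Hoeffding's inequality ignores variance, the rectifier term here is never tighter than a real-only interval on the same number of hardware trials, so this sketch will not show savings in hardware effort. A variance-adaptive bound would be needed for the interval to shrink when simulation and real outcomes agree closely.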


