Articles related to "Real-World Evaluation"
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
cs.AI updates on arXiv.org
2025-10-07T04:16:54.000000Z
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
cs.AI updates on arXiv.org
2025-09-30T04:03:07.000000Z