Trending
Articles related to "real-world evaluation"
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
cs.AI updates on arXiv.org 2025-10-07T04:16:54.000000Z
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
cs.AI updates on arXiv.org 2025-09-30T04:03:07.000000Z