Trending
Articles related to "offline evaluation"
Measuring what matters: How offline evaluation of GitHub MCP Server works
The GitHub Blog 2025-10-30T22:00:37.000000Z
Reinforcement Learning for Recommendations and Search
https://eugeneyan.com/rss 2025-09-30T11:12:03.000000Z
The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior
cs.AI updates on arXiv.org 2025-09-25T05:37:50.000000Z