Articles related to "Offline Evaluation"
Measuring what matters: How offline evaluation of GitHub MCP Server works
The GitHub Blog
2025-10-30T22:00:37.000000Z
Reinforcement Learning for Recommendations and Search
https://eugeneyan.com/rss
2025-09-30T11:12:03.000000Z
The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior
cs.AI updates on arXiv.org
2025-09-25T05:37:50.000000Z