Trending
Articles related to "Evaluation Datasets"
MULTI: Multimodal Understanding Leaderboard with Text and Images
cs.AI updates on arXiv.org 2025-10-16T04:31:53.000000Z
Evaluating Long-Context Question & Answer Systems
https://eugeneyan.com/rss 2025-09-30T11:06:49.000000Z
How good are LLMs at Retrieving Documents in a Specific Domain?
cs.AI updates on arXiv.org 2025-09-30T04:02:59.000000Z
Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL
cs.AI updates on arXiv.org 2025-09-15T08:12:21.000000Z
Automated Evaluation Benchmarks | Some Evaluation Datasets
Hugging Face 2025-01-10T09:00:55.000000Z
Automated Evaluation Benchmarks | Some Evaluation Datasets
BAAI Community 2025-01-09T05:07:26.000000Z
Human Evaluation | Tips and Tricks
BAAI Community 2024-12-20T03:17:35.000000Z
Marqo Releases Advanced E-commerce Embedding Models and Comprehensive Evaluation Datasets to Revolutionize Product Search, Recommendation, and Benchmarking for Retail AI Applications
MarkTechPost@AI 2024-11-16T06:05:00.000000Z
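No Large Model Passes! Peking University and BIGAI Propose an Extremely Hard Benchmark Dedicated to Evaluating Long-Text Understanding and Generation
BAAI Community 2024-08-08T14:37:18.000000Z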