Articles related to "MMLU-PRO"
When to Reason: Semantic Router for vLLM
cs.AI updates on arXiv.org
2025-10-13T04:13:13.000000Z
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
cs.AI updates on arXiv.org
2025-07-24T05:31:26.000000Z
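Authoritative LLM benchmark exposed as flawed: it favors closed-source models such as GPT-4 and even treats prompts differently across models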
智源社区
2024-07-12T07:35:55.000000Z
MMLU-Pro: An Enhanced Benchmark Designed to Evaluate Language Understanding Models Across Broader and More Challenging Tasks
MarkTechPost@AI
2024-06-06T07:01:04.000000Z