Trending
Articles related to "medical benchmarks"
The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks
cs.AI updates on arXiv.org 2025-10-02T04:19:09.000000Z