Articles related to "Evaluation Protocol"
Benchmarking World-Model Learning
cs.AI updates on arXiv.org
2025-10-23T04:11:44.000000Z
Trustworthy Retrosynthesis: Eliminating Hallucinations with a Diverse Ensemble of Reaction Scorers
cs.AI updates on arXiv.org
2025-10-14T04:18:54.000000Z
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
cs.AI updates on arXiv.org
2025-08-12T04:02:23.000000Z
Reliable Evaluation Protocol for Low-Precision Retrieval
cs.AI updates on arXiv.org
2025-08-06T04:38:47.000000Z
Unifying Post-hoc Explanations of Knowledge Graph Completions
cs.AI updates on arXiv.org
2025-08-01T04:08:36.000000Z
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1
cs.AI updates on arXiv.org
2025-07-29T04:21:30.000000Z
Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models
cs.AI updates on arXiv.org
2025-07-23T04:03:12.000000Z
Metric assessment protocol in the context of answer fluctuation on MCQ tasks
cs.AI updates on arXiv.org
2025-07-22T04:34:20.000000Z
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
cs.AI updates on arXiv.org
2025-07-22T04:34:13.000000Z