Trending
Articles related to "evaluation tasks"
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators
cs.AI updates on arXiv.org 2025-09-05T04:45:38.000000Z
Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors
cs.AI updates on arXiv.org 2025-07-16T04:28:43.000000Z