Trending
Articles related to "evaluation framework"
RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG
cs.AI updates on arXiv.org 2025-11-07T05:51:16.000000Z
Opus: A Quantitative Framework for Workflow Evaluation
cs.AI updates on arXiv.org 2025-11-07T05:44:27.000000Z
Scalable Evaluation and Neural Models for Compositional Generalization
cs.AI updates on arXiv.org 2025-11-06T05:22:45.000000Z
Zero-shot data citation function classification using transformer-based large language models (LLMs)
cs.AI updates on arXiv.org 2025-11-06T05:09:06.000000Z
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
cs.AI updates on arXiv.org 2025-11-05T05:31:16.000000Z
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
cs.AI updates on arXiv.org 2025-11-05T05:30:15.000000Z
PreferThinker: Reasoning-based Personalized Image Preference Assessment
cs.AI updates on arXiv.org 2025-11-05T05:14:16.000000Z
Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
cs.AI updates on arXiv.org 2025-11-03T05:19:53.000000Z
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes
cs.AI updates on arXiv.org 2025-11-03T05:19:23.000000Z
LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks
cs.AI updates on arXiv.org 2025-11-03T05:18:55.000000Z
CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions
cs.AI updates on arXiv.org 2025-11-03T05:17:15.000000Z
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
cs.AI updates on arXiv.org 2025-10-30T04:23:19.000000Z
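[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T04:16:00.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T03:56:35.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T03:36:44.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T03:17:24.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T02:58:05.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T02:38:11.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T01:57:50.000000Z
[Workplace Topics] Are the requirements for AI product managers really this high now?
V2EX 2025-10-30T01:38:00.000000Z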