Trending
Articles related to "Evaluation"
Webinar recap: Eval best practices
Braintrust Blog 2025-11-05T04:39:32.000000Z
What is the "Agent toolbox" that OpenAI, Google, and Anthropic are all building? | 晚点 Podcast
晚点LatePost 2025-10-20T16:32:36.000000Z
AutoCode: A New AI Framework that Lets LLMs Create and Verify Competitive Programming Problems, Mirroring the Workflow of Human Problem Setters
MarkTechPost@AI 2025-10-18T09:11:05.000000Z
LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation
cs.AI updates on arXiv.org 2025-10-14T04:20:12.000000Z
OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching
cs.AI updates on arXiv.org 2025-10-14T04:17:41.000000Z
How to Evaluate Your RAG Pipeline with Synthetic Data?
MarkTechPost@AI 2025-10-13T21:33:59.000000Z
Measuring what matters: An intro to AI evals
Braintrust Blog 2025-10-10T23:05:13.000000Z
What It Really Takes to Fine-Tune a LLM Model for a Real-World Use Case
Spritle Blog 2025-10-09T12:40:07.000000Z
The real reason enterprise AI agents are so hard is not the AI
36氪 - 科技频道 2025-10-09T07:56:05.000000Z
[Workplace] Some thoughts on work, managers, and performance review systems; not sure whether they match everyone else's experience
V2EX 2025-09-27T05:02:55.000000Z
Three major cyber threat detection vendors withdraw from MITRE evaluations
HackerNews 2025-09-23T07:32:31.000000Z
Evals in the Age of Jarvis
少点错误 2025-09-21T20:39:50.000000Z
Evaluating RAG, aka Optimizing the Optimization
n8n Blog 2025-09-18T13:29:00.000000Z
Found a service quality leaderboard for large AI models.
掘金 人工智能 2025-09-16T10:59:55.000000Z
Building production-grade AI agent systems: lessons from Shopify Sidekick (2025)
宝玉的分享 2025-09-16T03:27:01.000000Z
🕸️ GraphRAG graph data quality evaluation: keep your knowledge graph from going off the rails!
掘金 人工智能 2025-09-13T18:32:26.000000Z
BAAI's Lin Yonghua: the breakthrough for embodied intelligence lies in three key elements | 新智元 10th Anniversary Summit
智源社区 2025-09-13T08:38:35.000000Z
SuperBench | Evaluation report on generative AI applications in electronic communications
智源社区 2025-09-12T12:13:15.000000Z