Evaluation_Fishai

Trending

Articles related to "Evaluation"

Webinar recap: Eval best practices

Braintrust Blog 2025-11-05T04:39:32.000000Z

What is the "Agent toolbox" that OpenAI, Google, and Anthropic are all building? | LatePost Podcast

晚点LatePost 2025-10-20T16:32:36.000000Z

AutoCode: A New AI Framework that Lets LLMs Create and Verify Competitive Programming Problems, Mirroring the Workflow of Human Problem Setters

MarkTechPost@AI 2025-10-18T09:11:05.000000Z

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation

cs.AI updates on arXiv.org 2025-10-14T04:20:12.000000Z

OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching

cs.AI updates on arXiv.org 2025-10-14T04:17:41.000000Z

How to Evaluate Your RAG Pipeline with Synthetic Data?

MarkTechPost@AI 2025-10-13T21:33:59.000000Z

Measuring what matters: An intro to AI evals

Braintrust Blog 2025-10-10T23:05:13.000000Z

What It Really Takes to Fine-Tune a LLM Model for a Real-World Use Case

Spritle Blog 2025-10-09T12:40:07.000000Z

The real reason enterprise AI agents are so hard is not the AI

36氪 - 科技频道 2025-10-09T07:56:05.000000Z

[Workplace Topics] Some reflections on work, managers, and performance review systems; I wonder whether others feel the same way

V2EX 2025-09-27T05:02:55.000000Z

Three major cyber threat detection vendors withdraw from MITRE evaluation testing

HackerNews 2025-09-23T07:32:31.000000Z

Evals in the Age of Jarvis

少点错误 2025-09-21T20:39:50.000000Z

Evaluating RAG, aka Optimizing the Optimization

n8n Blog 2025-09-18T13:29:00.000000Z

Found a service-quality leaderboard for large AI models.

掘金人工智能 2025-09-16T10:59:55.000000Z

Building production-grade AI agent systems: lessons from Shopify Sidekick (2025)

宝玉的分享 2025-09-16T03:27:01.000000Z

🕸️ GraphRAG graph data quality evaluation: keep your knowledge graph from going off the rails!

掘金人工智能 2025-09-13T18:32:26.000000Z

BAAI's Lin Yonghua: the breakthrough for embodied intelligence lies in three key elements | 新智元 10th Anniversary Summit

智源社区 2025-09-13T08:38:35.000000Z

SuperBench | Evaluation report on generative AI applications in electronic communications

智源社区 2025-09-12T12:13:15.000000Z

Copyright © 2019 FISHAI. All Rights Reserved