LLM-as-a-Judge_Fishai

Trending

Articles related to "LLM-as-a-Judge"

Evaluate LLMs and RAG a practical example using Langchain and Hugging Face

philschmid RSS feed 2025-09-30T11:11:44.000000Z

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

https://eugeneyan.com/rss 2025-09-30T11:08:32.000000Z

LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should “Evaluation” Mean?

MarkTechPost@AI 2025-09-21T00:44:18.000000Z

LLMs are unreliable as their own judges! New research from Shanghai Jiao Tong University reveals flaws in the LLM-as-a-judge mechanism

智源社区 (BAAI Community) 2025-08-18T05:01:27.000000Z

LaajMeter: A Framework for LaaJ Evaluation

cs.AI updates on arXiv.org 2025-08-15T04:18:46.000000Z

EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation

cs.AI updates on arXiv.org 2025-08-11T04:08:43.000000Z

Benchmarking Amazon Nova: A comprehensive analysis through MT-Bench and Arena-Hard-Auto

AWS Machine Learning Blog 2025-07-24T18:40:33.000000Z

Apple and the University of Cambridge design an optimal AI review framework, overcoming the limits of evaluating complex tasks

IT之家 (ITHome) 2025-07-24T03:23:58.000000Z

Evaluating generative AI models with Amazon Nova LLM-as-a-Judge on Amazon SageMaker AI

AWS Machine Learning Blog 2025-07-17T22:16:00.000000Z

Effective cross-lingual LLM evaluation with Amazon Bedrock

AWS Machine Learning Blog 2025-07-08T15:49:18.000000Z

Meta launches the J1 model series: reinventing LLM-as-a-Judge to build the strongest "AI judge"

IT之家 (ITHome) 2025-05-22T04:23:13.000000Z

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

AWS Machine Learning Blog 2025-02-27T17:46:17.000000Z

LLM-as-a-judge on Amazon Bedrock Model Evaluation

AWS Machine Learning Blog 2025-02-12T17:59:47.000000Z

Livestream | Popular LLM-as-a-Judge papers: a survey talk on AI serving as "evaluator", plus an AI + finance roundtable, IDEA Research Institute

智源社区 (BAAI Community) 2025-01-14T09:20:38.000000Z

Livestream | Popular LLM-as-a-Judge papers: a survey talk on AI becoming the "judge", plus an AI + finance roundtable, IDEA Research Institute

智源社区 (BAAI Community) 2025-01-14T09:05:19.000000Z

Finally, a survey that clearly explains the LLM-as-a-judge paradigm

机器之心 (Synced) 2024-12-04T06:06:00.000000Z

Evaluating prompts at scale with Prompt Management and Prompt Flows for Amazon Bedrock

AWS Machine Learning Blog 2024-09-05T20:32:19.000000Z

Copyright © 2019 FISHAI. All Rights Reserved