Articles related to "LLM-as-a-Judge"
Evaluate LLMs and RAG a practical example using Langchain and Hugging Face
philschmid RSS feed
2025-09-30T11:11:44.000000Z
Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)
https://eugeneyan.com/rss
2025-09-30T11:08:32.000000Z
LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should “Evaluation” Mean?
MarkTechPost@AI
2025-09-21T00:44:18.000000Z
LLMs Acting as Their Own Judges Is Unreliable! New Shanghai Jiao Tong University Study Reveals Flaws in the LLM-as-a-judge Mechanism
智源社区
2025-08-18T05:01:27.000000Z
LaajMeter: A Framework for LaaJ Evaluation
cs.AI updates on arXiv.org
2025-08-15T04:18:46.000000Z
EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation
cs.AI updates on arXiv.org
2025-08-11T04:08:43.000000Z
Benchmarking Amazon Nova: A comprehensive analysis through MT-Bench and Arena-Hard-Auto
AWS Machine Learning Blog
2025-07-24T18:40:33.000000Z
Apple and the University of Cambridge Design an Optimal AI Evaluation Framework, Overcoming the Limits of Evaluating Complex Tasks
IT之家
2025-07-24T03:23:58.000000Z
Evaluating generative AI models with Amazon Nova LLM-as-a-Judge on Amazon SageMaker AI
AWS Machine Learning Blog
2025-07-17T22:16:00.000000Z
Effective cross-lingual LLM evaluation with Amazon Bedrock
AWS Machine Learning Blog
2025-07-08T15:49:18.000000Z
Meta Launches the J1 Model Series: Revamping LLM-as-a-Judge to Build the Strongest "AI Judge"
IT之家
2025-05-22T04:23:13.000000Z
Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS
AWS Machine Learning Blog
2025-02-27T17:46:17.000000Z
LLM-as-a-judge on Amazon Bedrock Model Evaluation
AWS Machine Learning Blog
2025-02-12T17:59:47.000000Z
Livestream | Popular LLM-as-a-Judge Papers: Survey Talk on AI Serving as the "Evaluator", AI + Finance Roundtable, IDEA Research Institute
智源社区
2025-01-14T09:20:38.000000Z
Livestream | Popular LLM-as-a-Judge Papers: Survey Talk on AI Becoming the "Judge", AI + Finance Roundtable, IDEA Research Institute
智源社区
2025-01-14T09:05:19.000000Z
At Last, a Survey That Clearly Explains the LLM-as-a-judge Paradigm
机器之心
2024-12-04T06:06:00.000000Z
Evaluating prompts at scale with Prompt Management and Prompt Flows for Amazon Bedrock
AWS Machine Learning Blog
2024-09-05T20:32:19.000000Z