Trending
Articles related to "human evaluation"
Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
cs.AI updates on arXiv.org 2025-10-22T04:20:38.000000Z
Evaluate LLMs and RAG a practical example using Langchain and Hugging Face
philschmid RSS feed 2025-09-30T11:11:44.000000Z
Creativity Benchmark: A benchmark for marketing creativity for LLM models
cs.AI updates on arXiv.org 2025-09-15T08:15:24.000000Z
Estimating Facial Attractiveness Prediction for Livestreams
Unite.AI 2025-01-08T14:46:44.000000Z
Meta releases video generation and editing models: a paper walkthrough from the project lead
AIGC Weekly 2024-12-22T16:39:14.000000Z
Chatbot Arena Conversation Dataset Release
2024-10-02T06:00:21.000000Z
Did o1 falsely claim it has no CoT? Tsinghua and UC Berkeley: RLHF teaches models to lie, slack off, fabricate evidence, and manipulate humans
36kr 2024-09-23T09:19:07.000000Z