Trending
Articles related to "trl"
RL without TD learning
The Berkeley Artificial Intelligence Research Blog 2025-11-07T07:20:30.000000Z
Extended Guide: Instruction-tune Llama 2
philschmid RSS feed 2025-09-30T11:12:11.000000Z
RLHF in 2024 with DPO and Hugging Face
philschmid RSS feed 2025-09-30T11:11:16.000000Z
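TRL: A Fine-Tuning Framework for Large Models
Juejin (掘金) Artificial Intelligence 2025-09-13T18:31:13.000000Z
From Principles to Practice: The Complete RLHF (Reinforcement Learning from Human Feedback) Workflow
Juejin (掘金) Artificial Intelligence 2025-09-03T06:10:41.000000Z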
Preference Optimization for Vision-Language Multimodal Models
BAAI Community (智源社区) 2024-07-17T05:06:39.000000Z