"
DPO
" 相关文章
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
cs.AI updates on arXiv.org
2025-11-05T05:30:33.000000Z
ICCV 2025 | HKUST and Oxford release AlignGuard, a scalable safety-alignment framework for text-to-image generation models
机器之心
2025-10-30T09:51:34.000000Z
LLM Training Data Optimization: Fine-Tuning, RLHF & Red Teaming
Cogito Tech
2025-10-23T05:35:13.000000Z
RLHF in 2024 with DPO and Hugging Face
philschmid RSS feed
2025-09-30T11:11:16.000000Z
Noteworthy AI Research Papers of 2024 (Part One)
Ahead of AI
2025-09-25T10:01:35.000000Z
Diversity First, Quality Later: A Two-Stage Assumption for Language Model Alignment
cs.AI updates on arXiv.org
2025-08-15T04:18:21.000000Z
One-stop distributed training and deployment of a medical-domain large model based on ERNIE-4.5-0.3B
掘金 人工智能
2025-08-05T15:20:55.000000Z
Learning to Align Human Code Preferences
cs.AI updates on arXiv.org
2025-07-29T04:22:14.000000Z
Unlearning of Knowledge Graph Embedding via Preference Optimization
cs.AI updates on arXiv.org
2025-07-29T04:21:39.000000Z
Customize Amazon Nova in Amazon SageMaker AI using Direct Preference Optimization
AWS Machine Learning Blog
2025-07-23T19:09:14.000000Z
Taught by an NVIDIA expert! Stanford's Andrew Ng releases a course on post-training for large language models
Datawhale
2025-07-10T16:57:13.000000Z
A must-read guide for AI product managers on why not to fine-tune large models | Hands-on notes
掘金 人工智能
2025-07-09T03:14:27.000000Z
How to understand large-model fine-tuning in plain terms? A plain-language article explaining model fine-tuning!
掘金 人工智能
2025-06-18T08:03:15.000000Z
What is the relationship between Pre-Training, Fine-Tuning, SFT, LoRA, and RLHF?
掘金 人工智能
2025-05-29T07:43:06.000000Z
Fine-tune large language models with reinforcement learning from human or AI feedback
AWS Machine Learning Blog
2025-04-04T14:45:37.000000Z
Holiday AI Evolution Camp | Master the core techniques of large-model alignment in 3 days
智源社区
2025-04-03T06:37:40.000000Z
Self-taught LLMs become "oracles"! Future-prediction ability greatly improved
智源社区
2025-02-26T03:37:13.000000Z
Punching above its weight: Microsoft officially releases Phi-4
PaperAgent
2024-12-14T09:18:53.000000Z
LLMs are content creators' fast track to O(n!) expressive possibilities | Essence and Form series - Narrative Structure @散沙
ShowMeAI
2024-11-13T18:54:01.000000Z