Trending
Articles related to "supervised fine-tuning"
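New Thinking Machines research goes viral! Combining the strengths of RL and fine-tuning makes training small models more cost-effective
BAAI Community (智源社区) 2025-10-29T07:36:31.000000Z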
Just in: Thinking Machines Lab blog proposes on-policy distillation, and Qwen gets name-checked 38 times
Synced (机器之心) 2025-10-28T05:42:24.000000Z
BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills
cs.AI updates on arXiv.org 2025-10-28T04:07:16.000000Z
Just in: Thinking Machines Lab blog proposes on-policy distillation, and Qwen gets name-checked 38 times
36Kr AI (36氪 AI) 2025-10-28T02:04:10.000000Z
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
cs.AI updates on arXiv.org 2025-10-27T06:23:29.000000Z
Teaching Language Models to Reason with Tools
cs.AI updates on arXiv.org 2025-10-24T04:26:47.000000Z
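Is the conventional view of LLM fine-tuning being overturned again? New research from a UIUC and Amazon team suggests SFT's catastrophic forgetting problem may have been misunderstood
Synced (机器之心) 2025-10-21T08:56:03.000000Z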
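Is the conventional view of LLM fine-tuning being overturned again? New research from a UIUC and Amazon team suggests SFT's catastrophic forgetting problem may have been misunderstood
Synced (机器之心) 2025-10-21T06:37:50.000000Z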
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
cs.AI updates on arXiv.org 2025-10-21T04:27:27.000000Z
Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models
cs.AI updates on arXiv.org 2025-10-21T04:23:24.000000Z
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
cs.AI updates on arXiv.org 2025-10-20T04:08:46.000000Z
Are RAG and search agents losing their appeal? Apple's DeepMMSearch-R1 enters the new battleground of multimodal search
Synced (机器之心) 2025-10-17T06:46:11.000000Z
Analyzing and Internalizing Complex Policy Documents for LLM Agents
cs.AI updates on arXiv.org 2025-10-14T04:10:49.000000Z
Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning
cs.AI updates on arXiv.org 2025-10-14T04:08:53.000000Z
Heard everyone is going all in on post-training? Here is the best guide
Synced (机器之心) 2025-10-09T08:30:03.000000Z
Heard everyone is going all in on post-training? Here is the best guide
Synced (机器之心) 2025-10-09T04:21:27.000000Z
Training Large Language Models To Reason In Parallel With Global Forking Tokens
cs.AI updates on arXiv.org 2025-10-08T04:08:21.000000Z