Trending
Articles related to "KL divergence"
Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence
少点错误 2025-10-31T01:16:12.000000Z
Sculpting Latent Spaces With MMD: Disentanglement With Programmable Priors
cs.AI updates on arXiv.org 2025-10-15T04:53:35.000000Z
Deceptive Exploration in Multi-armed Bandits
cs.AI updates on arXiv.org 2025-10-13T04:13:21.000000Z
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
cs.AI updates on arXiv.org 2025-10-07T04:15:51.000000Z
Cross-Entropy: The Most Commonly Used Loss Function in Deep Learning
掘金 人工智能 2025-09-17T09:55:57.000000Z
Researchers Propose a New Representation Learning Framework to Fill the Gap in Causal Characterization of Deep Learning Models
DeepTech深科技 2025-09-15T14:09:06.000000Z
Xi'an Jiaotong-Liverpool University | Goal-Oriented Generative Prompt Injection Attacks on Large Language Models
安全学术圈 2025-09-11T20:14:02.000000Z
Is SFT Really Worse Than RL? MIT Team Proposes "RL's Razor": Cut Away Forgetting, Straight to Lifelong Learning
PaperWeekly 2025-09-11T19:36:22.000000Z
Is SFT Far Worse Than RL? A Timeless Razor Principle Opens the Door to "Lifelong Learning" in Large Model Training
机器之心 2025-09-11T04:10:33.000000Z
Selective Generalization: Improving Capabilities While Maintaining Alignment
少点错误 2025-07-16T21:37:00.000000Z
Off-Policy Reinforcement Learning (RL) with KL Divergence Yields Superior Reasoning in Large Language Models
MarkTechPost@AI 2025-06-02T04:56:04.000000Z
$500 + $500 Bounty Problem: An (Approximately) Deterministic Maximal Redund Always Exists
少点错误 2025-05-06T23:07:26.000000Z
[Read With Me] A Guide to the "Flower Book" Deep Learning, Chapter 3: Probability and Information Theory (Part 2)
虎扑-热帖 2024-11-24T19:35:16.000000Z
How to Accurately and Interpretably Evaluate the Quantization Quality of Large Models?
智源社区 2024-08-10T08:07:28.000000Z
Beyond Accuracy: Evaluating LLM Compression with Distance Metrics
MarkTechPost@AI 2024-07-18T11:03:46.000000Z