Trending
Articles related to "Interpretability"
New OpenAI paper dissects the internal mechanisms of language models: explaining model behavior with "sparse circuits"
机器之心 2025-11-14T14:08:09.000000Z
OpenAI trains a Transformer to be "almost all zeros", fully opening the black box for the first time
PaperWeekly 2025-11-14T11:48:12.000000Z
Self-interpretability: LLMs can describe complex internal processes that drive their decisions
少点错误 2025-11-14T00:24:40.000000Z
Weight-sparse transformers have interpretable circuits
少点错误 2025-11-13T18:36:46.000000Z
OpenAI’s new LLM exposes the secrets of how AI really works
MIT Technology Review » Artificial Intelligence 2025-11-13T18:29:53.000000Z
[ICML25] Attributing errors in point cloud models with information bottleneck theory, building interpretable tools for safety problems
复旦白泽战队 2025-11-12T07:40:37.000000Z
A triple-A production! Alibaba's ROLL team drives full-stack co-optimization of RL4LLM, from infrastructure -> algorithms -> mechanisms
机器之心 2025-11-10T08:31:44.000000Z
A triple-A production! Alibaba's ROLL team drives full-stack co-optimization of RL4LLM, from infrastructure -> algorithms -> mechanisms
机器之心 2025-11-10T07:27:28.000000Z
Can Models be Evaluation Aware Without Explicit Verbalization?
少点错误 2025-11-08T19:24:30.000000Z
Bottom-Up: Principled Compression to Shrink LLMs
少点错误 2025-11-08T11:56:07.000000Z
Toward Statistical Mechanics Of Interfaces Under Selection Pressure
少点错误 2025-11-06T23:30:01.000000Z
[ICML25] Attributing errors in point cloud models with information bottleneck theory, building interpretable tools for safety problems
复旦白泽战队 2025-11-03T13:33:05.000000Z
The Future of Interpretability is Geometric
少点错误 2025-10-24T18:56:29.000000Z
Learning to Interpret Weight Differences in Language Models
少点错误 2025-10-23T04:23:32.000000Z
Activation Plateaus: Where and How They Emerge
少点错误 2025-10-17T05:51:41.000000Z
Finding Features in Neural Networks with the Empirical NTK
少点错误 2025-10-16T18:21:44.000000Z
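Is unlearning in large language models perhaps a paradox? Tsinghua team reveals the dilemma of unlearning techniques
MIT Technology Review (Chinese edition) - This Week's Top Stories 2025-10-15T10:52:12.000000Z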
Training Qwen-1.5B with a CoT legibility penalty
少点错误 2025-10-09T21:48:46.000000Z