Interpretability | FishAI

Trending

Articles related to "Interpretability"

New OpenAI paper dissects the inner workings of language models: explaining model behavior with "sparse circuits"

Synced (机器之心) 2025-11-14T14:08:09.000000Z

OpenAI trains a Transformer to be "almost all zeros", fully opening the black box for the first time

PaperWeekly 2025-11-14T11:48:12.000000Z

Self-interpretability: LLMs can describe complex internal processes that drive their decisions

LessWrong (少点错误) 2025-11-14T00:24:40.000000Z

Weight-sparse transformers have interpretable circuits

LessWrong (少点错误) 2025-11-13T18:36:46.000000Z

OpenAI’s new LLM exposes the secrets of how AI really works

MIT Technology Review » Artificial Intelligence 2025-11-13T18:29:53.000000Z

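[ICML25] Using information bottleneck theory for error attribution in point cloud models, building interpretability tools for security problems

Fudan Baize Team (复旦白泽战队) 2025-11-12T07:40:37.000000Z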

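An AAA-grade work! Alibaba's ROLL team drives full-stack co-optimization of RL4LLM, from infrastructure to algorithms to mechanisms

Synced (机器之心) 2025-11-10T08:31:44.000000Z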

Can Models be Evaluation Aware Without Explicit Verbalization?

LessWrong (少点错误) 2025-11-08T19:24:30.000000Z

Bottom-Up: Principled Compression to Shrink LLMs

LessWrong (少点错误) 2025-11-08T11:56:07.000000Z

Toward Statistical Mechanics Of Interfaces Under Selection Pressure

LessWrong (少点错误) 2025-11-06T23:30:01.000000Z

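[ICML25] Using information bottleneck theory for error attribution in point cloud models, building interpretability tools for security problems

Fudan Baize Team (复旦白泽战队) 2025-11-03T13:33:05.000000Z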

The Future of Interpretability is Geometric

LessWrong (少点错误) 2025-10-24T18:56:29.000000Z

Learning to Interpret Weight Differences in Language Models

LessWrong (少点错误) 2025-10-23T04:23:32.000000Z

Activation Plateaus: Where and How They Emerge

LessWrong (少点错误) 2025-10-17T05:51:41.000000Z

Finding Features in Neural Networks with the Empirical NTK

LessWrong (少点错误) 2025-10-16T18:21:44.000000Z

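Is forgetting in large language models a paradox? A Tsinghua team reveals the dilemma facing unlearning techniques

MIT 科技评论 (MIT Technology Review) - Trending This Week 2025-10-15T10:52:12.000000Z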

Training Qwen-1.5B with a CoT legibility penalty

LessWrong (少点错误) 2025-10-09T21:48:46.000000Z

Copyright © 2019 FISHAI. All Rights Reserved.