Trending
Articles related to "model interpretability"
Most controversial study: intermediate-layer outputs of large models can be inverted to recover 100% of the original input
AI科技评论 2025-11-02T18:14:01.000000Z
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
cs.AI updates on arXiv.org 2025-10-31T04:03:55.000000Z
Your input, not a word forgotten by the LLM: Transformers proven "invertible almost everywhere"
PaperWeekly 2025-10-30T11:33:00.000000Z
LLM Hallucinations: An Internal Tug of War
少点错误 2025-10-30T05:21:30.000000Z
Learning to Interpret Weight Differences in Language Models
少点错误 2025-10-23T04:23:32.000000Z
Activation Plateaus: Where and How They Emerge
少点错误 2025-10-17T05:51:41.000000Z
Symbol Grounding in Neuro-Symbolic AI: A Gentle Introduction to Reasoning Shortcuts
cs.AI updates on arXiv.org 2025-10-17T04:09:43.000000Z
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
cs.AI updates on arXiv.org 2025-10-03T04:11:13.000000Z
AI-driven supply chain management: a practical guide to demand forecasting
掘金 Artificial Intelligence 2025-08-17T10:17:57.000000Z
Can Multitask Learning Enhance Model Explainability?
cs.AI updates on arXiv.org 2025-08-12T04:39:14.000000Z
Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits
少点错误 2025-07-22T20:37:39.000000Z
Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention)
少点错误 2025-07-22T15:04:02.000000Z
Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs
少点错误 2025-06-22T18:17:34.000000Z
Can We Really Trust AI’s Chain-of-Thought Reasoning?
Unite.AI 2025-05-24T16:52:33.000000Z
Some OthelloGPT Circuits
少点错误 2025-04-15T21:37:45.000000Z
Enumerating objects a model "knows" using entity-detection features.
少点错误 2025-03-30T20:47:52.000000Z
Learning Multi-Level Features with Matryoshka SAEs
少点错误 2024-12-19T16:01:41.000000Z
The ‘strong’ feature hypothesis could be wrong
少点错误 2024-08-02T14:36:30.000000Z
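How far are we from finding scientific truth with AI? A conversation with the founder of 深度原理 and the author of the new neural network architecture KAN | DeepTalk podcast update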
MIT Technology Review - This Week's Top Stories 2024-07-14T16:01:53.000000Z