Model Interpretability_Fishai

Articles related to "Model Interpretability"

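The most controversial study: a large model's intermediate-layer outputs can be inverted 100% back to the original input

AI Technology Review 2025-11-02T18:14:01.000000Z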

BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection

cs.AI updates on arXiv.org 2025-10-31T04:03:55.000000Z

Your input, not a word forgotten by the LLM: Transformers proven to be "invertible almost everywhere"

PaperWeekly 2025-10-30T11:33:00.000000Z

LLM Hallucinations: An Internal Tug of War

LessWrong 2025-10-30T05:21:30.000000Z

Learning to Interpret Weight Differences in Language Models

LessWrong 2025-10-23T04:23:32.000000Z

Activation Plateaus: Where and How They Emerge

LessWrong 2025-10-17T05:51:41.000000Z

Symbol Grounding in Neuro-Symbolic AI: A Gentle Introduction to Reasoning Shortcuts

cs.AI updates on arXiv.org 2025-10-17T04:09:43.000000Z

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks

cs.AI updates on arXiv.org 2025-10-03T04:11:13.000000Z

AI-Driven Supply Chain Management: A Practical Guide to Demand Forecasting

Juejin AI 2025-08-17T10:17:57.000000Z

Can Multitask Learning Enhance Model Explainability?

cs.AI updates on arXiv.org 2025-08-12T04:39:14.000000Z

Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits

LessWrong 2025-07-22T20:37:39.000000Z

Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention)

LessWrong 2025-07-22T15:04:02.000000Z

Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs

LessWrong 2025-06-22T18:17:34.000000Z

Can We Really Trust AI’s Chain-of-Thought Reasoning?

Unite.AI 2025-05-24T16:52:33.000000Z

Some OthelloGPT Circuits

LessWrong 2025-04-15T21:37:45.000000Z

Enumerating objects a model "knows" using entity-detection features.

LessWrong 2025-03-30T20:47:52.000000Z

Learning Multi-Level Features with Matryoshka SAEs

LessWrong 2024-12-19T16:01:41.000000Z

The ‘strong’ feature hypothesis could be wrong

LessWrong 2024-08-02T14:36:30.000000Z

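How far are we from finding scientific truth through AI? A conversation with the founder of Deep Principle and the author of the new neural network architecture KAN | DeepTalk podcast update

MIT Technology Review - This Week's Top Stories 2024-07-14T16:01:53.000000Z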
