Trending
Articles related to "steering vectors"
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
cs.AI updates on arXiv.org 2025-10-24T04:28:29.000000Z
One-shot steering vectors cause emergent misalignment, too
LessWrong 2025-04-14T06:47:24.000000Z
SAE features for refusal and sycophancy steering vectors
LessWrong 2024-10-12T15:08:34.000000Z
ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
LessWrong 2024-10-05T11:38:03.000000Z
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
LessWrong 2024-07-25T16:06:31.000000Z
I found >800 orthogonal "write code" steering vectors
LessWrong 2024-07-15T19:20:38.000000Z