Potential Risks from Rapid AI Model Iteration

AI companies are racing to build the next generation of models faster than we can understand the ones they already have. This "model overhang" should not be ignored, and existing safety frameworks fail to account for it. It stems from the untapped capabilities latent in today's models, and from the industry's preference for developing new models over studying existing ones in depth. As a result, safety risk assessments lag behind reality, and established red lines and evaluation criteria may already be insufficient by the time they are applied. The author takes a conservative stance toward AI development, is especially wary of open-weight releases, which may accelerate the emergence of these risks, and stresses the importance of researching unknown capabilities and building rollback mechanisms.

💡 **Untapped capabilities of AI models:** When models such as GPT-3 were released, many of their now-known key capabilities were unknown. As researchers explore them more deeply, build better tooling, refine their prompting, and inspect model internals, discoveries about what these models can do keep accumulating, revealing capability boundaries far beyond initial expectations.

🚀 **The trade-off in rapid iteration:** Competition in AI is fierce, and companies would rather accelerate the development of new models than invest heavily in studying existing ones. Far more research goes into boosting state-of-the-art (SOTA) models than into deeply analyzing older ones. If the situation were reversed, with companies only moving to the next generation once they were confident they had exhausted the potential of their current models, the risk from a model overhang would be far smaller.

⚠️ **Safety evaluation lags behind:** A model overhang poses a serious challenge to existing safety frameworks. Measures such as safety red lines, capability evaluations (evals), and responsible scaling policies (RSPs) all rest on a reasonably clear picture of what models can do. If a model harbors many unknown and powerful capabilities, then by the time a red line is reached it may already be far too late, because the model's actual capabilities will have passed that limit long before.

📉 **Underestimating unknown risks:** People generally tend to assume that unknowns do not exist, and in AI development this leads them to underestimate the size of the model overhang. The author is therefore more conservative about AI development and deployment, and skeptical of "keep going until the risk is too high, then stop" policies. Open-weight releases in particular may accelerate the emergence and spread of unknown capabilities, so researching and preparing for their effects is critical.

Published on September 23, 2025 2:15 PM GMT


By racing to the next generation of models faster than we can understand the current one, AI companies are creating an overhang. This overhang is not visible, and our current safety frameworks do not take it into account.

1) AI models have untapped capabilities

At the time GPT-3 was released, most of its currently-known capabilities were unknown.

As we play more with models, build better scaffolding, get better at prompting, inspect their internals, and study them, we discover more about what's possible to do with them.

This has also been my direct experience studying and researching open-source models at Conjecture.
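To make this concrete, here is a minimal, hypothetical sketch (not from the original post; names like `toy_model` and all numbers are made up for illustration) of how the capability an eval reports depends on elicitation effort: the same fixed model looks much weaker under a naive zero-shot query than under scaffolding that adds worked examples, step-by-step instructions, and retries.

```python
# Hypothetical sketch: the same fixed "model" scores very differently
# depending on how hard we try to elicit its capability.
# All names and numbers are made up for illustration.
import random

random.seed(0)

def toy_model(prompt: str) -> bool:
    """Stand-in for a fixed model: succeeds more often when the prompt
    contains worked examples and step-by-step scaffolding."""
    p_success = 0.15                      # naive zero-shot success rate
    if "Example:" in prompt:
        p_success += 0.35                 # few-shot examples unlock latent skill
    if "think step by step" in prompt:
        p_success += 0.25                 # chain-of-thought style scaffolding
    return random.random() < p_success

def evaluate(build_prompt, n: int = 1000, retries: int = 1) -> float:
    """Fraction of tasks solved; `retries` models an agentic scaffold
    that resamples the model on failure."""
    task = "solve the task"
    solved = sum(
        any(toy_model(build_prompt(task)) for _ in range(retries))
        for _ in range(n)
    )
    return solved / n

naive = evaluate(lambda t: t)
scaffolded = evaluate(
    lambda t: f"Example: ...\nNow think step by step and {t}",
    retries=4,
)
print(f"naive elicitation:      {naive:.0%}")       # roughly 15%
print(f"scaffolded elicitation: {scaffolded:.0%}")  # close to 100%
```

The point is only directional: better elicitation of the same weights moves the measured score, which is what makes the "current" capability of a model a moving target.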

2) SOTA models have a lot of untapped capabilities

Companies are racing hard.

There's a trade-off between studying existing models and pushing forward. They are doing the latter, and they are doing it hard.

There is much more research into boosting SOTA models than studying any old model like GPT-3 or Llama-2.

By contrast, imagine a world where Deep Openpic decided to start working on the next generation of models only once it was confident it had fully juiced its existing ones. That world would have much less of an overhang.

3) This is bad news.

Many agendas, like red-lines, evals or RSPs, revolve around us not being in an overhang.

If we are in an overhang, then a red-line being met may already be much too late, with untapped capabilities already way past it.
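A hypothetical sketch of this failure mode (again, not from the post; all numbers are illustrative): if evals trigger on measured capability while an elicitation gap hides part of a model's true capability, the red line fires one or more generations after the underlying capability has already crossed it.

```python
# Hypothetical illustration: a red-line trigger based on measured capability
# lags behind true (elicitable) capability. All numbers are illustrative.
RED_LINE = 70          # score at which we intend to stop
ELICITATION_GAP = 25   # untapped capability today's evals fail to surface

for generation in range(1, 8):
    true_capability = 20 * generation              # grows each generation
    measured = true_capability - ELICITATION_GAP   # what the eval sees
    status = "RED LINE" if measured >= RED_LINE else "looks fine"
    print(f"gen {generation}: measured={measured:>3}, "
          f"true={true_capability:>3} -> {status}")
    if measured >= RED_LINE:
        break

# The eval triggers at generation 5 (measured=75), but true capability
# already crossed the red line at generation 4 (true=80 >= 70):
# the overhang made the red line fire "much too late".
```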

4) This is not accounted for.

It is hard to reason about unknowns in a well-calibrated way.

Sadly, I have found that people consistently have a tendency to assume that unknowns do not exist.

This means that directionally, I expect people to underestimate overhangs.


This is in great part why I am more conservative about AI development and deployment, skeptical of policies of the form "keep going until the risk is too high, then stop", and especially wary of open-weight releases, which let untapped capabilities surface and spread outside anyone's control.

Sadly, researching this effect is directly capabilities relevant. It is likely that many amplification techniques that work on weaker models would work on newer models too.

Without researching it directly, we may start to feel the existence of an overhang after a pause (whether it is because of a global agreement or a technological slowdown).

Hopefully, at this point, we'd have the collective understanding and infrastructure needed to deal with rollbacks if they were warranted.


