AI Security: Tackling the "Lethal Trifecta"

This post examines how to deal with the "lethal trifecta" in AI systems: an LLM agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally. Unlike traditional deterministic software development, AI security needs to borrow from physical engineering, building in tolerance and redundancy. But in application security, a 99% defense rate is a failing grade: an adversarial attacker will always find the 1% that gets through. The only reliable approach is to cut off one leg of the lethal trifecta, and the easiest leg to cut is the LLM agent's ability to transmit stolen data back to the attacker.

🔍 The non-deterministic nature of AI: unlike traditional deterministic software development, the non-determinism of LLMs means security designs must build in tolerance and redundancy, much as Victorian-era engineers erected buildings despite uncertainty about their materials.

🛡️ Probabilistic defenses fail: in application security, a 99% defense rate is a failing grade; an attacker will always find the remaining 1%. This breaks with the traditional mindset in which a vulnerability, once fixed, goes away.

🚫 Cut one leg of the lethal trifecta: the only reliable approach is to sever one of the trifecta's three legs, and the easiest to sever is the LLM agent's ability to transmit stolen data, which prevents exfiltration.

🔐 Lessons from physical engineering: AI security should take its cue from physical engineering, ensuring safety through tolerance and redundancy rather than relying on more training data or cleverer system prompts.

🎯 Attacker capability: an adversarial attacker will eventually find any gap in the system, so security must rest on removing attack channels outright rather than on probabilistic defenses.

How to stop AI’s “lethal trifecta” (via) This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September 22nd - I wrote about that here, where I called it "the clearest explanation yet I've seen of these problems in a mainstream publication".

I like this new article a lot less.

It makes an argument that I mostly agree with: building software on top of LLMs is more like traditional physical engineering - since LLMs are non-deterministic we need to think in terms of tolerances and redundancy:

The great works of Victorian England were erected by engineers who could not be sure of the properties of the materials they were using. In particular, whether by incompetence or malfeasance, the iron of the period was often not up to snuff. As a consequence, engineers erred on the side of caution, overbuilding to incorporate redundancy into their creations. The result was a series of centuries-spanning masterpieces.

AI-security providers do not think like this. Conventional coding is a deterministic practice. Security vulnerabilities are seen as errors to be fixed, and when fixed, they go away. AI engineers, inculcated in this way of thinking from their schooldays, therefore often act as if problems can be solved just with more training data and more astute system prompts.

My problem with the article is that I don't think this approach is appropriate when it comes to security!

As I've said several times before, in application security, 99% is a failing grade. If there's a 1% chance of an attack getting through, an adversarial attacker will find that attack.
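
To make that concrete (my own back-of-the-envelope illustration, not a calculation from either article): a defense that blocks 99% of injection attempts barely slows down an attacker who can retry cheaply, because the odds of at least one success compound with every attempt.

```python
# If a probabilistic defense stops each attempt with 99% reliability,
# the chance that at least one of n attempts succeeds is 1 - 0.99**n.
for n in (1, 10, 100, 500):
    p_breach = 1 - 0.99 ** n
    print(f"{n:>4} attempts -> {p_breach:5.1%} chance of at least one success")

# Prints:
#    1 attempts ->  1.0% chance of at least one success
#   10 attempts ->  9.6% chance of at least one success
#  100 attempts -> 63.4% chance of at least one success
#  500 attempts -> 99.3% chance of at least one success
```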

The whole point of the lethal trifecta framing is that the only way to reliably prevent that class of attacks is to cut off one of the three legs!

Generally the easiest leg to remove is exfiltration - the vectors by which the LLM agent can transmit stolen data back to the attacker.
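
To sketch what removing that leg can look like in practice - this is my own illustration with hypothetical names, not a design from the article or from any particular agent framework - the rule is deny-by-default on every outbound channel the agent can reach:

```python
from urllib.parse import urlparse

# Hypothetical guard around an agent's HTTP tool: outbound requests are only
# allowed to hosts the deployer explicitly trusts, so an injected prompt
# cannot smuggle secrets out via a URL like
# https://attacker.example/?q=<stolen-data>
ALLOWED_HOSTS = {"api.internal.example.com"}  # assumption: your trusted endpoints

def guarded_fetch(url: str, fetch):
    """Wrap the agent's fetch tool; `fetch` performs the actual request."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"outbound request to {host!r} blocked")
    return fetch(url)
```

The same deny-by-default rule has to cover every channel the agent can write to - rendered Markdown images, emails, issue comments - not just the obvious HTTP tool, or the attacker simply switches vectors.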
