LessWrong · two days ago, 06:27
Beware Overconfidence: Lessons from Cryptography to AI Ethics

Drawing on his own experience in a cryptography research department, the author illustrates the risks that overconfidence can pose in technical fields. He once easily broke an internally designed pseudorandom number generator, which drove home the importance of "don't roll your own crypto." The article contrasts cryptography with AI ethics, noting that in AI ethics it is hard to refute an idea with a "clean attack," and that the field lacks widely accepted standard libraries. The author is therefore pessimistic about AI safety, and urges the same wariness of overconfidence in metaethics, to avoid deploying insufficiently vetted ideas in high-stakes systems.

🔒 **The risks and lessons of overconfidence**: Drawing on first-hand experience, the author shows that even professionals can be overconfident in their own ideas, especially in unfamiliar fields. In cryptography, an internally designed pseudorandom number generator was easily broken, reinforcing the widely held consensus that one should not invent one's own cryptographic algorithms and underscoring the reliance on mature, battle-tested designs.

🤔 **Different challenges in AI ethics and cryptography**: Unlike cryptography, where a "clean attack" can demonstrate an algorithm's weakness, ideas in AI ethics are often hard to falsify directly. The author notes that AI safety lacks the kind of universally accepted "standard libraries" found in cryptography, making consensus-building and persuasion far harder, which deepens his pessimism about the field's prospects.

⚖️ **Beware overconfidence in "metaethics"**: The author extends the lesson from cryptography to metaethics, urging people to hold their philosophical views cautiously, especially when those views may inform high-stakes decisions or system designs. He suggests lowering one's absolute confidence in one's own views absent strong evidence or broad consensus, and staying open to others' perspectives and counterarguments.

Published on November 12, 2025 10:17 PM GMT

One day, when I was an intern at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department had invented it and planned to use it in their product, and wanted us to take a look first. This person must have had a lot of political clout or been especially confident in himself, because he refused the standard advice that anything an amateur comes up with is very likely to be insecure and that he should instead use one of the established, off-the-shelf cryptographic algorithms that have survived extensive cryptanalysis (code breaking) attempts.

My boss thought he had to demonstrate the insecurity of the PRNG by coming up with a practical attack (i.e., a way to predict its future output based only on its past output, without knowing the secret key/seed). There were three permanent, full-time professional cryptographers working in the research department, but none of them specialized in cryptanalysis of symmetric cryptography (which covers such PRNGs), so it might have taken them some time to figure out an attack. My time was obviously less valuable, and my boss probably thought I could benefit from the experience, so I got the assignment.

Up to that point I had no interest, knowledge, or experience with symmetric cryptanalysis either, but I was still able to quickly demonstrate a clean attack on the proposed PRNG, which succeeded in convincing the proposer to give up and use an established algorithm. Experiences like this are so common that everyone in cryptography quickly learns how easy it is to be overconfident about one's own ideas, and many viscerally know the feeling of their own brain betraying them with unjustified confidence. As a result, "don't roll your own crypto" is deeply ingrained in the culture and in people's minds.
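To make the idea of a "clean attack" concrete, here is a toy sketch (hypothetical; it is not the actual PRNG from the story, whose design isn't described). A linear congruential generator that emits its raw state looks random at a glance, yet an observer who knows only the public modulus can recover the secret parameters from three consecutive outputs and predict everything that follows, without ever learning the seed:

```python
# Toy illustration (hypothetical, NOT the reviewed PRNG): a linear
# congruential generator (LCG) whose raw state is used directly as output.
# Predicting its future output from past output is exactly the kind of
# attack described above.

M = 2**31 - 1  # public modulus (a Mersenne prime, assumed known to all)

def lcg(seed, a, c, n):
    """Generate n outputs; seed, a, and c play the role of the secret key."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % M
        out.append(x)
    return out

def predict_next(outputs):
    """Recover a and c from consecutive outputs, then predict the next one.

    From x2 = a*x1 + c and x3 = a*x2 + c:
        a = (x3 - x2) * (x2 - x1)^-1  (mod M)
        c = x2 - a*x1                 (mod M)
    """
    x1, x2, x3 = outputs[:3]
    a = ((x3 - x2) * pow(x2 - x1, -1, M)) % M  # inverse exists since M is prime
    c = (x2 - a * x1) % M
    return (a * outputs[-1] + c) % M

stream = lcg(seed=123456789, a=1103515245, c=12345, n=5)
predicted = predict_next(stream[:4])  # attacker sees only the first 4 outputs
assert predicted == stream[4]         # ...and predicts the 5th exactly
```

Real attacks on amateur designs are rarely this trivial, but the shape is the same: a short transcript of output plus some algebra, and the "randomness" evaporates.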

If only it were so easy to establish something like this in "applied philosophy" fields, like AI alignment! Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken. The most that can usually be hoped for is to demonstrate some kind of implication that is counterintuitive or contradicts other popular ideas. But because "one man's modus ponens is another man's modus tollens", if someone is sufficiently willing to bite bullets, it's impossible to directly convince them that they're wrong (or should be less confident) this way. This is made even harder because, unlike in cryptography, there are no universally accepted "standard libraries" of philosophy to fall back on. (My actual experiences attempting this, and almost always failing, are another reason why I'm so pessimistic about AI x-safety, even compared to most other x-risk concerned people.)

So I think I have to try something more meta, like drawing the above parallel with how easy it is to be overconfident in other fields, such as cryptography. Another meta line of argument is to consider how many people have strongly held, but mutually incompatible, philosophical positions. Behind a veil of ignorance, wouldn't you want everyone to be less confident in their own ideas? Or think "This isn't likely to be a subjective question like morality/values might be, so what are the chances that I'm right and they're all wrong? If I'm truly right, why can't I convince most others of this? Is there a reason or evidence that I'm much more rational or philosophically competent than they are?"

Unfortunately, I'm pretty unsure any of these meta arguments will work either. If they do change anyone's mind, please let me know in the comments or privately. Or if anyone has better ideas for how to spread a meme of "don't roll your own metaethics"[1], please contribute. And of course counterarguments are welcome too, e.g., if people rolling their own metaethics is actually good in a way that I'm overlooking.

[1] To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly, "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review and consensus that it is likely to be secure.
