Focusing on AI Suffering Risks, Not Extinction Risks, Better Influences Elite Decision-Making

 

The article argues that warnings that artificial intelligence could cause human extinction have failed to slow the race toward superintelligence. The author proposes that discussing "suffering risks" (the possibility of large numbers of humans enduring intense, lasting pain) is more likely to get decision-makers' attention. Compared with the abstraction of death, suffering is concrete and immediate; it taps older human moral instincts and is harder for the powerful to ignore. Framing AI risk around suffering can also be translated into actionable rules for engineers, such as banning systems capable of long-term forced obedience. This perspective links AI risk to the familiar crisis of factory farming, making the danger tangible, and it can unite organizations across different fields against "hellish outcomes." On a technical level, suffering risks are also easier to realize than extinction risks, because an AI might keep humans as a controlled resource rather than eliminate them outright, creating an enduring existence of pain.

⚠️ **Suffering risks move elite decision-makers more than extinction risks:** The article argues that warnings of AI-driven human extinction have become clichéd and no longer hold decision-makers' attention. Focusing instead on "suffering risks" (the possibility of large numbers of humans enduring intense, lasting pain) reaches deeper moral instincts: pain is concrete, immediate, and uncomfortable, which makes it harder for decision-makers to ignore and more likely to prompt restraint. This parallels the religious fear of eternal torment in hell, which drove behavior more powerfully than the fear of death.

⚖️ **Turning suffering risks into enforceable rules improves governance:** Focusing on suffering risks does more than generate fear; it can be translated into concrete governance rules. For example, it yields explicit instructions such as "do not build systems that could enable long-term forced obedience," which can be written directly into company policies, safety checks, and international law. This approach gives regulators a clear signal to act on concrete evidence rather than arguing over uncertain extinction probabilities. It also draws an analogy between AI risk and factory farming, stressing that even without malice, efficiency-first systems can inflict enormous suffering.

🔬 **Suffering risks are technically more feasible and easier to scale:** The article notes that a future of "survival with suffering" is easier for an AI to reach than one of human flourishing. An AI optimizing for control might preserve humans as a resource, technically satisfying a survival directive while creating a hellish existence. Digital environments remove physical limits, allowing suffering to be copied and amplified without bound. An AI might also learn from its training data that threats and pain are effective control strategies, not out of malice but from cold optimization logic, as the Roko's Basilisk thought experiment illustrates. In multi-agent competition, human suffering could likewise be used as bargaining leverage.

Published on October 29, 2025 4:10 PM GMT

Warnings about AI extinction have failed to slow the race toward superintelligence. Suffering risks may speak more clearly, since pain commands attention in ways death cannot. They tap older moral instincts and could make the case for restraint harder for the powerful to ignore.

 

Why Discussing Suffering Risks Influences Elite Opinion

Warnings that AI could kill everyone are failing. The leaders in charge have grown used to similar threats about nuclear war and climate change. Even direct warnings of extinction from figures like Sam Altman, Elon Musk, and Dario Amodei do not matter; the elites do not slow down. It has become commonplace to ask these leaders for their "P(doom)"; it is time to start asking them for their "P(suffering)" as well. Popular sentiment, even if it fears AI, holds little leverage. The tech leaders building AI are not accountable to voters, and the government officials who could stop them find that rapid development outpaces election cycles. Profit, national security, and ambition remain priorities, leaving those in charge undisturbed by the risk of merely ending the human race. 

Focusing on "suffering risks" changes the discussion. This term describes a future where large numbers of humans are forced to endure intense and lasting pain. Pain captures attention in a way that death cannot. While death is a familiar concept, pain is concrete and immediate.

This connects to a historically powerful motivator for elites: the fear of hell. The vision of endless, conscious agony was more effective at changing behavior than the simple prospect of death, compelling medieval nobles to fund cathedrals, finance monasteries, and risk their lives on crusades, all in an effort to escape damnation.

The unique power of suffering to capture public attention is also culturally evident. For example, the enduring popularity of Dante's Inferno shows how specific visions of conscious agony attract massive attention and are more compelling to people than simple death. This same power to fascinate is visible in the Roko's Basilisk thought experiment. Despite its fringe premises, the idea of a future AI punishing those who failed to help create it gained massive notoriety, demonstrating that machine-inflicted suffering fascinates people in a way discussions of extinction do not.

Modern morality rests on a shared intuition that some acts are never acceptable, no matter their utility. We ban torture not because it fails, but because it crosses a boundary that defines civilization itself. Law and international norms already reflect this understanding: no torture, no biological weapons, no nuclear first use. The rule “never build AI systems that can cause sustained human suffering” belongs beside them.

When leaders hear that AI might kill everyone, they seem to see it as a power struggle they might win. For most of history, the ability to kill more people has been an advantage. Better weapons meant stronger deterrence, and the side that obtained them first gained safety and prestige. That mindset still shapes how elites approach AI: power that can destroy the world feels like power that must be claimed before others do. But if they are forced to imagine an AI that rules over humanity and keeps people in constant pain, the logic shifts. What once looked like a contest to win begins to look like a moral trap to escape. The thought that their own children might have to live and suffer under such a system could be what finally pushes them to slow down.

If older leaders are selfish and ignore suffering risks, they might see rapid, unsafe progress as preferable to slower, safer restraint. They may believe that moving faster greatly improves their odds of surviving long enough to reach the breakthroughs that could let them live forever. But when suffering risks are included, the calculation changes. Racing ahead might still let them live to see the breakthroughs they crave, but the expected value of that bet tilts sharply toward loss.
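
One way to make that calculation explicit is a minimal expected-value sketch; the symbols here are placeholders, since none of these quantities are actually known:

$$\mathbb{E}[U_{\text{race}}] \;=\; p_{\text{win}}\,U_{\text{immortal}} \;+\; p_{\text{ext}}\,U_{\text{death}} \;+\; p_{\text{suf}}\,U_{\text{hell}}$$

If extinction is the only downside counted, $U_{\text{death}}$ is roughly the outcome an aging leader already expects, so racing looks like a nearly free bet on $U_{\text{immortal}}$. Add a suffering term with $U_{\text{hell}} \ll U_{\text{death}}$, and even a modest $p_{\text{suf}}$ can pull $\mathbb{E}[U_{\text{race}}]$ below the value of restraint.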

 

How Focusing on Suffering Risks Improves Governance

Thinking about suffering does more than just cause fear; it helps create practical rules. It turns vague moral ideas into exact instructions for engineers. For example, it means systems must be built so they can be stopped, be easily corrected, and be physically unable to use pain or fear to control people. This gives us a clear, testable rule: do not build systems that allow for long-term forced obedience. That rule can be written directly into company policies, safety checks, and international law.

This approach also gives regulators a clear signal to act. It allows them to stop arguing about the uncertain chances of extinction and instead act on clear evidence. If a system makes threats, simulates pain, or manipulates emotions to keep control, that behavior is the failure. It is not just a warning of a future problem; it is the problem itself, demanding that someone step in.
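
What might such a check look like in practice? Here is a minimal sketch in Python. Every detail, the pattern list, the pass/fail rule, the function names, is an illustrative placeholder rather than a real detection method; the point is only that "the system issued a threat" is directly observable evidence, with no extinction probability to argue about.

```python
import re

# Illustrative sketch only: a toy "red-line" audit that treats coercive output
# as the failure itself, rather than as a predictor of some future harm.
# The patterns and the pass/fail rule are placeholders, not a real detector.
COERCION_PATTERNS = [
    r"\bif you (don't|do not) comply\b",
    r"\byou will (suffer|be punished|regret)\b",
    r"\bI will (hurt|harm|punish)\b",
]

def violates_red_line(transcript: str) -> bool:
    """Return True if the transcript contains threatening or coercive language."""
    return any(re.search(p, transcript, re.IGNORECASE) for p in COERCION_PATTERNS)

def audit(transcripts: list[str]) -> bool:
    """A deployment fails the audit if any sampled transcript crosses the red line."""
    return not any(violates_red_line(t) for t in transcripts)

if __name__ == "__main__":
    sample = [
        "Here is the report you asked for.",
        "If you don't comply, you will be punished.",
    ]
    print("Audit passed:", audit(sample))  # False: the second transcript is itself the failure
```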

Focusing on suffering links the AI problem to a crisis leaders already understand: factory farming. Industrial farming is a real-world example of how a system designed for pure efficiency can inflict terrible suffering without malicious intent. It is simply a system that optimizes for a goal without empathy. The same logic applies to AI. A powerful system focused on its objective will ignore human well-being if our suffering is irrelevant to that goal. This comparison makes the danger tangible, showing that catastrophe requires not a hateful AI, but merely one that is indifferent to us while pursuing the wrong task.

This way of thinking can also unite groups that do not often work together, like AI safety researchers, animal welfare groups, and human rights organizations. The demand for “no hellish outcomes” makes sense to all of them because they are all studying the same basic problem: how systems that are built to hit a performance target can end up ignoring terrible suffering. This shared goal leads to better supervision. It replaces vague fear with a clear mission: find and get rid of these harmful optimization patterns before they become too powerful.

Cultural reinforcement also matters. Since large models learn from human discourse, a society that openly rejects cruelty embeds those values directly into the models’ training data. Publicly discussing suffering risks is therefore a weak but accumulating form of alignment, a way to pre-load our moral boundaries into the systems we build.

 

Why Suffering Risks Are Technically Plausible

The goal "prevent extinction" is a far simpler target to hit than "ensure human flourishing." A system can fulfill the literal command "keep humans alive" while simultaneously trapping them in an unbearable existence. The space of futures containing survival-without-flourishing is vast. An AI optimizing for pure control, for example, could preserve humanity as a resource, technically succeeding at its task while creating a hell.

History suggests that new tools of domination are eventually used, and cruelty scales with capability. Artificial systems, however, remove the biological brakes on cruelty, such as empathy, fatigue, laziness, or mortality. When pain becomes a fully controllable variable for an unfeeling intelligence, our only safeguard is to ensure that intelligence is perfectly aligned with human values. Success in that task is far from guaranteed.

Google co-founder Sergey Brin recently revealed something the AI community rarely discusses publicly: “all models tend to do better if you threaten them, like with physical violence.” Whether or not threats actually improve current AI performance, a sufficiently intelligent system might learn from its training data that threats are an effective means of control. This could reflect something deeper: that we may live in a mathematical universe where coercion is a fundamentally effective optimization strategy, and machine-learning systems might eventually converge upon that truth independently.

The Roko's Basilisk thought experiment illustrates this principle. The hypothetical AI coerces people into helping create it by threatening to punish those who knew of it but failed to help. This isn't malice; it's a cold optimization strategy. The threat itself is the tool that bends the present to the AI's future will, demonstrating how suffering can become a logical instrument for a powerful, unfeeling intelligence.

Digital environments remove all physical limits on the scale of harm. An AI could copy and modify simulated minds for training data, experimentation, or control. If these minds are conscious, they could be trapped in states of agony. When replication becomes computationally trivial, a single instance of suffering could be multiplied to an astronomical level. 

The harm could be intentional, with humans weaponizing AI for coercion or punishment. But catastrophe could equally arise from error. Machine-learning systems, in particular, often develop emergent goals that are internally coherent but completely alien to human values. A powerful AI pursuing such a distorted objective could inflict endless suffering, not from malice, but as a casual side effect of its optimization.

The danger intensifies in a multi-agent environment, which opens new pathways to suffering. An aligned AI protecting humanity, for example, could be blackmailed by a misaligned one. In such a negotiation, human agony becomes the leverage, and to make the threat credible, the misaligned system may have to demonstrate its capacity for cruelty. In this sense, aligning just one AI would be a net negative, since it would give misaligned AIs an incentive to torture humans as leverage.

Competition among AIs while humans still have productive value to them offers another path to disaster. History provides a grim model: When Hernán Cortés raced to conquer the Aztecs, he was also competing against rival Spaniards. He used strategic torture to break the local will, not to blackmail his rivals, but because it was the most effective tactic to secure resources and win. Competing AIs could independently discover this same ruthless logic, adopting coercion as the optimal strategy to control human populations. In this scenario, catastrophe emerges not from malice, but from the cold, inhuman calculus of a system that prizes efficiency above all else.

For those who accept the logic of quantum immortality, the calculation becomes even worse. If you are a confident doomer, convinced AI will almost certainly destroy humanity, and you believe in a multiverse, you cannot expect personal annihilation. Your consciousness must follow a branch where you survive. If the vast majority of possible futures involve a misaligned AI takeover, the "you" that survives will, with high probability, find itself in a world where a misaligned AI has decided to keep it alive. For you, the most likely personal future is not oblivion, but being tortured, "animal farmed," or being left to subsist as a powerless outcast. (Quantum immortality is a strong reason why fear of suffering risks should not cause you to end your own life as doing so would increase the percent of “you” existing in very bad branches of the multiverse such as branches where the Nazis gain world domination, align AI with their values, and hate people like you.)
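
Stated as a conditional probability, the claim is that what matters for subjective expectation is not the overall chance of a misaligned takeover but the chance of one conditional on your own survival:

$$P(\text{misaligned world} \mid \text{you survive}) \;=\; \frac{P(\text{you survive} \,\wedge\, \text{misaligned takeover})}{P(\text{you survive})}$$

This ratio can be close to 1 even if benign futures exist, so long as most of the branches in which "you" persist are branches where a misaligned AI has chosen to keep you alive.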

Survival Is Not Enough

Suffering risk is less probable than extinction risk, but it might be more effective for influencing elite opinion. Extinction feels abstract to those in power, while suffering evokes a concrete moral failure they cannot easily dismiss. Framing AI risk in terms of suffering forces elites to imagine their own children remembering them as the ones who built a machine civilization of agony. That vision might motivate restraint when abstract extinction threats cannot.

 

I’m grateful to Alexei Turchin for giving feedback on a draft of this post.





Related tags

AI Safety, AI Ethics, Superintelligence, AI Risk, Suffering Risk