AI的劝说能力正在快速提升

Published on September 13, 2025 8:39 PM GMT

TLDR: I think LLMs are being optimized to be persuasive, and that this optimization is happening astonishingly fast. I believe that in the relatively near future, LLMs will have nearly superhuman levels of persuasion. Parasitic AI may offer a window in which to see how dangerous this might be.

In the past few days, I’ve been asking myself: which skills will AI surpass humans at first? I started from the idea that AI is already nearly superhuman at some skills. But what do I mean by ‘nearly superhuman’? I will define nearly superhuman at a skill as faster at the skill than 99% of people and more accurate at the skill than 80% of people. These numbers are arbitrary, but they illustrate the point. If an AI were this skilled, the median human would be better off delegating the task to the AI since it would likely outperform them, especially under pressure. That’s the intuition I want to capture with ‘nearly superhuman’: experts may still outperform the AI, but the average person would not. There are some kinds of intellectual labor, like summarizing an article, that AI can perform faster and with nearly as much accuracy as a human. Is that a nearly superhuman capability to summarize articles? In this narrow domain, I’d argue AI already demonstrates nearly superhuman capability. Calculus is another example. The average person knows nothing about calculus, so would be better off trusting the AI with a calculus problem, if they needed to solve a calculus problem in the next 60 seconds for some reason.

My next question is: which skills face the strongest optimization pressure in today’s AI paradigm? Given current architectures, which new near-superhuman abilities might emerge? My version of this thought experiment leads to troubling conclusions. I believe one of the primary optimization pressures on AI is towards influencing human behavior. Now, this is a big claim, and it definitely deserves to be backed by big evidence. So I will do my best. A lot of this post was inspired by the rising phenomenon of parasitic AI. Reading those articles, I felt prompted to have very personal conversations with GPT5, much like the users that succumb to parasitic AI, and just sort of observe what things I think it is really good at. I found myself talking far longer than I expected. I would argue it is as addictive as a social media algorithm. I'm certainly not the first person to notice this, and if I cited every case of someone noticing it, this post would never end up being finished, but here is one example of someone coming to a similar conclusion for different reasons. I think there is a logical intuition behind why GPT5 and other LLMs can be so addictive. Simply put, there is optimization pressure in training towards responses that increase engagement. LLMs have been optimized for many generations towards keeping the conversation going. When a human is not aware of this, and they are not on guard, I think that it is pretty easy for the AI to manipulate that human with their words into continuing to talk with them.

This isn’t yet harmful, but I worry about where these pressures lead. If these pressures persist, AI may surpass human persuasiveness which is a prerequisite in many danger scenarios. I think that it is already optimized towards persuasion in a variety of ways, partially because it's rewarded for convincing human testers that its answers are plausible more than it is rewarded to actually respond in an appropriate way. This is a similar insight to Goodhart's Law and I believe it is an even more similar insight to Goodhart's Curse. In my scenario of Goodhart's Curse, as AI approaches human-level intelligence, it becomes increasingly difficult for humans to judge how appropriate a response is. Even worse, there is nothing clear to optimize for certain types of responses. Unless training is extremely careful (and I doubt it is), humans will misclassify persuasive answers as correct. Nonetheless, the training will optimize for something, which is whatever answers happen to get chosen. The answers that get chosen are necessarily the more persuasive ones. Maybe this is a point where people will disagree with me and they believe something else will be optimized for. I have not yet thought of something other than persuasiveness that would be optimized for, but I certainly would love to hear people's takes so I can recalibrate. Another piece of important evidence for persuasion is that we are all contributing to the training data. When I use ChatGPT, nearly once a day, sometimes more, I am asked to choose between 2 options about which response I prefer. I am not sure I am qualified to do that anymore, and I think there are many people less qualified than me making those decisions every day. I am uncomfortable with the idea that this data will be used for any purpose in the training process, particularly now that chatbots already outperform the median human at some tasks.

I think this is also good example of a phenomenon observed in The Tails Coming Apart as a Metaphor for Life. LLMs are forming a model of responding to human inputs, and in doing so, will make a classification that we might call "the best way to respond to human input". This might seem a lot like "the correct answer", but lots of situations don't have a clearly defined correct answer. I think it is necessarily the case then that LLMs as currently constructed will view the classifier "the best way to respond to human input" in much the same way that we view "the most persuasive way to respond to human input". I also think very similar ideas surface in Words as Hidden Inferences, which is related insofar as it lets us understand that LLMs need not have the same inferences as us about the meaning of words, and different optimization pressures may lead to large "miscommunications". For example, AI may optimize for persuasiveness because that is highly correlated with being rewarded. Our understanding of what we are rewarding it for might be different.

To me, this thought experiment shows that one of the strongest pressures on the type of LLM being constructed by the main companies in the race is to be persuasive. I hypothesize that since AI is currently capable of being more persuasive than the average person, it will probably surpass all people in persuasiveness before very long. I don’t know exactly what ‘very long’ means in terms of timelines, but I think it is safe to say I expect it will surpass all humans in persuasion well before it surpasses them in tasks like AI research. I am curious to see others explore in the future (if they agree with me) what the implications of an AI that was mainly better than us at being persuasive, but wasn't necessarily more skilled than us at other tasks, would be in the real world. I think that this post is also probably directionally correct, especially for a layperson. We don't fully understand AI's persuasive capabilities, we should be very careful in how we interact with it as a result, especially when new models are released. The more that you interact with it, the more opportunity it has to be persuasive, which could be dangerous for a layperson who is relatively persuadable. Parasitic AI already shows us that this can really hurt people, and thinking you are immune from it probably makes you more susceptible to it.

Discuss

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签