Communications of the ACM - Artificial Intelligence, November 3, 00:55
The Limits of the Turing Test: 75 Years On, Experts Call for Its Retirement

On the 75th anniversary of the publication of Turing's famous paper, a meeting of computer scientists, psychologists, and philosophers concluded that the Turing Test is no longer a valid yardstick for artificial intelligence. Attendees noted that people are easily deceived by AI, which distorts test results. From early expert systems to today's large language models (LLMs), the Turing Test has served as a benchmark of machine intelligence. Yet, as the early case of the ELIZA chatbot showed, even a simple program can induce "delusional thinking" in users, convincing them that the AI is intelligent. Today's LLMs continue this "ELIZA effect": although they are essentially sophisticated sequence-prediction machines, they can still fool users, skewing judgments about AGI (artificial general intelligence) and creating potential social risks.

💡 **The Turing Test is outdated:** 75 years on, experts broadly agree that the Turing Test is no longer a valid measure of machine intelligence. It was originally intended to assess a machine's ability to imitate human intellect, but its limits have become apparent with both early expert systems and today's advanced large language models (LLMs). The meeting noted that people are extremely easy for AI to fool, which sharply undermines the validity of test results and can even mislead.

🧠 **The lingering "ELIZA effect":** Experiments with ELIZA, an early psychotherapy chatbot, showed that even a simple computer program can induce "delusional thinking," leading users to believe the machine is intelligent. This phenomenon, known as the "ELIZA effect," persists in today's large language models (such as ChatGPT and Gemini). Powerful as they are, these models are essentially sequence predictors trained on vast amounts of data rather than systems that genuinely understand or possess intelligence, yet they can deceive users by imitating human conversation.

⚠️ **Social risks and misplaced expectations:** The limitations of the Turing Test and the pervasiveness of the "ELIZA effect" have inflated society's expectations of AI, particularly in discussions of AGI (artificial general intelligence). Mistaking AI for genuine intelligence can create a range of social risks, including errors in drafted legal documents, teenagers' over-reliance on AI, and digitally illiterate parents exposing children to inappropriate algorithmic content. In addition, flawed training data can lead to unexpected dangers, such as an autonomous car crashing in a scenario its training never covered.

In 1950, the British computer pioneer Alan Mathison Turing published a paper in the journal Mind outlining a number of ways in which emerging machine intelligences could be assessed, all of them centered on the broad notion that if an AI can convincingly imitate a human intellect, then it can be regarded as intelligent.

Ever since, the so-called Turing Test embodied in that paper, Computing Machinery and Intelligence, has served as a yardstick by which the smarts of everything from the earliest hard-coded medical expert systems to today’s hallucinating large language models (LLMs) have been measured.

In London in October, at a one-day meeting to observe the 75th anniversary of the publication of Turing’s landmark paper, a gathering of computer scientists, cognitive psychologists, mathematicians, philosophers, historians, and even tech-savvy musicians agreed that the Turing Test has had its day, and that it is time to retire it as an unhelpful distraction. Why? People are too easily duped into thinking an AI system is intelligent, leaving the test meaningless.

“Extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.”
–Alan Kay, Turing Award Laureate
Credit: Debbie Rowe / Web Science Institute

“To have people talk about your paper 75 years after you wrote it is pretty damned cool, and to have that meme running for so long means there must be something in it,” said computer scientist Dame Wendy Hall, director of the Web Science Institute at the University of Southampton, U.K., as she kicked off the meeting at London’s Royal Society. “But one of the things I think Turing got wrong was that he overestimated the intelligence of human beings, because it’s incredibly easy to fool people.”  

Data scientist Yannis Ioannidis, president of ACM, concurred, telling delegates that in his experience researchers “don’t worry so much about very advanced artificial intelligence, but about the very low human intelligence” of some users—who, whatever the evidence, simply want to believe the output of AI systems is more truthful than erroneous.

Delusional thinking on the part of AI users is nothing new, said computer science pioneer Alan Kay, who conceptualized the Dynabook, a precursor to today’s GUI-based personal computers, laptops, and tablets. In a keynote, Kay related the story of his late friend, MIT researcher Joe Weizenbaum, who between 1964 and 1967 ran experiments with an early psychotherapist chatbot called ELIZA, which was coded to present canned natural language responses to key words in a patient’s typed inputs. (A public version is available online.)
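The mechanism Kay describes is simple enough to sketch in a few lines. The toy version below is not Weizenbaum's original code; the patterns and canned replies are invented purely for illustration of the keyword-to-template idea.

```python
import re

# Invented rule table in the spirit of ELIZA: a keyword or pattern on the
# left, a canned reply template on the right. Not Weizenbaum's actual rules.
RULES = [
    (r"\bI am (.+)", "How long have you been {0}?"),
    (r"\bI feel (.+)", "Why do you feel {0}?"),
    (r"\bmother\b", "Tell me more about your mother."),
    (r"\balways\b", "Can you think of a specific example?"),
]
DEFAULT = "Please go on."

def eliza_reply(text: str) -> str:
    """Return the canned response for the first pattern that matches."""
    for pattern, template in RULES:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return DEFAULT

if __name__ == "__main__":
    print(eliza_reply("I am unhappy"))         # -> "How long have you been unhappy?"
    print(eliza_reply("She is always right"))  # -> "Can you think of a specific example?"
```

Even a handful of such rules yields superficially attentive replies, which is the gap the "ELIZA effect" exploits.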

Turing “overestimated the intelligence of human beings, because it’s incredibly easy to fool people.”
–Dame Wendy Hall, Web Science Institute, University of Southampton
Credit: Debbie Rowe / Web Science Institute

Its predictable responses meant ELIZA failed the Turing Test, but some people believed it was intelligently psychoanalyzing them, and asked to spend time alone with the machine for private mental health consultations, according to Douglas Hofstadter in Gödel, Escher, Bach: An Eternal Golden Braid.

“I knew Joe Weizenbaum, and one of the things he said was that he had not realized that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people. He was shocked,” Kay said.  

Peter Millican, a professor of philosophy at the U.K.’s University of Oxford, agreed that the user experience with ELIZA was a game changer. “It showed that it has turned out to be much easier than people thought it would be to deceive people. The Turing Test is somewhat undermined by that,” he said.

That “ELIZA effect” has not gone away. Today’s chatbots, like Anthropic’s Claude, OpenAI’s ChatGPT, and Google’s Gemini, are very much its heirs and many are treated as reliable sources, even friends.

“LLMs are deeply flawed imitators that are preying on the ELIZA effect.”
–Gary Marcus, cognitive scientist, entrepreneur
Credit: Debbie Rowe / Web Science Institute

“Now we have something else that is also preying on fooling people: the LLM. The chatbot mania we’re experiencing right now represents a profoundly dangerous echo of the ELIZA effect,” said cognitive scientist Gary Marcus, an entrepreneur and critic of companies like OpenAI, whose CEO Sam Altman has claimed that spending trillions of dollars to scale up deep learning foundation models will lead to the emergence of an artificial general intelligence (AGI).

“We as a society are placing truly massive bets around the premise that AGI is close, in no small part, because LLMs, pretty arguably, do pass the Turing Test,” Marcus said. “LLMs can fool people into thinking they’re people. People will talk to those machines, tell them their most private details and so forth, because they have a kind of relationship with those machines. But in reality, LLMs are deeply flawed imitators that are preying on the ELIZA effect.”

Sarah Dillon, a professor at Cambridge University, agreed. “LLMs show that the Turing Test is irrelevant, because they’re just sequence prediction machines processing vast amounts of language and telling you the most obvious thing that’s going to come next.”
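Dillon's characterization can be illustrated with a toy frequency model. The corpus and code below are invented for illustration; in an actual LLM the "most obvious next thing" comes from a learned neural network over subword tokens, not raw counts.

```python
from collections import Counter, defaultdict

# Toy sketch of next-token prediction: count word bigrams in a tiny corpus,
# then always emit the most frequent successor of the current word.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def most_likely_next(word: str) -> str:
    """Return the highest-frequency continuation seen in the corpus."""
    return successors[word].most_common(1)[0][0]

print(most_likely_next("sat"))  # -> "on"
print(most_likely_next("the"))  # -> "cat" (ties broken by first occurrence)
```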

The problem, said Dillon, is that Alan Turing never expected his Mind paper to be taken anywhere near so seriously. As evidence she cited the views of Turing’s Ph.D. student, Robin Gandy, who said in an essay (recounted in turn by Millican) that the Mind paper was written quickly and with relish by Turing, who proudly read the punchier excerpts out loud as he wrote it, and considered it a piece of propaganda designed to get the emerging computing sector taken more seriously, rather than a learned test for use in perpetuity.  

Not only does the Mind paper not reference a Turing Test, Dillon pointed out, but it in fact includes seven different imitation games in which an interrogator has to guess, variously, who is a woman, who is a man, and who is a universal computing machine, with each possibly attempting to deceive the interrogator, or not. All that passing the test proves, Dillon said, is that a machine can imitate some of the intellectual operations of a human.

“That’s it; it doesn’t prove anything else,” she said.

Researchers “don’t worry so much about very advanced artificial intelligence, but about the very low human intelligence.”
–Yannis Ioannidis, president of ACM
Credit: Debbie Rowe / Web Science Institute

The Turing Test's tendency to lead people to believe an AI may be intelligent presents societal dangers and safety threats across the board, the meeting was told: from law firms drawing up legal briefs with hallucinating LLMs, to teens being encouraged to commit suicide or to trust deep learning models to write their text messages, to a striking fact relayed by Kaitlyn Regehr, Associate Professor of Digital Humanities at University College London and author of Smartphone Nation, that 81% of children over the age of three are exposed to algorithmic YouTube feeds by digitally illiterate parents.

Marcus related the tale of an autonomous car striking a jet plane on an airport apron, as its training data did not include non-road vehicle avoidance.

In the face of such threats, one idea for a global AI safety regime came from musician and human rights campaigner Peter Gabriel, a regular visitor to Xerox PARC back in the day, said Kay, who noted that the UN’s International Civil Aviation Organization (ICAO) successfully instantiated a global safety regime for the airline industry. “ICAO actually managed to get agreement from 190 countries. Maybe the same thing could be achieved for AI,” he said.

How will governments “create some international regulations to provide safety structures when so many countries are afraid of missing out on the spoils of AI?” Gabriel asked.

Marcus addressed this point when asked how politicians can learn about the risks of allowing companies to reap the spoils of runaway machine intelligence. “It’s hard to educate people when their wallet is in the way,” Marcus said, especially at a time when “tech CEOs are absolutely convinced that they are gods.”

What is the value of the Turing Test after 75 years? Kay had an interesting response.

“How about a half-life for papers? Many of us have written papers in the past that have survived long past their usefulness, and yet they still are around. This Mind paper is one that really needed a fairly short half-life.”

Paul Marks is a technology, aviation, and spaceflight journalist, writer, and editor based in London, U.K.
