Communications of the ACM - Artificial Intelligence, August 18
Will AI Destroy the World Wide Web?

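The World Wide Web rose in the mid-1990s and changed how information spreads. Now generative AI (GenAI), exemplified by ChatGPT, has set off another revolution by generating high-quality natural-language text. The article examines the potential threat GenAI poses to the Web search ecosystem, whose main revenue model is advertising. When users can get information directly from a chatbot without visiting Web pages, ad-funded search engines face a serious challenge. A deeper concern is that if users stop visiting Web pages, Web-page developers will have less incentive to publish, yet public Web pages are a major source of training data for GenAI models. If LLM-generated text is used as training data, models may "collapse," sharply reducing their usefulness. The article also invokes "enshittification," in which platforms profit by degrading the value they give users, and asks whether the Web is undergoing this process.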

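💡 The rise of generative AI (GenAI) threatens the advertising-based search model of the World Wide Web. Until now, search has sent users to Web pages for information, and advertisers have paid for Google Ads linked from those pages. GenAI can now answer directly, bypassing the page visit, which may weaken advertisers' willingness to pay and undermine the Web's business model.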

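🌐 The spread of GenAI may reduce Web-page developers' incentive to publish. Because GenAI generates text so easily, much public Web content may come to be AI-generated, yet public Web pages are key training data for large language models (LLMs). Training LLMs on AI-generated content can cause "model collapse," in which models develop irreversible defects and no longer work well.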

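📉 The ease of using GenAI may lower overall content quality, a trend the article relates to "enshittification." When people prefer GenAI to writing themselves, the Web may fill with AI-generated text that can contain mistakes. Reliable, high-quality information becomes harder to find, and users may be less inclined to click through to the original pages.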

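🔄 GenAI's disruptive effect on the Web is not yet clear; it depends on how content is generated. If AI-generated content is added to human-written content, the Web ecosystem may stay healthy. If it replaces human writing entirely, it could accelerate the erosion of the Web's value and raise the risk of model collapse, ultimately hampering GenAI itself.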

The World Wide Web (Web) emerged as a new medium in the mid-1990s. It was invented by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in 1989, but its exploding popularity was also enabled by the release of the Mosaic Web browser in 1993 and the Internet becoming commercially available in 1995. A communication revolution was launched.

Roughly 30 years later, the release of ChatGPT by OpenAI in Nov. 2022 launched another revolution. High-quality generation of natural-language text, defined as the hallmark of intelligence by Alan Turing in 1950, is suddenly widely available. I wonder, however, if the generative AI (GenAI) revolution will end up devouring the Web revolution.

As I pointed out in 2018, the anti-establishment zeitgeist of the 1960s led to the dogma of “Information wants to be free.” Thus, Google, which defines search on the Web, uses advertising as its main revenue engine. Google sends users to Web pages, without charge, that link to Google Ads paid for by advertisers. But users typically want information, not Web pages. Web pages are just a means for getting information.

Now, however, I can ask a chatbot the same questions I used to ask a search engine (for instance, “How can I balance the wheels of my car?”) and get a detailed answer without going to a Web page on wheel balancing. In fact, many Google searches now display an “AI Overview” at the top of the results page, obviating the need to visit a Web page.

But if users lose the motivation to visit Web pages, then advertisers lose the motivation to pay for Google Ads. If we can find information by asking GenAI, who needs the Web? While the ecosystem in which the Web thrived had one colossal flaw, namely, Surveillance Capitalism, it had a stable business model. That business model is now being threatened by GenAI.

But the threat of GenAI goes deeper than the threat to advertising-supported Web search. If users lose the motivation to visit Web pages, then Web-page developers lose the motivation to post Web pages. Yet, public Web pages are one major source of data to train the large language models (LLMs) underlying GenAI. Without public Web pages, it would be much more difficult to train LLMs.

And the risk does not go away even if Web-page developers continue to post public pages. As the biblical phrase “In the sweat of thy face shalt thou eat bread” suggests, humans do not like to work hard. Writing is also hard work, not physically, but mentally. But now we have GenAI! People are increasingly using GenAI to create text: It is so much easier than writing. For example, I asked ChatGPT “Would AI destroy the World-Wide Web?” and it replied “Not likely—but it could erode its value unless regulated and used responsibly” and offered a detailed analysis.

I am less optimistic than ChatGPT. It is so easy to generate text using GenAI that people will invariably generate text for public Web pages using GenAI. But, as pointed out earlier, public Web pages are the raw data for training LLMs. What happens when LLMs are trained on LLM-generated texts? This topic was addressed in a July 2024 Nature paper titled “AI Models Collapse when Trained on Recursively Generated Data.” “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear,” reported the authors. In other words, an LLM-generated Web is useless in providing data to train LLMs. A counterargument is that model collapse is not inevitable. It can be avoided if LLM-generated content is added to human-generated content rather than replacing it. Are we in an incremental content regime or a replacement content regime? Only time will tell.
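
The difference between the two regimes can be illustrated with a toy simulation. The sketch below is not the method of the Nature paper and involves no real LLM: it assumes a hypothetical "corpus" of tokens drawn from a Zipf-like distribution, treats "training" as counting token frequencies, and treats "generation" as sampling from those frequencies. The function name `simulate`, the vocabulary size, and the sample size are all illustrative assumptions. It tracks how many distinct token types survive each generation, a crude stand-in for the tails of the content distribution.

```python
import random
from collections import Counter


def simulate(generations=10, sample_size=200, accumulate=False, seed=0):
    """Return the number of distinct token types in the corpus per generation.

    accumulate=False models a replacement regime: each generation trains
    only on the previous generation's output.
    accumulate=True models an incremental regime: generated text is added
    to the corpus, and the original human text is never discarded.
    """
    rng = random.Random(seed)

    # Hypothetical "human" corpus: 50 token types with Zipf-like weights,
    # so a few types are common and many are rare (the "tail").
    vocab = list(range(50))
    weights = [1.0 / (rank + 1) for rank in vocab]
    corpus = rng.choices(vocab, weights=weights, k=sample_size)

    support_sizes = [len(set(corpus))]
    for _ in range(generations):
        # "Train": estimate token frequencies from the current corpus.
        counts = Counter(corpus)
        tokens = list(counts)
        freqs = [counts[t] for t in tokens]

        # "Generate": sample new text from the estimated frequencies.
        generated = rng.choices(tokens, weights=freqs, k=sample_size)

        # Either add the generated text to the corpus or replace the corpus.
        corpus = corpus + generated if accumulate else generated
        support_sizes.append(len(set(corpus)))
    return support_sizes


if __name__ == "__main__":
    print("replacement:", simulate(accumulate=False))
    print("incremental:", simulate(accumulate=True))
```

In the replacement regime, a token type that fails to appear in one generation can never reappear, so the count of surviving types can only stay flat or shrink. In the incremental regime, the original human text stays in the corpus, so no type is lost. Real model training is far more complex, and this toy only shows why the regime matters.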

Cory Doctorow coined the term “enshittification” to describe the strategy by which online platforms hook users with tempting products, only to degrade those products later by shifting value away from users. Is the Web undergoing enshittification? The Web became immensely useful because many people generated content, and Google found the best-quality content in response to our questions. But the current quality of GenAI-generated answers is such that people argue that “The Entire Internet Is Reverting to Beta.” In fact, at the bottom of each AI-generated answer Google adds the disclaimer, in a small font, “AI responses may include mistakes.” I have to teach my students to always click the link and go to the source, but I doubt many people do that.

Will AI destroy the Web? Even ChatGPT agrees it could erode its value unless regulated and used responsibly.
