Communications of the ACM - Artificial Intelligence, Sept. 25


Mario Antoine Aoun
How Generative Models Are Ruining Themselves
DOI:10.1145/3748642
https://bit.ly/4eNTFD0

I argue that as generative AI sees wider use, the quality of generated content will decline, because that content will increasingly be derived from artificial and generic data.

For instance, a newly generated picture will be based on original images authentically created by people (such as photographers) plus machine-generated images; the latter, however, fall short of the former in details such as contrast and edges. Likewise, AI-generated text will be based on original creative content by real people plus machine-generated text, which tends to be repetitive and formulaic. Since the volume of data generated globally almost doubles every three years,12 humanity will soon produce more data than it has ever created; if the Internet becomes flooded with AI-generated material, that material will degrade the AI's own output.
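The compounding behind that doubling claim is easy to under-appreciate. A back-of-envelope sketch (the three-year doubling period is the article's cited figure; the time horizons are purely illustrative):

```python
# Back-of-envelope: if the global datasphere doubles roughly every 3 years,
# total volume grows by a factor of 2 ** (t / 3) after t years.
def growth_factor(years: float, doubling_period: float = 3.0) -> float:
    return 2.0 ** (years / doubling_period)

print(growth_factor(3))   # 2.0   (one doubling)
print(growth_factor(12))  # 16.0  (four doublings in twelve years)
```

At that rate, most of the data a model ever sees was produced recently, which is exactly when AI-generated material dominates.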

AI generative models are trained on Internet data (from sources such as websites, curated content, forums, and social media). People's interactions with that data (reacting to it, reposting it, endorsing it) will amplify a profusion of unreliable content, because the origin of such content is unoriginal and AI-generated. Moreover, those interactions will be included in future training sets. Together, these facts will unfavorably influence the results of generative models.

Why and how could this happen? And what can we do about it?

Consider, for example, asking a generative model to create an image of the Last Supper. It will succeed, drawing on previously encountered paintings of the Last Supper by classical painters. Nonetheless, if we inspect the details of any such generated image, we can easily spot discrepancies, particularly in the rendering of hands, fingers, ears, teeth, pupils, and other small but prominent details in the foreground, and sometimes in the background. Such details are difficult to get right even for proficient artists.11 Now imagine AI systems confronted with more and more images (photos or paintings) containing unrealistic fine details, whether because such details are hard to create or because the images were filtered or generated by AI; these systems will then produce outputs with obviously unrealistic details. This is because generative models are built on artificial neural networks (ANNs), which are essentially function approximators.6 In other words, they always produce an output based on generalizations learned from historical inputs, and that history is continually contaminated with discrepancies. Put differently, generative models try to depict reality but embed glitches inherited from their own generated content. Because they cannot discriminate between sound and flawed content, I argue they will inadvertently ruin themselves in the long run.

As previously argued,2 generative models are statistical models lacking creative reasoning capabilities or emergent behaviors. Moreover, in experiments where the output of an AI system was fed back in as its input, the output degenerated into gibberish after many runs.10 Generative models are also known to produce emotionless,5 neutral,4 low-perplexity,3 and tedious content.9 And, per the adage 'garbage in, garbage out' (GIGO), the quality of any computing system's output depends on its inputs,1 so a system that evolves and learns from degraded data will produce degraded data. Consequently, the proliferation of trivial generative content by AI models will soon yield more boring, emotionless, and biased results, flawed with discrepancies and unrealistic details. As I have highlighted, ANNs are sensitive to their inputs and 'perfect' at generalization; through their own generative capabilities they will negatively mutate the outputs they offer, propagating impurities from generation to generation (that is, across version updates and retraining).
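The feed-output-back-as-input degeneration described above can be reproduced in miniature. The sketch below is illustrative only (the Gaussian "model" and the sample sizes are my assumptions, not taken from the cited experiments): it repeatedly fits a distribution to its own samples, the statistical analogue of training each model generation on the previous generation's output, and the fitted spread typically collapses over generations.

```python
import random
import statistics

def fit_and_resample(samples, n):
    """Fit a Gaussian (mean, stdev) to the samples, then draw n fresh
    samples from the fitted model -- one 'generation' of training on
    machine-generated data instead of the original human data."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)
n = 20                                             # small training set per generation
data = [random.gauss(0.0, 1.0) for _ in range(n)]  # "human" data: mean 0, stdev 1

spreads = [statistics.pstdev(data)]
for _ in range(500):                               # 500 model generations
    data = fit_and_resample(data, n)
    spreads.append(statistics.pstdev(data))

# The estimated spread drifts toward zero: later generations reproduce an
# ever narrower, more 'standard' slice of the original distribution.
print(spreads[0], spreads[-1])
```

Each refit loses a little tail information to sampling noise and never recovers it, which is one concrete mechanism behind the "repetitive and standard" output the article describes.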

One could argue that generative models deliver outstanding results in domains such as law exams, but that is a narrow application whose impact is far smaller than their role in generating, or assisting in the generation of, knowledge across the public and private domains. Narrowly scoped applications of generative models in specific domains may well be useful; here, however, I am addressing the global impact of such models and their self-deterioration over the long term. In this regard, the ultimate way to contain such data poisoning (for example, flooding the Internet with degenerate content) is awareness and responsible use of generative models. For instance, AI-generated content should not be rushed online; it should be carefully refined and, better still, checked or enhanced by experts.

Penrose,8 while criticizing AI based on classical computation, was nonetheless optimistic about future technological advances that would enhance AI's capabilities. Similarly, I am criticizing AI as built on currently available technologies (such as generative models); if a different technology takes the stage in the future, my critique may no longer apply.

I conclude with the following challenge for generative models, or any future technology: learning the image of the Mandelbrot set. An ANN that learns from all Mandelbrot set images available on the Internet will never grasp the complex dynamics behind the countless affinities and similarities within the set.7 It will produce very similar images of the set at a coarse scale but fall short on the details (for example, the periphery will appear blurred and pixelated when zoomed in, whereas the periphery of the true Mandelbrot set is always refined). So, will a machine one day be able to create, understand, and contemplate something like the Mandelbrot set, or the set itself, the way Benoit Mandelbrot did and intuited, or the way any of us feels its mathematical beauty?
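The "refined periphery" claim can be made concrete with the classic escape-time computation (a standard algorithm, not something from the article). Probing the set ever closer to the boundary point c = -0.75 reveals structure at every scale: the escape time grows without bound as the probe approaches, so no fixed-resolution image, learned or otherwise, can capture it all.

```python
def escape_time(c: complex, max_iter: int) -> int:
    """Standard escape-time test for the Mandelbrot set: iterate
    z -> z**2 + c from z = 0 and count steps until |z| exceeds 2
    (returning max_iter if the point never escapes)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if z.real * z.real + z.imag * z.imag > 4.0:
            return i
    return max_iter

# Zoom toward the boundary point c = -0.75: each tenfold step closer
# roughly multiplies the escape time tenfold, i.e. new detail keeps
# appearing at scales a finite pixel image has already smoothed away.
times = []
for k in range(1, 5):
    eps = 10.0 ** (-k)
    times.append(escape_time(complex(-0.75, eps), max_iter=10 ** (k + 2)))
print(times)
```

A model trained on screenshots of the set learns one resolution of this boundary; the defining iteration generates arbitrarily fine detail on demand, which is precisely the gap the challenge points at.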
