Zeroth Principles of AI 09月25日
进化计算被低估
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

进化计算是一种重要的AI领域,常被低估。它作为智能的原始先驱,对理解LLM至关重要。文章定义了SHAG(超级人类答案生成器),涵盖所有LLM及GA、GP、SA等。GA通过竞争、交叉和变异优化解决方案,特别适用于参数复杂且优化目标明确的问题。尽管进化自然耗时,但计算机可快速模拟,经济高效地解决人类难题。

🔹进化计算(EC)是人工智能的早期形式,对理解LLM等现代AI至关重要,但常被忽视。它通过模拟自然进化过程解决复杂问题,是智能发展的基础。

🌱SHAG(超级人类答案生成器)是能生成人类无法计算答案的计算机系统,包括LLM、遗传算法(GA)、遗传编程(GP)和模拟退火(SA)。进化计算是SHAG的核心思想来源。

🧬遗传算法(GA)通过随机初始化种群、定义适应度函数、交叉和变异等操作,迭代优化解决方案。它特别擅长处理参数众多、关系复杂的优化问题,且计算速度快,经济高效。

⚡进化计算的速度问题:尽管自然进化耗时漫长,但计算机可并行模拟进化过程,达到每秒数百万甚至数十亿次的迭代速度。通过多线程和云计算,GA能快速找到近似最优解。

💡SHAG的应用价值:能解决人类无法处理的难题,如缺乏完整输入数据的问题、NP-Hard和NP-Complete问题(如背包问题)。对于重复性问题,GA可能比LLM成本百万倍更低,具有显著经济优势。

Evolutionary Computation is under-appreciated. Objections include "Evolution takes millennia" and "I don't get the point".

They are (or should be) very important to people in the AI community because they are a primitive precursor to intelligence. Understanding Evolutionary Computation (EC) will make understanding LLMs easier.

In order to not have to define "AI" again, I will now define a new term SHAG which stands for "SuperHuman Answer Generator".

A SHAG is a computer based system which can provide answers to questions that the client or programmer that is using the system cannot (or cannot be bothered to) compute an answer to themselves. Or shorter: A SHAG generates answers that humans cannot generate.

Unsurprisingly, all LLMs are SHAGs. They can generate answers that their programmers could not. Such as understand and reply in Finnish.

To many, it is more surprising that all SHAGs (and hence all LLMs) are Holistic. Since SHAGs are Holistic we can identify several problems shared by all Holistic systems (discussed at length in my Red Pill):

- The answers provided may not be correct
- The answers provided are not known to be optimal, complete, repeatable, parsimonious, explainable, or transparent.

As evidence, we recognize these problems as endemic to current LLMs.

But are there SHAGs that are not LLMs? Besides current LLMs (including my own Deep-Discrete-Neuron-Network LLMs), Genetic Algorithms (GA), Genetic Programming (GP), and simulated annealing (SA) are SHAGs. I will mainly discuss GAs in what follows.

My favorite way of using GAs:

- Define an individual that defines a solution but has to compete for survival, initialized randomly for diversity.
- Create a population of those and store all of them in an Array. Say, 1000 individuals in the population.
- Define a goal function that returns a number indicating how good (fit) the individual is.
- Define an crossover function that breeds two successful individuals together, hoping to create an even better offspring
- Define a mutation function that reintroduces more diversity into some individuals.

Loop until system stops improving, which is detectable by noticing that the elite is not changing. This might take as little as 10 cycles for well behaved problems:

- Compute fit for each individual using the goal function
- Sort the array by fitness of the individuals
- Start with the worst individual and move towards better ones:
- Replace the individual with crossover of two superior individuals
- Stop replacements a bit below the top (to preserve the "elite")

Design Overview

GAs all use individuals containing a genome of some sort. Part of the design challenge for the crossover and the goal function is that the practitioner needs to understand the problem well enough to determine not only which individual (viewed as a solution) is better, but also to be able to determine the parameters that comprise all solutions.

There really is not a Phenotype in simple GAs. We evaluate the genotype directly using the goal function. This is a pretty radical shortcut but one that is actually starting to get used in wetlab genomics: Labs make grains with better yields without the bother to grow the seeds for a year, because they know what a higher yield DNA genome looks like.

Suppose you wanted to use GA to optimize shipping cost to design a square cornered box big enough for 200lbs of grain (with a known average density) as cheaply as possible. The genome would contain the X, Y, and Z sizes of the box and those would initially be totally randomized in each individual. The goal function returns zero fitness for every box with insufficient volume for the required amount of grain and otherwise returns the length and circumference so that we can compute the shipping cost the traditional way.

The crossover function might take two parents and use X from one and Y and Z from the other, and sometimes X and Y from one and Z from the other. All parents have better fit than the replaced individual had; we are hoping recombination produces an even better offspring by b enefiting from partial solutions in the parents.

Very rapidly we will note that the best boxes in each generation become smaller and smaller and cheaper to ship. When the Elite is stable, they all have the same X, Y, and Z which is the optimal solution.

Now consider a larger problem with 500 numerical parameters and a goal function that uses every single one of them. This may be expensive to evolve, but if it is the only way forward, we'll take it. Well-behaved problems will converge rapidly.

A typical individual would keep those 500 values in an array (much like genes in a chromosome), and crossover would brutally create the new individual using “DNA” segments from one of the parent alternating with segments from the other parent at one or more randomly chosen cut points.

Some Design Details

    Mutation is MUCH LESS important than crossover to the point of being optional. Beginners get this backwards and even textbooks fail to emphasize this enough. If you are not using crossover, then you are just using random search and are discarding the entire point and power of GA.
    In a population of 1000 I might use an elite of 10 and will therefore replace 990 worst individuals with potentially superior offspring each turn through loop. I generally apply mutation to at most a couple percent of all individuals.

    There has to be some chance for the offspring to inherit some feature(s) of what made the parent(s) successful. I've seen beginners make crossover functions that mistakenly discard all history from the parents and hence the system degrades to random search. This matters for things like deciding on cut points (when using array representations), where some properties will tend to travel together for synergy reasons.

    We can turn this mistake into a measurement. In order to test that your GA works, compare convergence of your full GA with convergence of a version where you replace the crossover function with just creation of a new random (starting point) individual without history from the parents. Now you have a random search system. If it isn't significantly slower than your fully functioning GA, then you need to go back to the drawing board. It will also show you how much an EV can speed things up over linear or random search (those two are equivalent).

GAs Are Not Slow

We can now deal with the objection that "Evolution takes millennia". It does, in nature, especially if you only get one chance to create offspring per year. Computers do it faster.

A modern CPU runs at 3GHz -- 3E9 cps, which is 3,000,000,000 clock cycles per second.

Suppose we have a problem where computing the goal function value takes 1000 clock cycles per individual. This is often generous. A GA with a population of 1000 can therefore run 3,000 generations per second... per thread. If we are in a hurry, we can use multithreading and there are special versions of GA frameworks that can run on a cloud.

So speed is not the issue.

What’s the point?

As to the other objection: The point of SHAGs is that they can provide answers problems humans can't solve, including problems without reliable and complete input data, and NP-Hard and NP-Complete problems such as knapsack problems. These are discussed in The Red Pill.

GAs shine in situations where many parameters influence the outcome in complicated (even complex) ways, and where nobody knows how to find an optimal answer, but where we can rather cheaply determine how good the answer represented by any individual’s genome is.

If you are in this situation, you might do well to explore Holistic Methods, and you need to know that a GA may sometimes be an option. A GA may be a million times cheaper than an LLM for many matched tasks, which becomes economically important for frequently occurring problems.

So SHAGs in computers are LLMs and other (future?) AIs, GA, GP, and SA. But the biggest SHAG of all is Darwinian Evolution of Species in Nature. Humans certainly could not make a platypus from scratch, but Evolution did. More about this in the next article.

Bottom line

In order to understand how LLMs are solving problems we humans cannot solve ourselves (e.g. Protein Folding) we should first study the simpler case of Genetic Algorithms.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

进化计算 SHAG 遗传算法 LLM 人工智能
相关文章