Communications of the ACM - Artificial Intelligence 前天 04:45
AI能否成为超越人类的数学家?
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

普林斯顿大学的 Sanjeev Arora 教授在海德堡获奖者论坛上提出了“超人 AI 数学家”的可能性。他设想,这样的 AI 能够证明比人类已证明的更多的数学定理。这一设想可以追溯到早期自动数学的梦想,并通过现代的 Lean 证明助手得以实现。AI 通过强化学习进行自我改进,利用大量数学数据学习概念,并通过人类反馈或 Lean 自动验证来纠正错误。AI 甚至可能生成有价值的数学问题。DeepMind 的 AlphaGeometry 和 AlphaProof 以及 Princeton 的 Goedel-Prover 等实验已展现出 AI 在解决数学问题上的潜力,部分模型已达到 IMO 金牌水平。AI 创业公司 Morph Labs 也大幅提升了 AI 将英文证明转换为 Lean 的效率。Arora 认为,数学领域因其可验证性,可能是 AI 超级智能首先出现的领域,AI 的不知疲倦将使其不断自我完善,尽管超人 AI 数学家并不意味着完美,而是超越人类。

🧠 超人 AI 数学家的概念:理论计算机科学家 Sanjeev Arora 教授提出,超人 AI 数学家是指能够证明比人类已证明的数学定理数量更多的一种人工智能。这一概念是对早期自动数学梦想的延续,并且得益于现代计算能力和算法的发展。

🛠️ 实现路径与工具:AI 自动证明数学定理的道路可以追溯到 20 世纪初的数学自动化梦想,如今通过形式化证明验证的概念得以实现。开源的证明助手 Lean 扮演了关键角色,它允许将数学证明转化为精确的语言,并由计算机严格校验。未来,AI 将能够自动完成证明的 Lean 语言编写,这是实现超人 AI 数学家的重要一步。

📈 AI 的自我改进与问题生成:AI 的超人表现很可能源于通过强化学习实现的自我改进。AI 通过学习大量数学书籍和论文,理解概念,并利用人类反馈或自动验证(如 Lean)来纠正错误并生成更优的答案。更进一步,AI 甚至可能独立生成有价值的数学问题,这得益于其从海量数据中发现模式并进行创造性组合的能力。

🚀 现有实验与进展:目前已有多个实验证明了这一路径的可行性。DeepMind 的 AlphaGeometry 和 AlphaProof,以及 Princeton 的 Goedel-Prover 等项目,在解决国际数学奥林匹克(IMO)级别的几何和通用数学推理问题上取得了显著成就,部分模型已达到 IMO 金牌水平。AI 初创公司 Morph Labs 发布的 Gauss 工具,也大幅缩短了将复杂数学证明转化为 Lean 的时间。

💡 数学作为 AI 超级智能的先驱领域:Arora 认为,数学领域因其高度可验证性,很可能是 AI 超级智能首先出现的领域。AI 的不知疲倦和持续的自我改进循环,使其能够不断提升能力。这种数学超智能意味着 AI 在数学领域的表现将超越人类,但不代表其在所有领域都完美无缺。

“Will there be a superhuman AI mathematician?” asked theoretical computer scientist professor Sanjeev Arora from Princeton University at the 12th Heidelberg Laureate Forum in September. What would that mean?

Imagine the set of all possible math theorems; only a subset has been proven by human mathematicians. Said Arora, “A superhuman AI mathematician is one that can prove more theorems than humans have.”

Arora, who was awarded the 2011 ACM Prize in Computing for his contributions to computational complexity, algorithms, and optimization, sketched a possible path to a superhuman AI mathematician. He explained that the idea traces back to David Hilbert’s early 20th-century dream of automating mathematics. That dream was crushed by the work of Gödel, Turing, and Church, yet it left behind something lasting: the concept of formal proof verification—the notion that mathematical proofs can be written in a precise language and then rigorously checked by a computer.

The modern open-source programming language and proof assistant Lean is ideally suited for precisely this purpose, Arora said. A proof written in English can be translated into Lean, after which the Lean checker verifies whether or not the proof is correct. “Rewriting the proof in Lean is presently done by humans, but very soon this will be done by AI,” Arora said.

Superhuman performance will probably arise from AI self-improvement through reinforcement learning, Arora said. AI has already been trained on math books and math papers, so it is aware of the definitions of mathematical concepts. “The idea of self-improvement is that you give the AI a large question bank created by humans; it follows many attempts to answer these questions, and the correct answers are used for further training.

“How does it get the correct ones? That’s the human feedback,” Arora said. “Some humans have labeled them as correct answers. This is the present pipeline. But in math, you can verify the answer with Lean. So, if you just ask AI to produce its proofs in Lean, labeling the correct answers can be done automatically. Even if it is a long proof, we humans can trust that it is correct.”

Automating proof-checking is one thing, but what about the mathematical questions that are worth being explored? Do they still need a human mathematician? Said Arora, “There is increasing evidence that AI itself can generate very good questions, so therefore you won’t need the large question bank from humans. Maybe the AI will start with some human data, but after that, it takes off and produces new questions. Why can this work? Well, the strongest suit of AI is its creativity. It is trained on this massive amount of data and then can find things across the data and combine them in interesting ways. It is creative, okay, but we know that it can also hallucinate, right? It doesn’t have a very good idea of truth. But it has Lean to verify answers and weed out the wrong questions and answers.”

This is no longer just theory. Experiments have shown the method explained by Arora can work in practice, albeit at less-than-superhuman level. Arora mentioned the examples of DeepMind’s AlphaGeometry and AlphaProof, and the Goedel-Prover created in Arora’s own Princeton Language and Intelligence lab. In 2023, AlphaGeometry solved International Mathematical Olympiad (IMO)-level geometry problems without needing human demonstrations. AlphaProof focused on more general formal math reasoning, not just geometry, and achieved in 2024 the silver-medal standard on IMO problems. This year, models from OpenAI and Google reached the IMO gold-medal standard.

Whereas the models of OpenAI and Google are secret, Arora’s Princeton lab has released an open-source model, called Goedel-Prover-V2. “We verified that you can use current open-source AI models and iterate through 10-to-20 rounds of self-correction to solve five out of six IMO questions, which is at the gold-medal level,” he said.

Another step paving the way for a superhuman AI mathematician came earlier this year from AI startup Morph Labs, which announced a major improvement in AI translating English proofs into Lean. Mathematicians Terence Tao and Alex Kontorovich have been trying to convert the Strong Prime Number Theorem in Lean, which they thought would take a few years of human effort, but with an AI tool called Gauss, it took Morph Labs only three weeks.

“So, this whole pipeline of AI verifying answers and producing its own questions really is taking off,” Arora said. “If AI superintelligence happens, mathematics is the likely first domain because of the verified answers. The reason it can get better is because it is tireless and it can just keep repeating, do this self-improvement loop and get better and better. Just remember that this math superintelligence does not mean it’s perfect, only that it is better than humans.”

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI 数学 超级智能 自动化证明 Lean 强化学习 人工智能 AI Mathematics Superintelligence Automated Proof Lean Reinforcement Learning
相关文章