AI能否成为超越人类的数学家？

“Will there be a superhuman AI mathematician?” asked theoretical computer scientist professor Sanjeev Arora from Princeton University at the 12th Heidelberg Laureate Forum in September. What would that mean?

Imagine the set of all possible math theorems; only a subset has been proven by human mathematicians. Said Arora, “A superhuman AI mathematician is one that can prove more theorems than humans have.”

Arora, who was awarded the 2011 ACM Prize in Computing for his contributions to computational complexity, algorithms, and optimization, sketched a possible path to a superhuman AI mathematician. He explained that the idea traces back to David Hilbert’s early 20th-century dream of automating mathematics. That dream was crushed by the work of Gödel, Turing, and Church, yet it left behind something lasting: the concept of formal proof verification—the notion that mathematical proofs can be written in a precise language and then rigorously checked by a computer.

The modern open-source programming language and proof assistant Lean is ideally suited for precisely this purpose, Arora said. A proof written in English can be translated into Lean, after which the Lean checker verifies whether or not the proof is correct. “Rewriting the proof in Lean is presently done by humans, but very soon this will be done by AI,” Arora said.

Superhuman performance will probably arise from AI self-improvement through reinforcement learning, Arora said. AI has already been trained on math books and math papers, so it is aware of the definitions of mathematical concepts. “The idea of self-improvement is that you give the AI a large question bank created by humans; it follows many attempts to answer these questions, and the correct answers are used for further training.

“How does it get the correct ones? That’s the human feedback,” Arora said. “Some humans have labeled them as correct answers. This is the present pipeline. But in math, you can verify the answer with Lean. So, if you just ask AI to produce its proofs in Lean, labeling the correct answers can be done automatically. Even if it is a long proof, we humans can trust that it is correct.”

Automating proof-checking is one thing, but what about the mathematical questions that are worth being explored? Do they still need a human mathematician? Said Arora, “There is increasing evidence that AI itself can generate very good questions, so therefore you won’t need the large question bank from humans. Maybe the AI will start with some human data, but after that, it takes off and produces new questions. Why can this work? Well, the strongest suit of AI is its creativity. It is trained on this massive amount of data and then can find things across the data and combine them in interesting ways. It is creative, okay, but we know that it can also hallucinate, right? It doesn’t have a very good idea of truth. But it has Lean to verify answers and weed out the wrong questions and answers.”

This is no longer just theory. Experiments have shown the method explained by Arora can work in practice, albeit at less-than-superhuman level. Arora mentioned the examples of DeepMind’s AlphaGeometry and AlphaProof, and the Goedel-Prover created in Arora’s own Princeton Language and Intelligence lab. In 2023, AlphaGeometry solved International Mathematical Olympiad (IMO)-level geometry problems without needing human demonstrations. AlphaProof focused on more general formal math reasoning, not just geometry, and achieved in 2024 the silver-medal standard on IMO problems. This year, models from OpenAI and Google reached the IMO gold-medal standard.

Whereas the models of OpenAI and Google are secret, Arora’s Princeton lab has released an open-source model, called Goedel-Prover-V2. “We verified that you can use current open-source AI models and iterate through 10-to-20 rounds of self-correction to solve five out of six IMO questions, which is at the gold-medal level,” he said.

Another step paving the way for a superhuman AI mathematician came earlier this year from AI startup Morph Labs, which announced a major improvement in AI translating English proofs into Lean. Mathematicians Terence Tao and Alex Kontorovich have been trying to convert the Strong Prime Number Theorem in Lean, which they thought would take a few years of human effort, but with an AI tool called Gauss, it took Morph Labs only three weeks.

“So, this whole pipeline of AI verifying answers and producing its own questions really is taking off,” Arora said. “If AI superintelligence happens, mathematics is the likely first domain because of the verified answers. The reason it can get better is because it is tireless and it can just keep repeating, do this self-improvement loop and get better and better. Just remember that this math superintelligence does not mean it’s perfect, only that it is better than humans.”

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签