LessWrong, August 12
Thoughts on extrapolating time horizons

 

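The article examines real-world AI progress through a key metric, the time horizon, which measures the length of tasks AI agents can complete. The data show exponential growth since 2019, with the time horizon doubling every 7 months and accelerating to a doubling every 4 months since 2024. The author estimates that, at the recent rate, AI systems could handle one month of low-context software engineering work by the end of 2027; even at the long-run historical rate, this would happen by the end of 2029. The article also notes that AI automating AI research could speed progress further, although compute bottlenecks may have an effect around 2030. The author's median prediction is that AI research will be automated by the end of 2028 and that AI will outperform humans at more than 95% of current intellectual labor by the end of 2029.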

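🤖 A new lens on AI capability: the article uses the time horizon, the length of tasks AI agents can complete, as its key metric. The metric has kept doubling since 2019, reflecting rapid progress in AI capability.

🚀 Accelerating progress: the data show the time horizon doubling every 7 months since 2019 and every 4 months since 2024, which suggests that AI progress is speeding up.

💼 Month-long task forecast: at the recent rate, the author expects AI systems to be able to do one month of low-context software engineering work by the end of 2027; at the long-run historical rate, this would take until the end of 2029.

💡 Future progress and bottlenecks: the article argues that AI automating AI research will speed progress further, while noting that compute bottlenecks may limit scaling around 2030.

📈 Outlook for the decade: the author's median prediction is that AI research will be automated by the end of 2028 and that AI will outperform humans at more than 95% of current intellectual labor by the end of 2029.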

Published on August 11, 2025 10:36 PM GMT

(written for a Twitter audience)

Has AI progress slowed down? I’ll write some personal takes and predictions in this post.

The main metric I look at is METR’s time horizon, which measures the length of tasks agents can perform. It has been doubling for more than 6 years now, and might have sped up recently.

By measuring the length of tasks AI agents can complete, we can get a continuous metric of AI capabilities.

Since 2019, the time horizon has been doubling every 7 months. But since 2024, it’s been doubling every 4 months. What if we irresponsibly extrapolated these to 2030?

If AI progress continues at its recent rate, we get AI systems which can do one month (167 hours) of low-context SWE work by the end of 2027. If AI progress continues at the long-run historical rate, we get them by the end of 2029 instead.
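The arithmetic behind this kind of extrapolation can be sketched in a few lines of Python. This is only an illustration of extrapolating a fixed doubling time, not the fit used for the numbers above: the function name `months_until` is hypothetical, and the 2-hour starting horizon is a placeholder assumption rather than a figure from this post.

```python
import math

WORK_MONTH_HOURS = 167  # one work-month, as defined in the post


def months_until(target_hours, start_hours, doubling_months):
    """Months for an exponentially growing horizon to go from start to target."""
    if target_hours <= 0 or start_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    # Number of doublings needed, times the length of each doubling.
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months


# ASSUMPTION: a 2-hour starting horizon is a placeholder, not a value from the post.
start_hours = 2.0
for label, doubling_months in [("recent rate", 4), ("long-run rate", 7)]:
    months = months_until(WORK_MONTH_HOURS, start_hours, doubling_months)
    print(f"{label} ({doubling_months}-month doubling): {months:.1f} months")
```

From the same starting point, the long-run rate takes 7/4 as long as the recent rate to cover the same number of doublings. The actual dates also depend on where each fitted trend line sits today, which this sketch does not model.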

How should we interpret one work-month? I’d say it’s something like the first project a new hire would do, or the type of work a researcher who just switched teams would be able to do in a month. Our time horizon metric currently doesn’t define high time horizons super sharply.

Changing the success rate threshold from 50% to 80% only shifts the extrapolation from recent progress by a few months, but shifts the extrapolation from the long-run historical rate by around a year.

I don’t think these lines should be extrapolated much past one work-month, as progress will likely speed up even more once AIs are automating significant parts of AI research. Additionally, bottlenecks identified by Epoch AI (https://epoch.ai/blog/can-ai-scaling-continue-through-2030) might slow down compute scaling around 2030.

Our task suite is currently composed of well-scoped, easily scoreable tasks, which makes it pretty different from the type of work done in the real world. This means that we should be cautious when interpreting these extrapolations.

My best guess is that future models will be a closer fit to the extrapolation from recent progress than the extrapolation from long-run progress. But even the more conservative trend implies that AIs will be doing month-long tasks by the end of the decade.

More concretely, my median is that AI research will be automated by the end of 2028, and AI will be better than humans at >95% of current intellectual labor by the end of 2029.


