Fortune | FORTUNE, October 10, 17:09
AI performs close to human experts on real-world tasks and is improving fast

 

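An in-depth study covering 44 occupations and 1,320 specialized tasks shows that today's top AI models already perform very close to experienced human industry experts on real work tasks. The study evaluated seven AI models, including models from OpenAI, Google's Gemini, xAI's Grok, and Anthropic's Claude. Claude Opus 4.1 stood out, coming within a few percentage points of the human experts. More striking still, the models completed tasks about 100 times faster and about 100 times more cheaply than the human experts. The study also stresses that AI models keep improving at a remarkable pace: the share of their outputs rated as good as or better than human work rose sharply in a short time, which suggests AI may eventually outperform humans on many real-world tasks. However, the study also notes that its evaluation leaves out the human oversight, iteration, and integration steps required in real workplace settings.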

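🤖 **Highly realistic and professionally grounded**: The study examined 1,320 specific tasks across 44 occupations, and professionals with an average of 14 years of experience vetted those tasks, which supports the study's realism and representativeness. The AI outputs were graded by experts who did not know whether the work came from an AI or a human, which makes the evaluation more objective.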

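📈 **AI performance approaches human experts at far greater efficiency**: The study found that top AI models, including Claude Opus 4.1, already perform very close to human industry experts on real work tasks. The models also completed tasks about 100 times faster and about 100 times more cheaply than the human experts, although real-world use still has to account for factors such as human oversight.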

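🚀 **Rapid iteration and the outlook ahead**: AI models' capabilities are growing at a remarkable pace. For example, the share of OpenAI models' task outputs that were as good as or better than human outputs more than tripled in a short time. If the trend continues, AI could soon outperform humans on many real-world tasks, which poses a major challenge for business leaders and requires them to adapt and learn faster.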

For leaders, three points stand out:

The study is highly realistic. It examined 44 occupations and 1,320 specialized tasks required by those occupations. For example: the final testing step in manufacturing a cable spooling truck for underground mining operations. Appropriate professionals (average experience: 14 years) vetted the tasks, all of which are elements of actual work deliverables. Previous research has almost always focused on less realistic tests. The AI results were graded by expert humans who didn’t know if they were looking at work from AI or from an expert human professional.

The best models are already nearly as good as human industry experts. The study examined seven AI models from OpenAI, Google’s Gemini, xAI’s Grok, and Anthropic’s Claude. The clear winner was Claude Opus 4.1, which came within a few percentage points of reaching parity with human industry experts. The best models also completed tasks about 100 times faster and 100 times cheaper than the industry experts, though the comparisons ignore “the human oversight, iteration, and integration steps required in real workplace settings,” OpenAI says.

The models are improving at a galloping pace. For example, as OpenAI’s models improved, the percentage of their task outputs that were as good as or better than humans’ outputs more than tripled. If that rate continues (a big if), OpenAI’s models would be better at these real-world tasks than humans overall in a few months. At least some AI competitors could well be on similar trajectories.

The pace of change described in this new research may be the hardest challenge for business leaders. Consider the two-year cycle of Moore’s Law, which changed the world and inspired new corporate giants while dooming others. In retrospect, those were the days. John Chambers, who ran Cisco through the internet frenzy and its crash, said recently that 50% of executives “won’t have the skills to adjust to this new innovation economy driven by AI because they were trained to move at the speed of a five-year cycle as opposed to a 12-month cycle.” His warning to leaders is worth remembering: “With the speed the market is moving at now, you have to be able to reinvent yourself, which most CEOs and business leaders don’t know how to do—especially with AI.”—Geoff Colvin

Contact CEO Daily via Diane Brady at diane.brady@fortune.com

Top news

Israeli government approves Gaza deal as troops pull out

The IDF now has 24 hours to retreat to an agreed-upon line and Hamas has 72 hours to release all Israeli hostages. So far, events are going as planned and the mood is upbeat on both sides. The BBC has live coverage.

China places export controls on rare earth minerals

The new rules curb the supply chain for the semiconductors that are used in phones, computers, AI data centers, cars, solar panels, and other IT kit. China has a virtual monopoly in rare earths.

NY Attorney General indicted

Letitia James is charged with bank fraud and making false statements. The prosecution is part of President Trump’s retribution plan: It was James who secured a $367 million fine against Trump in a civil suit (the fine was later reversed).

Make Argentina Great Again

Yes, the U.S. is bailing out Argentina. Treasury Secretary Scott Bessent confirmed that the Treasury has bought pesos to support the government of Trump ally President Javier Milei. The U.S. is also providing a $20 billion swap line to Argentina. (A swap line allows central banks to exchange fixed amounts of currency on the understanding that the swap will be reversed later and interest will be paid on the repaid currency.)

Moody’s chief economist: roughly half of U.S. states are contracting economically

Moody’s Analytics chief economist Mark Zandi exclusively told Fortune that nearly half of U.S. states are seeing their economies contract, and only 16 are experiencing growth. Zandi also noted that lower-income households are “hanging on by their fingertips financially…and their world is going into recession pretty quickly.”

KPMG survey identifies quarter where AI sentiment changed

A new KPMG survey of 130 business leaders in companies making more than $1 billion annually found that the adoption of agentic AI technology has quadrupled in the past six months. A principal and aIQ program lead at the company told Fortune that the most recent quarter was when the “fear factor” surrounding the technology faded, leading to what she describes as “cognitive fatigue.”

Google restricts WFH to just 4 days per year

Google’s previous policy was to allow staff to work from anywhere for up to four weeks per year. The new rule says that a single WFH day will now count as an entire week. 

Federal workers will get back pay

U.S. House Speaker Mike Johnson said furloughed federal workers will get the wages they are owed once the shutdown ends.

The markets

S&P 500 futures were up 0.14% this morning. The index closed down 0.28% in its last session. STOXX Europe 600 was flat in early trading. The U.K.’s FTSE 100 was down 0.14% in early trading. Japan’s Nikkei 225 was down 1.01%. China’s CSI 300 was down 1.97%. The South Korea KOSPI was up 1.73%. India’s Nifty 50 was up 0.51% before the end of the session. Bitcoin held at $121.4K.

Around the watercooler

$1.8 trillion deficit revealed during ‘pointless and wasteful government shutdown,’ budget watchdog says by Nick Lichtenberg

Battle over Elon Musk’s trillionaire pay package builds as pension funds face off against Tesla by Amanda Gerut

You’re 10 times more likely to have a flight delay during the government shutdown, Transportation Secretary says: ‘These controllers are stressed out’ by Sydney Lake

California’s ‘impossible’ dream of ending fossil fuels isn’t working, and now it’s looking at price spikes and shortages by Jordan Blum

From WhatsApp friends to a $500 million–plus valuation: These founders argue their tiny AI models are better for customers and the planet by Vivienne Walt

CEO Daily is compiled and edited by Joey Abrams and Jim Edwards.

This is the web version of CEO Daily, a newsletter of must-read global insights from CEOs and industry leaders.

Sign up to get it delivered free to your inbox.
