Sam Patterson's Blog · October 2
Design Limitations of AI Productivity Research

Current research on AI productivity effects, such as Dell'Acqua et al. (2023) and Chen and Chan (2024), suffers from missing multimodal capabilities, a lack of context support, restrictions on chatbots taking independent actions or calling external APIs, and the absence of real-time human-machine collaborative workspaces. These artificial design constraints prevent the evaluation of AI productivity gains in realistic settings. Much AI skepticism is likewise based on outdated models rather than the latest progress, and overlooks the proper use of prompt engineering and specialized tools.

📊 Many AI productivity studies have design flaws: the existing literature evaluates AI productivity by randomizing access to LLM chatbots, but without multimodal interaction, context support, permission to act independently or call APIs, or a real-time collaborative workspace. These artificial restrictions disconnect the study environment from real-world use.

🔄 A common mistake of AI skeptics: many critics base their objections on the outdated GPT-3.5 rather than GPT-4, ignoring how significantly capabilities have improved with the development of reasoning models, with breakthroughs in areas such as mathematical reasoning and image generation, while citing the performance of old models to dismiss the progress.

🎯 Moving the goalposts: critics habitually change the bar. Once AI makes progress on imitating human conversation, advanced math, or photorealistic image generation, they quickly pose new challenges such as generating code or troubleshooting bugs, in a circular pattern of defining failure first and then waiting it out.

🗣️ Insufficient awareness of prompting: many skeptics have not learned effective prompting. Like asking a new employee to handle a complex task without the necessary background, they issue a single simple instruction, declare the model a failure, and neglect the importance of building a full conversation flow and supplying adequate context.

🔧 Limited tool use: some developers have never used agentic coding assistants such as Cursor or Claude Code, and form negative opinions based solely on manual web-interface interaction and copy-pasting code, overlooking these tools' real value for code generation and debugging.

🧐 The right kind of skepticism: the author stresses that grounded skepticism is necessary, but it should rest on using the latest models, mastering prompting techniques, and trying advanced tools. A genuine AI skeptic forms conclusions through practice rather than preconceived bias, while staying open to technological progress.

This quote from a recent paper caught my attention:

First, while the current literature, such as Dell’Acqua et al. (2023) and Chen and Chan (2024), reveal the productivity effects of AI by randomizing access to LLM chatbots, they are not multimodal, do not include context, do not allow the chatbots to take independent actions or use APIs to call outside of the platform, and do not provide a collaborative workspace where machines and humans can jointly manipulate output artifacts in real-time.

In other words, measuring the productivity gains from using AI has been hampered because of the artificial constraints of the study design. They’re not giving their users access and the environment in which they’d actually use the models for real.

This got me thinking about some of the AI skeptics I’ve encountered over the past few years.

AI Skeptics

I don’t like hype. I got into the world of cryptocurrency when Bitcoin was really the only game in town, and I’ve been exposed to more hype than any man should have to witness.

I know many others like me, especially in tech. They've been there, done that, and whether or not they bought the t-shirt, they have well-tuned bullshit detectors.

Or so they think.

The truth is, none of us knows the future. Yes, we can spot hype, but can we know for sure just how delusional the hype is this time around? No. At least, not without proper time investment to see for ourselves.

I’ve seen many examples of otherwise thoughtful people correctly seeing hype and then incorrectly dismissing the underlying technology because of the mere existence of the hype. The truth they don’t want to see is that their uninformed dismissal is just as naive as the uninformed hype.

A few times I’ve seen people make overly dismissive claims about AI (usually in HN or Reddit comment threads), and the responses are often the same: “I’m not uninformed! I’ve used the models and they didn’t work for me.”

Isn’t this a valid response? Of course! Their own experiences are far more valuable than taking someone else’s word about the models’ capabilities.

Yet… these responses were often very far from my own experiences, which made me curious as to why. So I would follow up with questions, and I discovered some commonalities.

They aren’t using the latest models

Remember the ancient days of GPT-3.5? This was the first mainstream model, and it sparked many of these original conversations about the models’ capabilities. It was amazing that a computer could hold a conversation at all, but of course the model had serious limitations.

Remember the less ancient days of the GPT-4 release? So do I - it was a major improvement from 3.5, one that kept me feeling excited about where this was heading.

The hype from 3.5 didn’t die down when 4 released - instead it grew. This unleashed a wave of AI skeptics who pointed out the limitations of the models at every opportunity.

The problem I noticed was that their objections were almost entirely based on 3.5 and not 4. They would post their prompt and response, then point and laugh. I would ask, “Was this 3.5 or 4?” and I estimate 90% of the time it was the older model. I would rerun the prompt with 4, and of course the output was dramatically improved.

This still happens, frequently. When the models' capabilities jumped with the transition to reasoning models, many examples of poor mathematical reasoning were still being trotted out. Image generation flaws are laughed at, but the examples aren't produced with SOTA models. Just this week OpenAI's 4o image tool rolled out and seems to have just about solved text generation in images (see the images in this post) - I guarantee you many will continue to say image generation models will never work for certain applications because they can't handle text properly.

They move goalposts

Of course AI models can imitate human conversation - that isn’t really impressive. Of course AI models can do advanced math - so what? Of course they can make photorealistic images - that problem wasn’t that hard. Of course they can generate boilerplate code - they’re just ingesting basic documentation. Of course they can troubleshoot bugs - they’re just ingesting Stack Overflow. Of course…

You can’t transport your mind back to 2020, but if you could, I’m nearly 100% convinced that you would be absolutely astonished by what AI can now do. We passed the Turing Test ages ago, and few people cared.

“The models can do X, but they can’t do Y.” They then do Y. “They can do Y, but really Z is the thing humans need and models can’t do.” Over and over again.

They don’t know how to prompt

I almost didn’t include this observation, because if a model requires you to become an expert in prompting it in order for it to be useful, then that’s a valid objection. This is true - the more time you put into using the models, the better you get at understanding how to prompt them, and the better the responses get.

I am including it, not because the objection isn’t valid, but because so many skeptics are lacking basic awareness of how important prompting is. They’re doing the equivalent of asking a new intern on their team to handle a complex task without giving them the context they need in a completely new environment.

When I saw the prompts they used, I would ask what the rest of the context looked like. All too often, that was it. Then I would ask for the follow-up prompts. Nope, that was it - they saw the model fail, and they stopped there.

This just isn't how you use the models. Well, maybe for simple requests. But if you want them to do something complex, you need to be more thoughtful about what information you're providing, what instructions you're giving, and how you guide the model throughout the conversation.
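To make the contrast concrete, here's a minimal Python sketch of what building a conversation looks like, as opposed to firing off one prompt and walking away. `call_model` is a hypothetical stand-in for whatever chat API you actually use; the point is the shape of the exchange (front-loaded context, then guided follow-ups), not any particular vendor's SDK.

```python
def call_model(messages):
    """Placeholder for a real chat-completion call; returns a dummy reply."""
    return f"(model response to {len(messages)} messages)"

# 1. Front-load context: role, constraints, and the relevant material,
#    just as you would brief a new intern before handing them a task.
messages = [
    {"role": "system", "content": (
        "You are a senior Python reviewer. Our codebase targets "
        "Python 3.11 and uses type hints everywhere.")},
    {"role": "user", "content": (
        "Here is the module under review:\n<code pasted here>\n\n"
        "First, summarize what it does.")},
]
messages.append({"role": "assistant", "content": call_model(messages)})

# 2. Don't stop at the first answer: narrow the task, correct
#    misunderstandings, and guide the model step by step.
messages.append({"role": "user", "content": (
    "Good. Now focus only on error handling: list concrete bugs, "
    "one per line, citing the function name for each.")})
final = call_model(messages)
```

The first turn rarely produces the finished result; the follow-ups, which so many skeptics skip entirely, are where the useful output comes from.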

If you’re looking for confirmation that AI can’t do something, you’ll find it. It takes a bit more effort to understand how they can be genuinely useful.

They don’t use tools

This one is becoming less true over time - I’ve seen a lot of people say they’ve tried Cursor, for example, and didn’t like it. Kudos to you for trying a new tool.

However, it still boggles my mind how many developers have never used any kind of agentic coding assistant, and whose opinions are formed from prompting manually through a web UI and copying and pasting code.

I get it - it’s what you’re comfortable with. And it does help you. But if you’ve never tried Claude Code or Cursor or Aider or the other tools, please give them a try. Moving away from needing to copy and paste is already a huge improvement, but these tools do way more than that now.

If you couple these tools along with learning good prompting and having persistence, they quickly become indispensable, at least for greenfield projects.

Conclusion

Skepticism is good, but informed skepticism is better. If you’re using the SOTA models, you’ve used models enough to know how to prompt and guide them, you’re trying out the latest tools, and you still believe that AI isn’t all that useful - great! You have an informed opinion.

As an AI optimist, I've slowly learned the ways in which I was overly optimistic about what AI could do, or the timelines involved. All I hope to see is the same genuine attempt at learning from the AI skeptics.
