https://www.seangoedecke.com/rss.xml · October 2
Human-AI Collaboration and LLM Usage Efficiency

The dominant AI programming pattern today is human-machine pairing, in the style of “centaur chess”: a skilled human combined with a computer assistant produces far more than either could alone. LLMs are fast but lack deep judgement, so high-feedback tools like Claude Code, which allow real-time supervision, outperform autonomous tools like Devin. The bottleneck in these pairings is the LLM, not the human, because the model carries most of the workload and serving capacity is concentrated, which degrades performance at peak times. AI labs may cut serving costs by quantizing their models, and tooling is less stable during peak hours regardless. To exploit off-peak capacity, American tech companies have an incentive to hire engineers in Australia and Europe and keep work going around the clock. For ordinary big-tech work LLM assistance is very valuable, though in certain domains deep human experience still holds the advantage.

🤝 In the human-AI pairing model, a human combined with an AI assistant produces significantly more than either could alone, compensating for individual limitations. The human supplies deep judgement while the AI handles fast execution, making the two complementary.

📊 LLMs beat humans on speed and consistency but fall short on depth of judgement. High-feedback tools like Claude Code require real-time supervision, while autonomous tools like Devin are less effective, which shows where the current bottleneck lies.

⏰ LLM serving is concentrated in a few datacenters and the models carry a heavy workload, so performance drops during peak hours. AI labs may cut serving costs by quantizing their models, but that sacrifices some capability, leaving tooling less stable at peak times.

🌏 To make use of off-peak capacity, American tech companies have an incentive to hire engineers in Australia and Europe, keeping work going around the clock. This raises the utilization of a scarce resource and insulates work from the outages and scaling problems of the US working day.

💡 For ordinary big-tech work, LLM assistance is very valuable and significantly improves efficiency. But in certain domains human experience remains irreplaceable, so the balance between AI and human roles still matters.

Right now the dominant programming model is something like “centaur chess”, where a skilled human is paired with a computer assistant. Together, they produce more work than either could individually. No individual human can work as fast or as consistently as an LLM, but LLMs lack the depth of judgement that good engineers have[1]. That’s why the current state-of-the-art AI programming tools are all high-feedback tools like Claude Code, where you can see every step the agent is taking and provide real-time feedback, not tools like Devin where it just goes away and solves the problem on its own.

Even though human judgement is required, the bottleneck for these pairings is the LLM, not the human. Since the human engineer is just supervising the work, the LLM is generally working much harder. And while humans live in lots of different places, the top LLMs are typically served from a handful of datacenters. Because the weights are secret, you can’t just run Claude Sonnet 4 on your personal compute - you have to go to Anthropic or Amazon, who are struggling to run the model at the scale required to meet demand.

This dynamic has led to a popular theory that AI labs react to peak traffic (US working hours) by quantizing[2] their models so they’re cheaper to serve at scale. According to this theory, if you use Claude Code in the middle of the night, you will get a smarter model than if you use it in the middle of the US working day. For what it’s worth, Anthropic has categorically denied that they do this. But regardless, it’s just a fact that AI tooling is less reliable during peak hours: requests are more likely to time out, or the service goes down entirely.

American tech companies can’t reliably buy more LLM compute. You can give OpenAI as much money as you want - if there’s too much traffic for their GPUs during peak hours, they won’t be able to serve your requests. However, tech companies can buy the ability to do useful work outside of peak hours, by hiring engineers whose working hours are off-peak.

In other words, the economics of serving LLMs provides a real incentive for American tech companies to hire engineers from Australia and Europe. Those engineers can make more efficient use of a scarce compute resource, and will be insulated from the inevitable outages and scaling issues during the American working day.

I wrote about this before in “What it’s like working for American companies as an Australian”. I do think hiring in Australia can be kind of like a superpower for American tech companies: even aside from the LLM point, it allows work to continue around the clock on high-priority tasks. When something really has to launch in two days, there’s a big difference between being able to spend ~20 hours of engineering time on it and being able to spend ~48 hours of engineering time on it. When a large American customer reports a nasty bug, being able to fix it overnight without making an engineer work outside their normal hours is very nice.

Of course I’m writing this partially out of self-interest, as an Australian software engineer who likes working for American software companies. But I also think it’s true! If you see your tech company’s engineering work as a partnership with AI coding agents, consider hiring an engineering staff who can continue that partnership into the night, when the GPUs are running cool and the models aren’t potentially being quantized.


[1] Of course this isn’t true for all domains. If you’ve been working in the same codebase for ten years, you will be much better at it than any AI system. I wrote about this in “METR’s AI productivity study is really good” and in “Pure and impure engineering”. But for most ordinary big-tech work, where engineers are forced to produce large amounts of impure code to a deadline, LLM assistance can be very helpful indeed.

[2] Quantizing a model means changing the precision of its weights (e.g. instead of “0.2156”, you store the weight as “0.2”, which makes the inference calculations much cheaper to run). It always makes the model a bit dumber, though, so it’s a tradeoff.
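To make that tradeoff concrete, here is a minimal sketch of one common scheme, symmetric int8 quantization, in Python. The weight values and function names are illustrative assumptions for this post, not how Anthropic or anyone else actually serves models; production schemes (per-channel scales, 4-bit formats) are more sophisticated.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights onto int8 levels plus a single scale factor."""
    scale = np.abs(weights).max() / 127.0   # largest weight maps to +/-127
    return np.round(weights / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate weights; the rounding error never comes back."""
    return q.astype(np.float32) * scale

w = np.array([0.2156, -0.78, 0.0031], dtype=np.float32)  # hypothetical weights
q, scale = quantize_int8(w)
print(dequantize(q, scale))  # ~[0.215, -0.78, 0.0061]: close, but not exact
```

Each int8 weight takes a quarter of the memory of a float32 one, which is where the serving savings come from, but note how the smallest weight (0.0031) nearly doubles after the round trip. Spread across billions of weights, that accumulated error is the “bit dumber” the footnote describes.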
