Interconnects · October 26, 23:27

High-Intensity Work in AI: The Current State and Reflections

The rapid development of AI, and large language models (LLMs) in particular, is driving practitioners to work extraordinary hours under unprecedented pressure; descriptions ranging from "996" to "002" (midnight to midnight with only a two-hour break) are now commonplace. Some of this intensity is performative flexing on social media, but it also reflects the real state of competition in the industry. The article ties this phenomenon to the urgency created by AI's rapid pace of change and the shrinking window to work at the technical frontier, draws parallels to similar pressures in other industries (such as Apple's engineers in China), and stresses the health risks, up to and including loss of life, that overwork can bring. Drawing on a Wall Street Journal report and researchers' own accounts, it explores the similarities between AI research and elite athletic training: ambitious goals, fine margins, and relentless self-improvement under intense competition. At the same time, it reflects on the downsides of this grind culture, including sacrificed life balance and constrained creativity, and argues that a healthy team culture is the decisive factor in long-term success, more than talent or compute alone.

🚀 **The "involution" of AI and high-intensity work:** The article notes a pervasive culture of extreme hours in AI, especially around LLMs; work schedules described as "996" through "002" (midnight to midnight with only a two-hour break) reflect fierce competition and the pressure of a fast-moving technical frontier. Part of this is social-media performance, but much of it mirrors real working conditions, affecting practitioners and their social circles.

🧠 **Accelerating iteration and a closing window of opportunity:** The rapid pace of the LLM field means the window to work at the cutting edge is closing quickly, pushing practitioners to invest ever more time and effort to stay relevant. This urgency drives a race to raise the quality of technical output, sometimes at the cost of life balance, as the industry's bar keeps rising.

⚖️ **Work intensity versus physical and mental health:** Citing a Wall Street Journal report, the article describes AI researchers putting in 100-hour workweeks to win the new tech arms race. It compares this to historical high-pressure environments (such as Apple's engineers in China) and warns that overwork can damage health and even cost lives, underscoring the importance of rest for maintaining mental acuity.

🏆 **The elite-sports analogy and building culture:** The author likens frontier LLM research to training as an elite athlete: distant, singular goals; tiny margins between success and failure; accumulated daily grind; and infrequent, high-stakes measurement of results. The article argues that a cohesive team culture matters more than individual talent or resources, and that a healthy culture can keep the work from feeling like pure "work."

💡 **Balancing innovation and burnout:** Overwork can narrow thinking and crowd out creative, novel solutions; rest yields intellectual payoffs by making space for creativity and insight. In LLMs, even the low-hanging fruit (easy wins) now demands more, and more refined, resources to pick, deepening burnout and leaving researchers feeling they must keep pushing or fall behind.

One of the obvious topics of the Valley today is how hard everyone works. We’re inundated with comments on “The Great Lock In”, 996, 997, and now even a snarky 002 (midnight to midnight with a 2 hour break). Plenty of this is performative flexing on social media, but enough of it is real and reflects how trends are unfolding in the LLM space. I’m affected. My friends are affected.

All of this hard work is downstream of ever increasing pressure to be relevant in the most exciting technology of our generation. This is all reflective of the LLM game changing. The time window to be a player at the most cutting edge is actually a closing window, not just what feels like one. There are many different sizes and types of models that matter, but as the market is now more fleshed out with resources, all of them are facing a constantly rising bar in quality of technical output. People are racing to stay above the rising tide — often damning any hope of life balance.


AI is going down the path that other industries have before, but on steroids. There’s a famous section of the book Apple in China, where the author Patrick McGee describes the programs Apple put in place to save the marriages of engineers traveling so much to China and working incredible hours. In an interview on ChinaTalk, McGee added “Never mind the divorces, you need to look at the deaths.” This is a grim reality that is surely playing out in AI.

The Wall Street Journal recently published a piece on how *AI Workers Are Putting In 100-Hour Workweeks to Win the New Tech Arms Race*. The opening of the article does an excellent job of capturing how the last year or two has felt if you’re participating in the dance:

> Josh Batson no longer has time for social media. The AI researcher’s only comparable dopamine hit these days is on Anthropic’s Slack workplace-messaging channels, where he explores chatter about colleagues’ theories and experiments on large language models and architecture.

Work addicts abound in AI. I often count myself among them, but I put a lot of effort into making sure that work expands to fill the time available, rather than fitting everything else in around work. This WSJ article had a bunch of crazy comments that show the mental limits of individuals and the culture they act in, such as:

> Several top researchers compared the circumstances to war.

Comparing current AI research to war is out of touch (especially with the grounding of actual wars happening simultaneously to the AI race!). What they really are learning is that pursuing an activity in a collective environment at an elite level over multiple years is incredibly hard. It is! War is that and more.

In the last few months I’ve been making an increasing number of analogies between working at the sharp end of LLMs today and training with a team to be an elite athlete. The goals are far out and often singular; there are incredibly fine margins between success and failure; much of the grind is over tiny tasks that add up over time but that you don’t want to do in the moment; and you can never quite know how well your process is working until you compare your outputs with your top competition, which only happens a few times a year in both sports and language modeling.

In college I was a D1 lightweight rower at Cornell University. I walked onto the team and we ended up winning 3 championships in 4 years. Much of this was happenstance, as much greatness is, but it’s a crucial example in understanding how similar mentalities can apply in different domains across a life. My mindset around the LLM work I do today feels incredibly similar — complete focus and buy-in — but I don’t think I’ve yet found a work environment where the culture is as cohesive as athletics. While OpenAI’s culture is often described as culty, there are many signs that the core team members there absolutely love it, even if they’re working 996, 997, or 002. When you love it, it doesn’t feel like work. This is the same reason training 20 hours a week as a full-time student can feel easy.

Many AI researchers could learn from athletics to appreciate the value of rest. When you’re not rested, your mental acuity can drop off even faster than your peak physical performance does. Working too hard forces you into narrower and less creative approaches. The deeper I get into the hole of burnout while trying to deliver you the next Olmo model, the worse my writing gets. My ability to spot technical dead ends goes with it. The intellectual payoffs of rest are hard to see, but without it your schedule doesn’t have the space for creativity and insight.

Crafting the team culture in both of these environments is incredibly difficult. It’s the quality of the team culture that determines the outcome more than the individual components. Yes, with LLMs you can take brief shortcuts by hiring talent with years of experience from another frontier lab, but that doesn’t change the long-term dynamic. Yes, you obviously need as much compute as you can get. At the same time, culture is incredibly fickle. It’s easier to lose than it is to build.

Some argue that starting a new lab today can be an advantage over the established labs because you get to start from scratch with a cleaner codebase, but this is cope. There are three core ingredients of training: internal tools (recipes, codebases, etc.), resources (compute, data), and personnel. Leadership sets the direction and culture, and management executes on that direction. All of these elements are crucial and none can be overlooked. The further along the best models get, the harder starting from scratch becomes. Eventually, this dynamic will shift back in favor of starting from scratch, because public knowhow and tooling will catch up, but in the meantime the closed tools are improving at a far faster rate than the fully open ones.


The likes of SSI, Thinky, and Reflection¹ are likely the last efforts that are capitalized enough to maybe catch up in the near term, but the odds are not on their side. Getting infinite compute into a new company is meaningless if you don’t already have your code, data, and pretraining architectures ready. Eventually the clock will run out on plans that amount to catching up to the frontier first and figuring it out from there. The more these companies raise, the more the expectations on their first output will rise as well. It’s not an enviable position, but it’s certainly ambitious.

In many ways I see the culture of Chinese technology companies (and education systems) as better suited for this sort of catch-up work. Many top AI researchers trained in the US want to work on a masterpiece, but what it takes in language modeling is often extended grinding to stabilize and replicate something that you know can definitely work.

I used to think that the AI bubble would pop financially, as seen through a series of economic mergers, acquisitions, and similar deals. I’m shifting to see more limitations on the human capital than the financial capital thrown at today’s AI companies. As the technical standard of relevance increases (i.e. how good the models people want to use are, or the best open model of a given size category), it simply takes more focused work to get a model there. This work is hard to cheat in time.

This all relates to how I, and other researchers, always comment on the low-hanging fruit we see to keep improving the models. As the models have gotten better, our systems for building them have gotten more refined, complex, intricate, and numerically sensitive.² While I see a similar amount of low-hanging fruit today as I did a year ago, the effort (or physical resources, GPUs) it can take to unlock it has increased. This pushes people to keep going one step closer to their limits, piling on more burnout. This is also why the WSJ reported that top researchers “said repeatedly that they work long hours by choice.” The best feel like they need to do this work or they’ll fall behind. It’s running one more experiment, running one more vibe test, reviewing one more colleague’s PR, reading one more paper, chasing down one more data contract. The to-do list is never empty.

The amount of context that you need to keep in your brain to perform well in many LM training contexts is ever increasing. For example, leading post-training pipelines around the launch of ChatGPT looked like two or maybe three well separated training stages. Now there are tons of checkpoints flying around getting merged, sequenced, and chopped apart as part of the final project. Processes that used to be managed by one or two people now have teams coordinating many data and algorithmic efforts that are trying to land in just a few models a year. I’ve personally transitioned from a normal researcher to something like a tech lead who is always trying to predict blockers before they come up (at any point in the post-training process) and get resources to fix them. I bounce in and out of problems to wherever the most risk is.
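
To make “merging” concrete, here is a minimal sketch of one common form it can take: uniform weight averaging of checkpoints that share an architecture, sometimes called model souping. This is an illustrative assumption on my part rather than a description of any specific lab’s pipeline, and the file names and single-file checkpoint format are hypothetical.

```python
# A minimal, hypothetical sketch: uniform weight averaging ("model souping")
# of checkpoints that share an architecture. Not any lab's actual pipeline.
import torch

def average_checkpoints(paths: list[str]) -> dict[str, torch.Tensor]:
    """Uniformly average parameters of checkpoints with identical keys and shapes."""
    merged: dict[str, torch.Tensor] = {}
    for path in paths:
        state = torch.load(path, map_location="cpu")
        for name, tensor in state.items():
            # Accumulate in float32 so bf16/fp16 weights don't lose precision.
            merged[name] = merged.get(name, 0) + tensor.float() / len(paths)
    return merged

if __name__ == "__main__":
    # Hypothetical checkpoints from different post-training runs.
    soup = average_checkpoints(["sft_v1.pt", "sft_v2.pt", "rl_v3.pt"])
    torch.save(soup, "merged_candidate.pt")
```

In practice the merge is rarely this uniform; weighted averages, per-layer interpolation, and sequencing merged checkpoints through further training stages are all common variations.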

Cramming and keeping technical context pushes out hobbies and peace of mind.

Training general language models you hope others will adopt — via open weights or API — is becoming very much an all-in or all-out domain. Half-assing it is becoming an expensive way to make a model that no one will use. This wasn’t the case two years ago, when playing around with a certain part of the pipeline was legitimately impactful.

Culture is a fine line between performance and toxicity, and it’s often hard to know which you are until you get to a major deliverable to check in versus competitors.

Personally, I’m fighting off a double-edged sword here. I feel immense responsibility to make all the future Olmo models of the world great, while simultaneously trying to do a substantial amount of ecosystem work to create an informed discussion around the state of open models. My goal for this discussion is for more real things to get built. The ATOM Project is a manifestation of my feeling that both the U.S. ecosystem generally and the Olmo project specifically are falling behind.

It doesn’t really seem like there will be an immediate fix or end point to this, but looking back I’m sure it’ll be clear what the key moments were and whether or not my efforts here and elsewhere met my goals.

Will it all be worth it? How long do you plan to go on like this? It’s not like we’re really going to suddenly reach AGI and then all pack it up and go home. AI progress is a long haul now.

For me, the only reason to keep going is to try and make AI a wonderful technology for the world. Some feel the same. Others are going because they’re locked in on a path to generational wealth. Plenty don’t have either of these alignments, and the wall of effort comes sooner.


Thanks to Ross Taylor, Jordan Schneider, and Jasmine Sun for feedback on this post.

¹ Starting later is obviously far more challenging.

² For example, the more GPUs you train on and the bigger the model, the more types of errors you encounter across the massive system.
