39 Lessons on Building ML Systems, Scaling, Execution, and More

Industry ML conferences are intense. There’s so much information, learning, and context switching between talks and posters and hallway conversations that it leaves you exhausted each day. Thus, whenever there’s a break, taking a few minutes to reflect and take notes helps to solidify the learning. Here are my notes from ML conferences in 2024.

(I also had the opportunity to share my work at a few of these conferences. Here are the slides for my talks at the Netflix PRS Workshop and the AI Engineer World’s Fair. Unfortunately, my oral presentation at the Amazon ML Conference is internal only.)

• • •

Building effective machine learning systems

1. The real world is messy. To build systems that work, we need to define reward functions (that define labels), operationalize the world as data, find levers that make a difference, and measure what matters. Beware of those who tell you ML is a walk in the park.

2. Getting the reward function right is half the battle won. Short-term rewards don’t convey the full picture. Long-term rewards are delayed and tricky to attribute. Proxy rewards are finicky and may not correlate well with business metrics. And most rewards are sparse, with feedback few and far between. Invest early in reward function engineering.

3. You don’t always need machine learning. Heuristics and SQL queries are valuable baselines. Start simple and see if the juice is worth the squeeze. Remember the first rule of machine learning: Start without machine learning.
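
As a concrete illustration, a popularity count is often a surprisingly strong non-ML baseline for recommendations. The sketch below is a minimal example, assuming a hypothetical interaction log of (user, item) pairs:

```python
# A non-ML baseline: rank items by raw interaction count.
# The log format here is a hypothetical list of (user, item) pairs.
from collections import Counter

def popularity_baseline(interactions, k=5):
    """Recommend the k most-interacted-with items -- no model needed."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]

logs = [("u1", "a"), ("u2", "a"), ("u1", "b"),
        ("u3", "c"), ("u2", "b"), ("u4", "a")]
print(popularity_baseline(logs, k=2))  # ['a', 'b']
```

If an ML model can’t beat this in an A/B test, the juice probably isn’t worth the squeeze.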

4. Machine learning involves trade-offs. Recall vs. precision. Explore vs. exploit. Relevance vs. diversity vs. serendipity. Accuracy vs. speed vs. cost. The challenge is figuring out the right balance for your user experience.
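
The recall-vs-precision trade-off in particular reduces to a threshold you choose. A small sketch (toy scores and labels, not from any real system) shows how lowering the threshold buys recall at the cost of precision:

```python
# Precision/recall at a classification threshold; data is illustrative.
def precision_recall(scores, labels, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))          # true positives
    fp = sum(p and not y for p, y in zip(preds, labels))      # false positives
    fn = sum((not p) and y for p, y in zip(preds, labels))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for t in (0.5, 0.25):  # sweeping the threshold trades precision for recall
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

Which point on that curve is “right” depends entirely on the user experience (e.g., fraud screening wants recall; a top-of-page recommendation wants precision).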

5. Set realistic expectations. Most problems have a ceiling on what can be achieved, especially those that involve predicting the behavior of unpredictable humans (e.g., search, recommendations, fraud). It may not make sense to aim beyond the ceiling, unless you’re doing core research to push the boundaries on what’s possible.

6. Don’t overlook the dimension of time. User preferences change. Inventory gets drawn down. Content relevance shifts. Daily, seasonally, over months and years. If time is a key factor in your problem (e.g., recommendations, search, news feed), ensure your systems and models are time-aware.
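
One common way to make signals time-aware is exponential decay, so that recent interactions dominate stale ones. A minimal sketch (the half-life is an illustrative assumption, not a recommended value):

```python
import math

# Weight each interaction by exponential time decay so recent activity
# dominates. The 30-day half-life below is an assumption for illustration.
def decayed_count(event_ages_days, half_life_days=30.0):
    """Sum interactions, each weighted by how recently it happened."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * age) for age in event_ages_days)

# Ten clicks from 90 days ago count for less than three clicks from yesterday.
print(decayed_count([90.0] * 10))  # ~1.25
print(decayed_count([1.0] * 3))    # ~2.93
```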

7. Evals are a differentiator and moat. Over the past two years, teams with solid evals have been able to continuously ship reliable, delightful experiences. No one regrets investing in a robust evaluation framework.

8. Design with the data flywheel in mind. Data alone is not the competitive advantage; it is the data flywheel. How will you gather user feedback to enhance your model, system, or product and fuel a better customer experience? (See this Tesla self-driving example from Andrej Karpathy.) Whoever turns the data flywheel faster, wins.

9. Brandolini’s law: The amount of energy needed to refute bullshit is an order of magnitude larger than needed to produce it. The same applies to using LLMs. Generating slop content is easy relative to evaluating and guardrailing the defects. But the latter is how we earn—and keep—customer trust. Invest your efforts accordingly.

10. We probably won’t have one model to rule them all. Instead, each product will likely have several models supporting it. Maybe a bigger model orchestrating several smaller models. This way, each smaller model can give its undivided attention to its task.
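
The orchestration pattern can be as simple as a router in front of task-specific models. In the sketch below the models are stand-in stubs and the routing is a lookup; in practice the router itself might be a (bigger) LLM classifier:

```python
# Router-plus-specialists sketch; all model functions are hypothetical stubs.
def translate_model(text): return f"[translated] {text}"
def summarize_model(text): return f"[summary] {text[:20]}"
def general_model(text):   return f"[answer] {text}"

ROUTES = {"translate": translate_model, "summarize": summarize_model}

def orchestrate(task, text):
    """Dispatch each request to the specialist; fall back to a generalist."""
    handler = ROUTES.get(task, general_model)
    return handler(text)

print(orchestrate("translate", "bonjour"))  # [translated] bonjour
```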

11. Altman’s law: When a new model drops, are you nervous, or are you super pumped? Consciously design your product so that you’re in the latter camp. Be prepared to swap the model anytime and reap the benefits of constantly improving models. The model isn’t your product—the system around it is.
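
Being able to swap the model anytime usually comes down to keeping it behind a thin interface and a config-driven registry. A minimal sketch, with hypothetical stub clients standing in for real providers:

```python
# Keep the model behind an interface so a swap is a config change, not a
# rewrite. The client classes below are hypothetical stubs.
class ModelClient:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError

class StubModelA(ModelClient):
    def complete(self, prompt): return f"A: {prompt}"

class StubModelB(ModelClient):
    def complete(self, prompt): return f"B: {prompt}"

REGISTRY = {"model-a": StubModelA, "model-b": StubModelB}

def get_client(name):
    """`name` comes from config; swapping models means changing one string."""
    return REGISTRY[name]()

print(get_client("model-b").complete("hi"))  # B: hi
```

Paired with the evals above, a model swap becomes: change the config, rerun the eval suite, ship.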

12. Build with an eye toward the future. Flexibility beats specialization in the long run. Remember The Bitter Lesson. An LLM that’s a bit worse now will likely outperform a custom finetune later, especially as LLMs get cheaper (two orders of magnitude in 18 months!), faster, and more capable. Other examples include using a generative/extractive approach vs. named entity recognition/classification, and explore-exploit/reinforcement learning vs. supervised learning.

Production and scaling

13. Don’t underestimate the effort it takes to go from demo to production. “There’s a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving. It’s easy to demo a car self-driving around a block but making it into a product takes a decade.” — Andrej Karpathy

14. Scale makes everything harder. While we may not need to face it head-on from day one, we should be ready for it. Especially if we have to support multiple existing markets, languages, devices, user experiences, etc.

15. Each 10x-ing of scale/traffic will uncover new bugs and issues. Prepare early for the operational challenges that come with scale and just keeping the lights on.

16. Depending on your product, LLMs are cheap or expensive. “Even the most expensive LLMs are not that expensive for B2B scale; even the cheapest LLMs are not that cheap for consumer scale.” — Will Larson. If you’re mostly B2B or running internal workflows, using LLM APIs could be viable long-term. Address scale when you need to, not prematurely.
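
To make Larson’s quote concrete, here’s the back-of-envelope math under assumed prices and volumes (the numbers are purely illustrative, not any provider’s actual rates):

```python
# Assumed prices and request volumes -- illustrative only.
price_per_1m_tokens = 5.00             # blended $/1M tokens (assumption)
tokens_per_request = 2_000             # prompt + completion (assumption)

cost_per_request = price_per_1m_tokens * tokens_per_request / 1_000_000

b2b_requests_per_day = 10_000              # internal/B2B workflow scale
consumer_requests_per_day = 50_000_000     # consumer-app scale

print(f"B2B: ${cost_per_request * b2b_requests_per_day:,.0f}/day")
print(f"Consumer: ${cost_per_request * consumer_requests_per_day:,.0f}/day")
```

Under these assumptions the same per-request cost is ~$100/day at B2B scale and ~$500k/day at consumer scale — the API is viable in one case and a balance-sheet problem in the other.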

17. [Corollary] The economics aren’t in the way; it’s trust, reliability, security, etc. Costs will keep dropping. But faithfulness, hallucinations, prompt injections, etc. are still open problems. They’re the bottleneck to wider integration of LLMs in user-facing experiences.

18. Get the fundamentals right. Robust data pipelines, instrumentation, evaluation, guardrails, experimentation, monitoring, metrics. These core capabilities are essential—and reusable—across most products. Investing in them early will save you the development and operational cost associated with each new product.

19. Start simple, always. Complexity will creep in as we patch edge cases and extend an existing system for new requirements and features. Thus, a system that starts complex will inevitably buckle under its own weight or become an operational nightmare.

20. Not everything needs to be in real-time. If your user experience allows it, consider batch or asynchronous workflows to simplify the system design and reduce operational costs. When designing a new system, ask yourself: Can this be async?
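
For example, recommendations can often be materialized by a nightly batch job so that serving is just a cache lookup. A sketch, with hypothetical names and a stub recommender:

```python
# Batch-precompute sketch: score once a day instead of per request.
# The cache and recommender here are hypothetical stand-ins.
def nightly_batch_job(user_ids, recommend):
    """Materialize per-user recommendations in one offline pass."""
    return {uid: recommend(uid) for uid in user_ids}

rec_cache = nightly_batch_job(["u1", "u2"], lambda uid: ["item_a", "item_b"])

def serve(user_id, fallback=()):
    return rec_cache.get(user_id, list(fallback))  # O(1) at request time

print(serve("u1"))  # ['item_a', 'item_b']
```

No online model serving, no latency budget, no GPU on the request path — at the cost of day-old results, which many experiences can tolerate.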

21. Design for fast failure detection and recovery. No system is perfect; stuff will slip through. Yet, there’s an optimal investment balance between prevention and cure. Monitoring, alerts, rollbacks, Andon cords—these will go a long way.
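
A failure detector can start as something very small: an alert that fires when the error rate over a sliding window crosses a threshold. The window size and threshold below are assumptions for illustration:

```python
from collections import deque

# Minimal failure-detection sketch: alert on sliding-window error rate.
# Window and threshold values are illustrative assumptions.
class ErrorRateAlarm:
    def __init__(self, window=10, threshold=0.2):
        self.events = deque(maxlen=window)  # 1 = failure, 0 = success
        self.threshold = threshold

    def record(self, ok):
        """Record one request outcome; return True if the alarm fires."""
        self.events.append(0 if ok else 1)
        return sum(self.events) / len(self.events) > self.threshold
```

Wire the alarm to an automated rollback (redeploy the last known-good version when it fires) and you have a crude but effective Andon cord.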

Execution and collaboration

22. Execution is everything. Execution is navigating from where we are today (e.g., legacy systems, low velocity, high opex) to our long-term vision. It’s everything from idea to design to implementation to launch to measurement to operations and everything else in between. Executing well is the difference between success and failure.

23. Your rate of iteration = your rate of innovation. Focus on experimenting fast, getting rapid feedback, and updating or pivoting quickly. Velocity matters.

24. [Counterpoint] Breakthroughs will take longer than you think. Nine women can’t give birth to a baby in a month, and we can’t rush game-changing research and innovation. Start early, be patient, and keep pushing.

25. Not every challenge is technical. Some challenges are about alignment, culture, and organizations. Working with people is hard. When you zoom out, it turns out that tech is often the easier part.

26. It takes a village to raise a machine learning system. Infra, engineering, data, ML, design, product, business, and more. No role is more important than the others. 1 + 1 = 3.

27. Genius can come from anywhere. Not just the ivory towers of research. Some of the best ideas come from people who deeply understand the customer, regardless of whether their role is technical, creative, or operational. Don’t overlook the wisdom of the crowd.

28. You don’t have to go it alone. Whatever you’re working on, there will be others tackling similar problems. Reach out to them to learn and/or collaborate, either within your organization or on the internet. Together, we’re stronger.

29. People want to help. Especially if you’ve done your homework, bring data, and keep an open mind. Everyone, and I mean everyone, at these conferences is incredibly kind and generous. I experienced this warmth firsthand interacting with some of the leaders in the field. The strongest are also the kindest.

30. What seems obvious to you can be game-changing for others. Expertise is a ladder. Wherever you’re at, there are people a few rungs below (and a few rungs above) who are eager to learn from you. Thus, share what you know, even if you think it’s too basic and not worth sharing. It could help someone with a challenge they’re facing.

31. Tune out the noise; focus on building. Don’t get nerd-sniped by the daily barrage of shiny new techniques. Most don’t pan out anyway. For real alpha on what actually works, have hallway conversations and DMs with the practitioners who are quietly crushing it.

Building for users

32. Always work backwards from the customer. Why are we solving this problem? How does it help the customer? What are the tangible and intangible benefits? Unless you’re a researcher, don’t do science for the sake of science. Focus on the customer.

33. To create winning products, dream big and sweat the small stuff. Aim for the stars but don’t forget the details. The best leaders have vision and get their hands dirty with the details. You can’t just do one or the other; you need to do both.

34. Humans are insatiable. LLMs will automate some tasks. LLMs will simplify others. But there will always be new problems to solve, and more things to build. AI won’t steal your job—it’ll just make it more interesting.

Speaking at and attending conferences

35. Speaking at industry conferences is a tightrope act. It’s a delicate balance between sharing valuable insights and protecting the secret sauce. You’ll see this when speakers rely on public data, reference published work and public tech blogs, stick to high-level ideas, and choose their words carefully. (This is also why many of the best practitioners decline to speak.) Nonetheless, everyone understands.

36. GOATs are GOATs because they work hard. I saw many diligently taking notes, pulling up the papers being presented to review the tables and charts, and following up on points they didn’t understand. This is why they’re the best.

37. Conference insiders keep it friendly during public Q&A. The hard questions and debates are reserved for hallway conversations, happy hours, and over dinner.

38. No one really cares how good your slides look. Fancy layouts? Dark mode? Not a priority. Never has been, from what I can tell. Some of the top speakers just use the default font on white backgrounds. Substance over style, always.

39. Get enough geeks together and magic happens. The energy, the inspiration, the ideas. It’s infectious. I leave every conference on a high, fired up by thoughts on what to explore next, motivated by the conversations, and ready to build. Highly recommend attending a good conference at least once a year.


If you found this useful, please cite this write-up as:

Yan, Ziyou. (Nov 2024). 39 Lessons on Building ML Systems, Scaling, Execution, and More. eugeneyan.com. https://eugeneyan.com/writing/conf-lessons/.

or

@article{yan2024conflessons,
  title   = {39 Lessons on Building ML Systems, Scaling, Execution, and More},
  author  = {Yan, Ziyou},
  journal = {eugeneyan.com},
  year    = {2024},
  month   = {Nov},
  url     = {https://eugeneyan.com/writing/conf-lessons/}
}