ThursdAI - Recaps of the most high signal AI weekly spaces 09月25日
Kimi K2 AI模型发布
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本周OpenAI发布ChatGPT Agent,Moonshot推出1万亿参数的Kimi K2模型,后者采用MOE架构,SWE-bench得分65.8%,支持128K上下文窗口,使用新型Muon优化器训练。Weights & Biases与CoreWeave合作提供Kimi K2推理服务。Grok加入动漫角色,Gemini Embedding Model表现优异。Liquid AI推出LEAP和Apollo实现设备端AI,Amazon Kiro提供需求驱动开发。

🔍 Kimi K2是Moonshot发布的一个1万亿参数的开源AI模型,采用Mixture of Experts(MOE)架构,仅32B活跃参数,SWE-bench得分65.8%,支持128K上下文窗口,使用新型Muon优化器训练,无需推理即可达到高性能。

🚀 Weights & Biases与CoreWeave合作,在模型发布后3天内提供Kimi K2的推理服务,用户可通过api.inference.wandb.ai/v1访问,并可获得$50推理信用额度用于测试。

🎮 Grok AI增加了动漫角色,包括名为Annie的3D互动角色,由XAI公司提供技术支持,导致Grok应用在日本App Store排名上升,并发布了“全栈虚拟伴侣工程师”职位。

📈 Gemini Embedding Model在MTEB排行榜上取得68.32分的优异成绩,成为最新的SOTA模型,而Amazon S3 Vectors推出原生向量存储功能,对RAG应用意义重大。

📱 Liquid AI的LEAP和Apollo平台支持在设备端运行小型LLM,无需云连接,50-300MB的模型可在Android和iOS设备上实现低功耗即时推理,适用于离线工作场景。

Hey everyone, Alex here 👋 and WHAT a week to turn a year older! Not only did I get to celebrate my birthday with 30,000+ of you live during the OpenAI stream, but we also witnessed what might be the biggest open-source AI release since DeepSeek dropped. Buckle up, because we're diving into a trillion-parameter behemoth, agentic capabilities that'll make your head spin, and somehow Elon Musk decided Grok waifus are the solution to... something.

This was one of those weeks where I kept checking if I was dreaming. Remember when DeepSeek dropped and we all lost our minds? Well, buckle up because Moonshot's Kimi K2 just made that look like a warm-up act. And that's not even the wildest part of this week!

As always, all the show notes and links are at the bottom, here's our liveshow (which included the full OAI ChatGPT agents watch party) - Let's get into it!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

🚀 Open Source LLMs: The Kimi K2 Revolution

The New Open Source King Has Arrived

Folks, I need you to understand something - just a little after we finished streaming last week celebrating Grok 4, a company called Moonshot decided to casually drop what might be the most significant open source release since... well, maybe ever?

Kimi K2 is a 1 trillion parameter model. Yes, you read that right - TRILLION. Not billion. And before you ask "but can my GPU run it?" - this is an MOE (Mixture of Experts) with only 32B active parameters, which means it's actually usable while being absolutely massive.

Let me give you the numbers that made my jaw drop:

The main thing about the SWE-bench score is not even just the incredible performance, it's the performance without thinking/reasoning + price!

The Muon Magic

Here's where it gets really interesting for the ML nerds among us. These folks didn't use AdamW - they used a new optimizer called Muon (with their own Muon Clip variant). Why does this matter? They trained to 15.5 trillion tokens with ZERO loss spikes. That beautiful loss curve had everyone in our community slack channels going absolutely wild.

As Yam explained during the show, claiming you have a better optimizer than AdamW is like saying you've cured cancer - everyone says it, nobody delivers. Well, Moonshot just delivered at 1 trillion parameter scale.

Why This Changes Everything

This isn't just another model release. This is "Sonnet at home" if you have the hardware. But more importantly:

The vibes are OFF THE CHARTS. Every high-taste model tester I know is saying this is the best open source model they've ever used. It doesn't have that "open source smell" - it feels like a frontier model because it IS a frontier model.

Not only a math genius

Importantly, this model is great at multiple things, as folks called out it's personality or writing style specifically! Our Friend Sam Paech, creator of EQBench, has noted that this is maybe the first time an open source model writes this well, and is in fact SOTA on his Creative Writing benchmark and EQBench!

Quick Shoutouts

Before we dive deeper, huge props to:

🎉 This Week's Buzz: W&B Inference Goes Live with Kimi-K2! (X)

Ok, but what if you want to try Kimi-K2 but don't have the ability to run 1T models willy nilly? Well, Folks, I've been waiting TWO AND A HALF YEARS to say this: We're no longer GPU poor!

Weights & Biases + CoreWeave = Your new inference playground. We launched Kimi K2 on our infrastructure within 3 days of release!

Sitting behind the scenes on this launch was surreal - as I've been covering all the other inference service launches, I knew exactly what we all want, fast inference, full non-quantized weights, OpenAI API compatibility, great playground to test it out, function calling and tool use. And we've gotten almost all of these, while the super cracked CoreWeave and W&B Weave teams worked their ass off over the weekend to get this shipped in just a few days!

And here’s the kicker: I’m giving away $50 in inference credits to 20 of you to try Kimi K2 on our platform. Just reply “K2-Koolaid-ThursdAI” to our X launch post here and we'll pick up to 20 winners with $50 worth of credits! 🫡

It’s live now at api.inference.wandb.ai/v1 (model ID: moonshotai/Kimi-K2-Instruct), fully integrated with Weave for tracing and evaluation. We’re just getting started, and I want your feedback to make this even better. More on W&B Inference Docs - oh and everyone gets $2 free even without me, which is like 500K tokens to test it out.

Big CO LLMs + APIs

The big players didn't sleep this week either—funding flew like confetti, Grok went full anime, and OpenAI dropped agents mid-stream (we reacted live!). Amazon snuck in with dev tools, and Gemini embeddings claimed the throne. Let's get through some of these openers before we get to the "main course" which of course came from OpenAI

Grok Gets... Waifus?

I can't believe I'm writing this in a serious AI newsletter, but here we are. XAI added animated 3D characters to Grok, including "Annie" - and let's just say she's very... interactive. XAI partnered with a company that does real time animated 3d avatars and these are powered by Grok so... they are a bit unhinged!

The same Elon who's worried about birth rates just created nuclear-grade digital companions. The Grok app shot to #1 in the Japanese App Store immediately. Make of that what you will. 😅

They even posted a job for "Full Stack Waifu Engineer" - we truly live in the strangest timeline.

XAI also this week addressed the concerns we all had with "mechahitler" and the Grok4 issues post launch (where it used it's web search to see "what does Elon think" when it was asked about a few topics)

Credit for finding the prompt change: Simon Willison

Other Quick Hits from Big Tech

🔥 OpenAI Agents: ChatGPT Levels Up to Do-It-All Sidekick

We timed it perfectly—OpenAI's live stream hit mid-show, and we reacted with 30,000+ of you! And while we didn't get the rumored Open Source model from OAI, we did get... ChatGPT Agent (codename Odyssey) which merges Deep Research's fast-reading text browser with Operator's clicky visual browser and terminal access, all RL-tuned to pick tools smartly. It browses, codes, calls APIs (Google Drive, GitHub, etc., if you connect), generates images, and builds spreadsheets/slides—handling interruptions, clarifications, and takeovers for collaboration. SOTA jumps: 41.6% on Humanities Last Exam (double O3), 27.4% on FrontierMath, 45.5% on SpreadsheetBench, 68.9% on BrowseComp.

These are insane jumps in capabilities folks, just... mindblowing that we can now have agents that are SO good!

The team demoed wedding planning (outfits, hotels, gifts with weather/venue checks), sticker design/ordering, and an MLB itinerary spreadsheet—wild to watch it chain thoughts on recordings.

Wolfram called it the official start of agent year; Yam hyped the product polish (mobile control!); Nisten noted it's packaged perfection over DIY. I refreshed ChatGPT obsessively—mind-blown at turning my phone into a task master. Available now for Pro/Plus/Team (400/40 queries/month), Enterprise soon. This is the "feel the AGI" moment Sam mentioned—game over for tedious tasks (OpenAI announcement: https://openai.com/index/introducing-chatgpt-agent/).

I've yet to get access to it, but I'm very much looking forward to testing it out and letting you guys know how it works!

Combining the two browser modes (visual that has my cookies and textual that can scan tons of websites super quick) + CLI + deep research abilities + RL for the right kind of tool use all sounds incredibly intriguing!


Vision & Video

Runway’s Act-Two: Motion Capture Gets a Major Upgrade (X, YouTube)

Runway’s latest drop, Act-Two, is a next-gen motion capture model that’s got creatives buzzing. It tracks head, face, body, and hands with insane fidelity, animating any character from a single performance video. It’s a huge leap from Act-One, already in use for film, VFX, and gaming, and available now to enterprise and creative customers with a full rollout soon.

Voice & Audio

Mistral’s Voxtral: Open Speech Recognition Champ (X, HF)

Mistral AI is killing it with Voxtral, a state-of-the-art open speech recognition model. With Voxtral Small at 24B for production and Mini at 3B for edge devices, it outperforms OpenAI’s Whisper large-v3 across English and multilingual tasks like French, Spanish, Hindi, and German. Supporting up to 32K token context (about 30-40 minutes of audio), it offers summarization and Q&A features, all under an Apache 2.0 license. At just $0.001 per minute via API, it’s a steal for real-time or batch transcription.

Tools

Liquid AI’s LEAP and Apollo: On-Device AI for All

Liquid AI is bringing AI to your pocket with LEAP, a developer platform for building on-device models, and Apollo, a lightweight iOS app to run small LLMs locally. We’re talking 50-300MB models optimized for minimal battery drain and instant inference, no cloud needed. It’s privacy-focused and plug-and-play, perfect for offline workflows on Android and iOS. Developers, this is your prototyping dream—join the community via X.

Amazon Kiro: Your Spec-Driven Coding Buddy

I’ve already touched on Amazon’s Kiro, but let me reiterate—this spec-driven AI IDE is a standout. It structures your dev process around requirements, letting you define projects in plain language or diagrams before coding starts. It automates docs, testing, and more, feeling like a technical PM guiding you from concept to production. Early users are hooked on its PRD mode, and it’s free during preview. Give it a spin—details on X.

Wrapping Up: An Unforgettable AI Birthday Bash

What a week, folks! From Kimi K2 redefining open-source power to OpenAI’s ChatGPT Agent ushering in a new era of task automation, this has been a whirlwind of innovation. Throw in Grok’s quirky waifus and our own W&B Inference launch, and I’m left speechless on my birthday. Sharing this with over 30,000 of you during our live stream was the ultimate gift—AI is moving at a pace I couldn’t have dreamed of when I started ThursdAI. Here’s to more breakthroughs, and I can’t wait to see what you build with Kimi K2 credits. Let’s keep pushing the boundaries together!

P.S - If you'd like to support this podcast/newsletter and give me a birthday present, the best way is to tell your friends about it and the second best way is to subscribe 👏

TL;DR and Show Notes

Here’s everything we covered this week on ThursdAI for July 17, 2025, packed with links and key highlights for you to dive deeper:

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Kimi K2 Moonshot AI Open Source LLM Weights & Biases CoreWeave ChatGPT Agent Grok AI XAI Liquid AI LEAP Apollo Amazon Kiro
相关文章