ThursdAI - Recaps of the most high signal AI weekly spaces (Sep 25, 18:01)
AI Industry Roundup: The Open-Source Model Wave and the Battle of the Giants

This week brought major developments in AI: Baidu open-sourced the ERNIE 4.5 series, spanning 10 models at different parameter scales; Tencent released Hunyuan-A13B-Instruct, built by the WizardLM team; Huawei's Pangu Pro MoE was trained on Ascend NPUs, working around sanctions; and the DeepSWE-Preview RL coding agent hit 59% on SWE-Bench. Meanwhile, Meta is aggressively recruiting AI talent for its new Superintelligence Labs, Cursor hired core members of the Claude Code team, and Microsoft's medical AI outperformed doctors on complex case diagnosis. Cloudflare launched one-click AI bot blocking, Mirage unveiled an AI-native game engine, Daytona offers sandbox runtimes for agents, and Kyutai and Qwen shipped new-generation TTS technology.

🔍 Baidu's open-source ERNIE 4.5 series includes 10 models ranging from 424 billion down to 0.3 billion parameters, with multimodal input and output. Its MoE model with 47B active parameters beats OpenAI's o1 on the visual-knowledge task DocVQA, scoring 93% accuracy to o1's 81%. The series ships under the Apache 2.0 license with a 128K context window, and marks Baidu's first large-scale open-sourcing of the ERNIE technology it has run in production (e.g., chatbots) for years.

🧙‍♂️ Tencent's Hunyuan-A13B-Instruct was built by the WizardLM team poached from Microsoft: 80B total parameters (13B active), a 256K context window, hybrid reasoning modes, and 87% on AIME 2024. Despite mild overfitting (76% on the 2025 set), that is an excellent result for a 13B-active-parameter model. Its license excludes commercial use in the EU, UK, and South Korea, and bars services with more than 100 million active users.

💻 Huawei's Pangu Pro MoE sidesteps US sanctions: it was trained entirely on in-house Ascend NPUs, using 4,000 chips to process 13 trillion tokens. Its design of 16B active parameters per token reaches an inference speed of 1,528 tokens per second per card (with speculative decoding), beating dense models on speed and cost efficiency, with performance close to DeepSeek and Qwen. It is an important option for the non-Nvidia ecosystem.

🤖 DeepSWE-Preview, an RL coding agent trained on Qwen3-32B via reinforcement learning without borrowing from proprietary models like Claude, scored 59% on SWE-Bench-Verified (Pass@1 42.2%, Pass@16 71%) after just six days of training on 64 H100 GPUs. The project open-sources its code, data, and logs, showing how academic researchers can break performance barriers with limited resources.

🏆 Meta Superintelligence Labs (MSL), led by Alex Wang and Nat Friedman, has recruited 10 key researchers from OpenAI, DeepMind, and elsewhere with compensation packages reportedly reaching $100M, including core contributors to GPT-4 image generation and the o1 model, aiming to challenge OpenAI's lead in large language models. The lab folds in the FAIR and GenAI teams and commands massive GPU resources, signaling a white-hot AI talent war.

🔧 Cursor hired Boris Cherny and Cat Wu from Claude Code as Chief Architect and Head of Product, and launched AI coding agents on web and mobile with Slack integration. The company closed a $20M raise and is strengthening its code-generation capabilities by folding in Claude Code expertise, further blurring the line between native and web tools.

👨‍⚕️ Microsoft's MAI-DxO system reached 85.5% accuracy on complex cases from the New England Journal of Medicine, far above the 20% average of experienced physicians. It simulates a virtual panel of doctors that asks follow-up questions, orders tests, and controls costs, showing AI's huge potential as an assistive tool for doctors rather than a simple replacement.

🛡️ Cloudflare rolled out one-click AI bot blocking for all customers, targeting crawlers like Bytespider and GPTBot, in response to AI content summaries replacing click-through traffic and undermining creators' business models. The feature uses machine learning to detect bots spoofing browser behavior, and has sparked debate over training-data licensing and revenue sharing.

🎮 Dynamics Lab's Mirage AI-native game engine generates photorealistic world environments in real time, controlled via natural language or controller input, with no pre-built game underneath. Running at 16 frames per second with unlimited custom content, it points toward personalized gaming and the vision that "every pixel will be generated."

🗣️ Kyutai TTS achieves 220ms first-token latency, speaker similarity of 77.1% in English and 78.7% in French, and a word error rate of just 2.82%. Built by the Moshi team, it supports 10-second voice cloning and suits LLM-integrated apps. Qwen-TTS focuses on Chinese dialects with English support, delivering human-level naturalness via API.

🛠️ The Daytona cloud platform provides "stateful serverless" sandboxes built for agents, supporting tasks like code execution and data analysis, and reached $1M in annualized revenue and 15,000 users within two months. Its stateful-serverless design addresses latency and privacy issues in agent workloads, making it key infrastructure for the agent era.

Hey everyone, Alex here 👋

Welcome back to another mind-blowing week on ThursdAI! We’re diving into the first show of the second half of 2025, and let me tell you, AI is not slowing down. This week, we’ve got a massive wave of open-source models from Chinese giants like Baidu and Tencent that are shaking up the game, Meta’s jaw-dropping hiring spree with Zuck assembling an AI dream team, and Microsoft’s medical AI outperforming doctors on the toughest cases. Plus, a real-time AI game engine that had me geeking out on stream. Buckle up, folks, because we’ve got a lot to unpack!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

We had incredible guests like Michael Luo from Agentica, dropping knowledge on RL coding agents, and Ivan Burazin from Daytona, revealing the infrastructure powering the agent era. It was an incredible episode, with over 8,000 views for the live show (as always, links and show notes are at the end, and the YT live video is here for your convenience if you'd prefer watching).

Open Source AI & LLMs: The Chinese Powerhouse Wave

Man, if there’s one takeaway from this week, it’s that Chinese companies are absolutely dominating the open-source LLM scene. Let’s break down the heavy hitters that dropped this week and why they’ve got everyone talking.

Baidu’s ERNIE 4.5: A Suite of 10 Models to Rule Them All

Baidu, a giant in the Chinese tech space, just flipped the script by open-sourcing their ERNIE 4.5 series. We’re talking 10 distinct models ranging from a whopping 424 billion parameters down to a tiny 0.3 billion. With an Apache 2.0 license, 128K context window, and multimodal capabilities handling image, video, and text input, this is a massive drop. Their biggest Mixture-of-Experts (MoE) model, with 47B active parameters, even outshines OpenAI’s o1 on visual knowledge tasks like DocVQA, scoring 93% compared to o1’s 81%!

What’s wild to me is Baidu’s shift. They’ve been running ERNIE in production for years—think chatbots and more across their ecosystem—but they weren’t always open-source fans. Now, they’re not just joining the party, they’re hosting it. If you’re into tinkering, this is your playground—check it out on Hugging Face (HF) or dive into their technical paper (Paper).
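That "47B active parameters" figure comes from Mixture-of-Experts routing: a gate picks only a handful of experts per token, so only a fraction of the total parameters fire on any forward pass. Here's a minimal top-k routing sketch in NumPy to show the idea. This is purely illustrative (the expert count, k, and gating details are my toy choices, not ERNIE's actual router):

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Pick the top-k experts for one token and softmax-normalize their gate scores."""
    top = np.argsort(logits)[::-1][:k]           # indices of the k largest gate logits
    w = np.exp(logits[top] - logits[top].max())  # numerically stable softmax over the chosen k
    return top, w / w.sum()

rng = np.random.default_rng(0)
n_experts, k = 64, 2
experts, weights = topk_route(rng.normal(size=n_experts), k)

# Only k of n_experts expert FFNs actually run for this token, so the
# "active" parameter count is roughly (k / n_experts) of the expert weights,
# which is how a huge total-parameter model stays cheap at inference.
print(len(experts), round(float(weights.sum()), 6))
```

The token's output is then the weighted sum of just those k experts' outputs, which is why MoE models quote both a total and an "active" parameter count.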

Tencent’s Hunyuan-A13B-Instruct: WizardLM Team Strikes Again

Next up, Tencent dropped Hunyuan-A13B-Instruct, and oh boy, does it have a backstory. This 80B parameter MoE model (13B active at inference) comes from the legendary WizardLM team, poached from Microsoft after a messy saga where their killer models got yanked from the internet over “safety concerns.” I remember the frustration—we were all hyped, then bam, gone. Now, under Tencent’s wing, they’ve cooked up a model with a 256K context window, hybrid fast-and-slow reasoning modes, and benchmarks that rival DeepSeek R1 and OpenAI o1 on agentic tasks. It scores an impressive 87% on AIME 2024, though it dips to 76% on 2025, hinting at some overfitting quirks. Though for a 13B-active-parameter model, this is all still VERY impressive.

Here’s the catch—the license. It excludes commercial use in the EU, UK, and South Korea, and bans usage if you’ve got over 100M active users. So, not as open as we’d like, but for its size, it’s a beast that fits on a single machine, making it a practical choice for many. They’ve also released two datasets, ArtifactsBench and C3-Bench, for code and agent evaluation. I’m not sold on the name—Hunyuan doesn’t roll off the tongue for Western markets—but the WizardLM pedigree means it’s worth a look. Try it out on Hugging Face (HF) or test it directly (Try It).

Huawei’s Pangu Pro MoE: Sidestepping Sanctions with Ascend NPUs

Huawei entered the fray with Pangu Pro MoE, a 72B parameter model with 16B active per token, and here’s what got me hyped—it’s trained entirely on their own Ascend NPUs, not Nvidia or AMD hardware. This is a bold move to bypass US sanctions, using 4,000 of these chips to preprocess 13 trillion tokens. The result? Up to 1,528 tokens per second per card with speculative decoding, outpacing dense models in speed and cost-efficiency. Performance-wise, it’s close to DeepSeek and Qwen, making it a contender for those outside the Nvidia ecosystem.
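That 1,528 tokens/sec figure leans on speculative decoding: a cheap draft model proposes a few tokens and the big model verifies them in a single pass. Under the textbook simplification that each of k drafted tokens is accepted independently with probability a (this is the standard analysis, not Huawei's actual numbers), the expected tokens emitted per expensive target-model pass is a geometric series:

```python
def expected_tokens_per_step(a: float, k: int) -> float:
    """Expected tokens emitted per target-model pass when k draft tokens are
    each accepted independently with probability a; the target model always
    contributes one token of its own, so the series is 1 + a + ... + a^k."""
    return (1 - a ** (k + 1)) / (1 - a) if a < 1 else k + 1

# With an 80% acceptance rate and 4 drafted tokens, each expensive target
# pass yields about 3.36 tokens instead of 1 — a big throughput win.
print(round(expected_tokens_per_step(0.8, 4), 2))
```

So a good draft model (high acceptance rate) multiplies per-card throughput without changing the target model's outputs at all.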

I’m intrigued by the geopolitical angle here. Huawei’s proving you don’t need Western tech to build frontier models, and while we don’t know who’s got access to these Ascend NPUs, it’s likely a game-changer for Chinese firms. Licensing isn’t as permissive as MIT or Apache, but it’s still open-weight. Peek at it on Hugging Face (HF) for more details.

DeepSWE-Preview: RL Coding Agent Hits 59% on SWE-Bench

Switching gears, I was blown away chatting with Michael Luo from Agentica about DeepSWE-Preview, an open-source coding agent trained with reinforcement learning (RL) on Qwen3-32B. This thing scored a stellar 59% on SWE-Bench-Verified (42.2% Pass@1, 71% Pass@16), one of the top open-weight results out there. What’s cool is they did this without distilling from proprietary giants like Claude—just pure RL over six days on 64 H100 GPUs. Michael shared how RL is surging because pre-training hits data limits, and DeepSWE learned emergent behaviors like paranoia, double-checking edge cases to avoid shaky fixes.
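For readers wondering how Pass@1 and Pass@16 relate: the standard unbiased pass@k estimator (introduced with HumanEval) computes, from n sampled solutions of which c passed, the probability that at least one of k draws is correct. A minimal implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations is correct, given c of the n
    generations passed the tests. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-subset, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Toy numbers (not DeepSWE's raw counts): 16 generations per task, 7 correct.
print(round(pass_at_k(16, 7, 1), 4))   # fraction correct at k=1
print(round(pass_at_k(16, 7, 16), 4))  # with all 16 samples, a hit is certain
```

The gap between Pass@1 and Pass@16 is exactly what test-time selection and verifiers try to close.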

This underdog story of academic researchers breaking benchmarks with limited resources is inspiring. They’ve open-sourced everything—code, data, logs—making it a goldmine for the community. I’m rooting for them to get more compute to push past even higher scores. Dive into the details on their blog (Notion) or check the model on Hugging Face (HF Model).


This Week’s Buzz from Weights & Biases: Come Hack with Us! 🔥

As always, I’ve got some exciting news from Weights & Biases to share. We’re hosting the first of our Weavehacks hackathons in San Francisco on July 12-13. It’s all about agent protocols like MCP and A2A, and I’m stoked to see you guys in person—come say hi for a high-five! We’ve got cool prizes, including a custom W&B RoboDog that’s been a conference hit, plus $13-14K in cash. Spots are filling fast, so register now and we'll let you in (Sign Up).

We’re also rolling out Online Evaluations in Weave, letting you monitor LLM apps live with judge agents on production data—super handy for catching hiccups. And our inference service via CoreWeave GPUs offers free credits for open-source model testing. Want in or curious about Weave’s tracing tools? Reach out to me anywhere, and I’ll hook you up. Can’t wait to demo this next week!


Big Companies & APIs: AI’s NBA Draft and Medical Marvels

Shifting to the big players, this week felt like an AI sports season with blockbuster hires and game-changing releases. From Meta’s talent poaching to Microsoft’s medical breakthroughs, let’s unpack the drama and innovation.

Meta Superintelligence Labs: Zuck’s Dream Team Draft

Imagine an AI NBA draft—that’s what Meta’s up to with their new Superintelligence Labs (MSL). Led by Alex Wang (formerly of Scale AI) and Nat Friedman (ex-GitHub CEO), MSL is Zuck’s power move after Llama 4’s lukewarm reception. They’ve poached up to 10 key researchers from OpenAI, including folks behind GPT-4’s image generation and o1’s foundations, with comp packages rumored at $100M for the first year and up to $300M over four years. That’s more than many Meta execs or even Tim Cook’s salary! They’ve also snagged talent from Google DeepMind and even tried to acquire Ilya Sutskever’s SSI outright (to which he said he’s flattered, but no).

This is brute force at its finest, and I’m joking that I didn’t get a $100M offer myself—ThursdAI’s still waiting for that email, Zuck! OpenAI’s Sam Altman fired back with “missionaries beat mercenaries,” hinting at a culture clash, while Mark Chen said it felt like Meta “broke into their house and took something.” It’s war, folks, and I’m hyped to see if MSL delivers a Llama that crushes it. With FAIR and GenAI folding under this new crack team of 50, plus Meta’s GPU arsenal, the stakes are sky-high.

If you’d like to see the list of “mercenaries” worth over $100M, you can see who they are and their achievements here.

Cursor’s Killer Hires and Web Expansion

Speaking of talent wars, Cursor (built by AnySphere) just pulled off a stunner by hiring Boris Cherny and Cat Wu, key creators of Claude Code, as Chief Architect and Head of Product. This skyrockets Cursor’s cred in code generation, and I’m not surprised—Claude Code was a side project that exploded, and now Cursor’s got the brains behind it. On top of that, they’ve rolled out AI coding agents to web and mobile, even integrating with Slack. No more being tied to your desktop—launch, monitor, and collab on code tasks anywhere.

The lines between native and web tools are blurring fast, and Cursor’s leading the charge. I haven’t tested the Slack bit yet, but if you have, hit me up in the comments. This, plus their recent $20M raise, shows they’re playing to win. Learn more at (Cursor).

Microsoft MAI-DxO: AI Diagnoses Better Than Doctors

Now, onto something that hits close to home for me—Microsoft’s MAI-DxO, an AI system that’s outdiagnosing doctors on open-ended medical cases. On 304 of the toughest New England Journal of Medicine cases, it scored 85.5% accuracy, over four times the 20% rate of experienced physicians. I’ve had my share of frustrating medical waits, and seeing AI step in as a tool for doctors—not a replacement—gets me excited for the future.

It’s an orchestration of models simulating a virtual clinician panel, asking follow-up questions, ordering tests, and even factoring in cost controls for diagnostics. This isn’t just acing multiple-choice; it handles real-world ambiguity. My co-host Yam and I stressed—don’t skip your doctor for ChatGPT, but expect your doc to be AI-superpowered soon. Read more on Microsoft’s blog (Blog).
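The "virtual clinician panel" idea can be sketched as a simple orchestration loop: several role-specialized model calls, aggregated by vote. The stub "clinicians" and majority-vote logic below are entirely my illustration of the pattern, not Microsoft's actual design (MAI-DxO's real loop also iterates, asking follow-ups, ordering tests, and tracking diagnostic cost):

```python
from collections import Counter

def panel_diagnose(case: str, clinicians: list) -> str:
    """Ask each role-specialized 'clinician' for a diagnosis, then take a
    majority vote over their answers — the simplest possible aggregation."""
    votes = [clinician(case) for clinician in clinicians]
    return Counter(votes).most_common(1)[0][0]

# Stub functions standing in for role-prompted LLM calls.
panel = [
    lambda case: "pneumonia" if "cough" in case else "unknown",
    lambda case: "pneumonia" if "fever" in case else "unknown",
    lambda case: "asthma" if "wheeze" in case and "no wheeze" not in case else "unknown",
]

print(panel_diagnose("fever and productive cough, no wheeze", panel))
```

Swapping the lambdas for actual LLM calls with different system prompts ("skeptic", "test-orderer", "cost auditor") gives the basic shape of a panel-style orchestrator.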


Cloudflare’s One-Click AI Bot Block: Protecting the Internet

Cloudflare made waves with a one-click feature to block AI bots and scrapers, available to all customers, even free-tier ones. With bots like Bytespider and GPTBot hitting nearly 40% of top sites, but only 3% blocking them, this addresses a huge shift. I’m with the CEO here—the old internet deal was Google scraping for traffic; now, AI summaries keep users from clicking through, breaking monetization for creators. Yam suggested a global license for training data with royalties, and I’m curious if that’s the future. For now, Cloudflare’s ML detects even sneaky bots spoofing as browsers. Big move—check their announcement (X) and the cool website goodaibots.com
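Cloudflare's detection is ML-based and proprietary, but the naive first line of defense most sites already run is a user-agent denylist. A sketch of that baseline (Bytespider and GPTBot are from the article; the other names and the matching logic are my illustration):

```python
# Illustrative denylist — real deployments track many more crawler tokens.
AI_CRAWLERS = {"bytespider", "gptbot", "ccbot", "claudebot"}

def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against known AI crawler tokens.
    Trivially spoofable — which is exactly why Cloudflare layers ML-based
    behavioral detection on top of simple header checks like this one."""
    ua = user_agent.lower()
    return any(bot in ua for bot in AI_CRAWLERS)

print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"))
```

A bot spoofing a stock browser string sails right past this check, which is the gap Cloudflare's behavioral models are meant to close.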

Cypher Alpha: Mystery 1M Context Model on OpenRouter

Lastly, a mysterious 1M context model, Cypher Alpha, popped up on OpenRouter for free testing (with data logging). It’s fast at 70 tokens/sec, low latency, but not a reasoning model—refusals on basic queries stumped me. Speculation points to Amazon Titan, which would be a surprise entry. I’m intrigued by who’s behind this—Gemini, OpenAI, and Qwen hit 1M context, but Amazon? Let’s see. Try it yourself (Link).

Vision & Video: Mirage’s AI-Native Game Engine Blows Minds 🤯

Okay, folks, I’ve gotta geek out here. Dynamics Lab unveiled the world’s first AI-native user-generated content (UGC) game engine, live with playable demos like a GTA-style “Urban Chaos” and a racing “Coastal Drift.” Running at 16 frames per second, it generates photorealistic worlds in real-time via natural language or controller input. You can jump, run, fight, or drive, and even upload an image to spawn a new game environment on the fly.

What’s nuts is there’s no pre-built game behind this—it’s infinite, custom content created as you play. I was floored showing this on stream; it’s obviously not perfect with clipping and delays, but we’re witnessing the dawn of personalized gaming. You gotta try this—head to their site for the demos (Playable Demo).

This brings us even closer to the "every pixel will be generated" dream of Jensen Huang.

Voice & Audio: TTS Gets Real with Kyutai and Qwen

This week brought fresh text-to-speech (TTS) updates that hint at smarter conversational AI down the line. Kyutai TTS, from the French team behind Moshi, dropped with ultra-low latency (220ms first-token) and high speaker similarity (77.1% English, 78.7% French), plus a word error rate of just 2.82% in English. It’s production-ready with a Rust server and voice cloning from a 10-second clip—perfect for LLM-integrated apps. Check it out (X Announcement, HF Model).
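For context on that 2.82% figure: word error rate is just word-level edit distance (substitutions, insertions, deletions) divided by the reference length. A minimal implementation of the standard metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by the
    number of reference words. 0.0 means a perfect transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

# One substitution ("sat" -> "sit") plus one deletion ("the") over 6 words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 4))
```

TTS systems report WER by transcribing their own audio with an ASR model and scoring the transcript against the input text, so the number bundles both synthesis and recognition errors.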

Qwen-TTS from Alibaba also launched, focusing on Chinese dialects like Pekingese and Shanghainese, but with English support too. It’s got human-level naturalness via API, though less relevant for our English audience. Still, it’s a solid step—see more (X Post). Both are pieces of the puzzle for richer virtual interactions, and I’m pumped to see where this goes.

Infrastructure for Agents: Daytona’s Sandbox Revolution

I’m thrilled to have chatted with Ivan Burazin from Daytona, a cloud provider delivering agent-native runtimes—or sandboxes—that give agents their own computers for tasks like code execution or data analysis. They’ve hit over $1M in annualized run rate just two months post-launch, with 15,000 signups and 1,500 credit cards on file. That’s insane growth for infrastructure, which usually ramps slowly due to integration delays.

Why’s this hot? 2025 is the year of agents, and as Ivan shared, even OpenAI and Anthropic recently redefined agents as needing runtimes. From YC’s latest batch (37% building agents) to Cursor’s web move, every task may soon spin up a sandbox. Daytona’s “stateful serverless” tech spins fast, lasts long, and scales across regions like the US, UK, Germany, and India, addressing latency and GDPR needs. If you’re building agents, this is your unsung hero—explore it at (Daytona IO) and grab $200 in credits, or up to $50K for startups (Startups).
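Daytona's actual SDK isn't shown here, but the core idea — give an agent an isolated place to execute untrusted code with a time limit — can be sketched with a plain subprocess. This is a toy local stand-in, not Daytona's API; their stateful-serverless sandboxes add VM-level isolation, persistence, and regional placement on top of this basic shape:

```python
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> tuple[int, str]:
    """Execute agent-generated Python in a fresh interpreter process,
    capped by a timeout, returning (exit_code, stdout)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout_s)
        return proc.returncode, proc.stdout
    finally:
        os.unlink(path)

rc, out = run_in_sandbox("print(2 + 2)")
print(rc, out.strip())
```

A subprocess gives you crash isolation and a timeout but not security isolation; that gap (plus persistence of the sandbox's state between agent steps) is exactly what dedicated runtimes sell.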

Wrapping Up: AI’s Relentless Pace

What a week, folks! From Chinese open-source titans like ERNIE 4.5 and Hunyuan-A13B redefining accessibility, to Meta’s blockbuster hires signaling an AI arms race, and Microsoft’s MAI-DxO paving the way for smarter healthcare, we’re witnessing AI’s relentless acceleration. Mirage’s game engine and Daytona’s sandboxes remind us that creativity and infrastructure are just as critical as models themselves. I’m buzzing with anticipation for what’s next—will Meta’s dream team deliver? Will agents redefine every app? Stick with ThursdAI to find out. See you next week for more!

TL;DR and Show Notes

Here’s the quick rundown of everything we covered this week, packed with links to dive deeper:

Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I’ll catch you next week for more!
