AI News Roundup: Chip Partnerships, Platform Upgrades, and Model Iterations

Editorial note: apologies for the newsletter and podcast not having come out regularly lately; startup life has kept me and my co-host rather busy. I’ll do my best to resume a weekly release cadence for both the newsletter and podcast starting this week.

OpenAI signed a deal with Broadcom to co‑design and deploy custom AI accelerators, aiming to roll out racks of OpenAI‑designed chips starting late next year. The systems will integrate compute, memory, and networking on Broadcom’s Ethernet stack, targeting major efficiency gains for OpenAI’s workloads while reducing reliance on Nvidia and AMD. The partnership fits into a plan to build roughly 10 gigawatts of compute capacity, with OpenAI already constructing a data center in Abilene, Texas and planning additional sites in Texas, New Mexico, Ohio, and the Midwest. Industry estimates put a 1‑gigawatt AI data center at around $50 billion—about $35 billion of which is chips at current Nvidia pricing—highlighting how custom silicon could significantly cut compute costs.
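To make the cost estimate above concrete, here is a rough back-of-the-envelope sketch. The per-gigawatt figures come from the paragraph above; the 30% chip discount is purely a hypothetical illustration, not a reported number.

```python
# Figures quoted in the paragraph above (industry estimates, in billions of USD).
COST_PER_GW_USD_B = 50       # total cost of a 1 GW AI data center
CHIP_COST_PER_GW_USD_B = 35  # portion spent on chips at current Nvidia pricing
PLANNED_GW = 10              # capacity targeted by the buildout plan


def buildout_cost(gw, chip_discount=0.0):
    """Total cost in $B for `gw` gigawatts if chip spend drops by `chip_discount` (0..1)."""
    chips = CHIP_COST_PER_GW_USD_B * (1.0 - chip_discount)
    other = COST_PER_GW_USD_B - CHIP_COST_PER_GW_USD_B
    return gw * (chips + other)


chip_share = CHIP_COST_PER_GW_USD_B / COST_PER_GW_USD_B
baseline = buildout_cost(PLANNED_GW)
# Hypothetical: assume custom silicon makes chips 30% cheaper.
with_custom = buildout_cost(PLANNED_GW, chip_discount=0.30)

print(f"chip share of data center cost: {chip_share:.0%}")
print(f"baseline for {PLANNED_GW} GW: ${baseline:.0f}B")
print(f"with a hypothetical 30% chip discount: ${with_custom:.0f}B")
```

Because chips account for most of the estimated cost, even a modest discount on silicon moves the total by a large absolute amount.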

The company also has large agreements with Nvidia, Oracle, and AMD. Nvidia said it intends to invest up to $100 billion, and AMD granted OpenAI warrants for up to 160 million shares (around 10% of AMD) to support the buildout; Broadcom, by contrast, is not investing equity. Broadcom’s custom AI chips (XPUs) are in strong demand among hyperscalers, and its stock jumped about 9.9% on the news; however, Broadcom clarified that OpenAI is not the previously disclosed $10 billion customer.

Everything OpenAI announced at DevDay 2025: AgentKit, Apps SDK, ChatGPT, and more

OpenAI launches apps inside of ChatGPT

OpenAI launches AgentKit to help developers build and ship AI agents

OpenAI ramps up developer push with more powerful models in its API

Sam Altman says ChatGPT has hit 800M weekly active users

OpenAI’s DevDay 2025 reframed ChatGPT as an app platform and agent OS, debuting Apps inside ChatGPT, a preview Apps SDK, and AgentKit. Apps run directly in ChatGPT responses with interactive UIs, video, login, and actions via the Model Context Protocol (MCP). Launch partners include Canva, Zillow, Coursera, Figma, Spotify, Booking.com, and Expedia, with DoorDash, Instacart, Uber, and AllTrails “coming soon.” Live demos showcased end-to-end “talking to apps”: generating a poster in Canva, auto-building a pitch deck, and pulling Zillow listings with natural-language filters and maps, including full‑screen renders inside ChatGPT.

Other announcements at DevDay included:
AgentKit rolled out alongside Agent Builder (a visual “Canva for agents”), ChatKit (an embeddable chat UI), Evals for Agents (step-trace grading, datasets, automated prompt optimization, external-model evals), and a Connectors registry with admin controls. A live demo built two production agents and guardrails in under eight minutes.

Codex moved from research preview to general availability, with on‑stage demos wiring a camera to an Xbox 360 controller, building a voice assistant for lights, and auto‑generating overlays, countdowns, and a group photo.

New API models include GPT‑5 Pro for high‑accuracy, deep‑reasoning use cases; Sora 2 video (preview) with synchronized audio, physical consistency, and granular camera control; and gpt‑realtime‑mini, a low‑latency voice model “70% cheaper” than the prior advanced voice model.

Altman also said ChatGPT now has 800M weekly active users, and that OpenAI’s platform counts 4M developers and processes 6B tokens per minute on the API. Developer app submissions for review will open later this year.

Anthropic launches Claude Haiku 4.5, a smaller, cheaper AI model

Anthropic introduced Claude Haiku 4.5, its smallest and most affordable Claude 4.x model, now available to all users including on the free tier. The company says Haiku 4.5 is notably fast and “punches above its weight,” outperforming older larger models and even surpassing Claude Sonnet 4 on computer‑use tasks. On coding, Haiku 4.5 scores comparably to Claude Sonnet 4 and OpenAI’s GPT‑5 on SWE‑bench Verified, a benchmark for real‑world bug fixing. Pricing-wise, Haiku models run at about one‑third the cost of Sonnet, and Sonnet is roughly one‑fifth the cost of Opus—making Haiku 4.5 the lowest‑cost paid option while granting more capacity to free users due to its smaller size.
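The two pricing ratios above compound, which a quick calculation makes explicit. This is only a sketch using the approximate ratios quoted in the paragraph, not actual list prices; the bill of 150 units is a hypothetical example.

```python
from fractions import Fraction

# Approximate ratios quoted in the paragraph above (not exact list prices).
HAIKU_VS_SONNET = Fraction(1, 3)
SONNET_VS_OPUS = Fraction(1, 5)

# Chaining the two ratios gives Haiku's cost relative to Opus.
haiku_vs_opus = HAIKU_VS_SONNET * SONNET_VS_OPUS


def relative_cost(opus_cost):
    """Scale a hypothetical Opus bill down to Sonnet and Haiku using the quoted ratios."""
    sonnet = opus_cost * SONNET_VS_OPUS
    haiku = sonnet * HAIKU_VS_SONNET
    return {"opus": opus_cost, "sonnet": sonnet, "haiku": haiku}


print(haiku_vs_opus)
print(relative_cost(Fraction(150)))
```

Under these rough ratios, a workload costing 150 units on Opus would cost about 30 on Sonnet and about 10 on Haiku.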

The release follows Sonnet 4.5 (September) and Opus 4.1 (August), with an updated Opus targeted for late 2025 or early 2026. Anthropic, valued at $183 billion with a revenue run rate nearing $7 billion and over 300,000 business customers, is accelerating launches amid competition with Google and OpenAI; the latter has released GPT‑5 and expanded through infrastructure deals and Sora.

Google releases Veo 3.1, adds it to Flow video editor

Google rolled out Veo 3.1, an upgrade to its Veo 3 video generation model, focused on higher‑fidelity visuals, stronger prompt adherence, and richer editing controls. The update improves image‑to‑video quality and introduces better audio output, adding synchronized audio to features like reference‑image character control, first/last‑frame guided clip generation, and clip extension from trailing frames. Veo 3.1 also supports granular edits, including adding objects that blend into a clip’s visual style, with object removal coming soon in Flow. The new model is available now in Flow, the Gemini app, and via Vertex and Gemini APIs.

OpenAI Reverses Stance on Use of Copyright Works in Sora

AI Sam Altman and the Sora copyright gamble: ‘I hope Nintendo doesn’t sue us’

OpenAI wasn’t expecting Sora’s copyright drama

OpenAI video app Sora hits 1 million downloads faster than ChatGPT

OpenAI’s Sora soars to No. 1 on Apple’s US App Store

Sora copycats flooded Apple’s App Store, and some still remain

OpenAI faced intense scrutiny over Sora’s training data and copyright handling, prompting a shift in its public stance as legal and public‑relations pressures mounted. The launch sparked rapid user adoption, propelled the app to the top of the U.S. App Store, and also triggered a wave of copycat apps. Commentary from Altman and reporting from multiple outlets underscore how copyright risk, user demand, and platform governance are colliding in Sora’s rapid rollout.

Other News

Tools

Microsoft launches ‘vibe working’ in Excel and Word. Microsoft’s new “Agent Mode,” powered by OpenAI’s GPT‑5 (with an Anthropic‑powered Office Agent in Copilot chat), can generate, plan, and execute complex spreadsheets, documents, and slide decks from simple prompts.

Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios. Petri automates large‑scale alignment audits by orchestrating an auditor agent to run multi‑turn, tool‑augmented probes against target models, synthesize realistic environments and tools, and use an LLM judge to score transcripts across a default 36‑dimension rubric.

Anthropic turns to ‘skills’ to make Claude more useful at work. Organizations can create and share reusable “Skills”—sets of instructions, scripts, and resources—that teach Claude to perform specific workplace tasks, integrating across Claude.ai, Claude Code, the API, and the Claude Agent SDK.

Salesforce announces Agentforce 360 as enterprise AI competition heats up. The update adds new prompting and builder tools (including a beta Agent Script and Agentforce Builder with “Vibes” app‑vibe coding), deepens Agentforce’s Slack integration, and lets customers use reasoning models from Anthropic, OpenAI, and Google to build more predictable, flexible enterprise agents.

Slack is turning Slackbot into an AI assistant. Slackbot is gaining the ability to compile plans and summaries from across channels and files, search the workspace with natural language, and coordinate calendars. It runs inside a virtual private cloud (VPC), and employers can opt out.

Google’s AI Mode image search is getting more conversational. Users will be able to refine searches with natural‑language follow‑ups and mix uploaded reference images with text prompts. The English rollout begins in the U.S. this week.

Google’s Search Live comes to India, AI Mode gets more languages. Google is launching Search Live in English and Hindi in India, expanding AI Mode to seven additional Indian languages, and leveraging local interactions to improve multimodal visual understanding over time.

Microsoft AI announces first image generator created in-house. Microsoft’s in‑house model prioritizes photorealism and speed, incorporates feedback from creative professionals to avoid generic styles, and has already ranked in the top 10 on AI benchmark site LMArena.

Zendesk says its new AI agent can solve 80% of support issues. Zendesk is introducing multiple LLM‑driven agents—an autonomous agent for most tickets, a co‑pilot for human technicians, and specialized admin, voice, and analytics agents—built from recent AI acquisitions and tested with customers who reported higher satisfaction.

Business

Amazon’s Zoox Robotaxis Have Arrived In Las Vegas. Zoox is offering free rides within a mapped, geofenced area along the Las Vegas Strip via a phone app; early rider reports have been mostly positive with no accidents reported.

Waymo’s robotaxis are coming to London. Waymo plans supervised data‑collection runs in London within weeks and aims to launch a fully driverless ride‑hail service via its app in 2026, with vehicles maintained by Moove.

OpenAI is the world’s most valuable private company after private stock sale. A secondary share sale paid $6.6 billion to current and former employees, with buyers including SoftBank and T. Rowe Price, valuing OpenAI at $500 billion and underscoring its fundraising momentum amid heavy infrastructure spending and ongoing product launches.

Meta partners up with Arm to scale AI efforts. Under a multi‑year deal, Meta will move ranking and recommendation systems onto Arm’s Neoverse platform to improve performance per watt as it expands data‑center capacity (including projects codenamed Prometheus and Hyperion).

Reflection AI raises $2B to be America’s open frontier AI lab, challenging DeepSeek. The new funding will secure large‑scale compute and recruit talent to train a frontier LLM (initially text‑focused, with future multimodal capabilities) whose publicly released model weights aim to offer an open‑access alternative, while monetization targets enterprise and sovereign deployments.

Supabase nabs $5B valuation, four months after hitting $2B. Supabase raised fresh funding, bringing total capital to $500 million, and included an option for community developers to buy stock as part of its Series E.

Character.AI removes Disney characters from platform after studio issues warning. Character.AI removed user‑created bots imitating Disney characters after receiving a cease‑and‑desist letter alleging unauthorized use of copyrighted and trademarked characters.

General Intuition lands $134M seed to teach agents spatial reasoning using video game clips. The startup is using Medal’s dataset of gaming clips to train agents and foundation models that learn spatial‑temporal reasoning from first‑person gameplay—targeting smarter in‑game bots and search‑and‑rescue drones—and raised $133.7M to scale research and engineering.

Research

Reasoning with Sampling: Your Base Model is Smarter Than You Think. Using a training‑free MCMC sampling method targeting a “power distribution” over base‑model outputs, the authors show inference‑time sampling can match or exceed RL post‑training (GRPO) on single‑shot and out‑of‑domain reasoning tasks while preserving multi‑sample diversity.
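For intuition about what sampling from a “power distribution” means, here is a toy sketch. It is not the paper's algorithm: it runs a plain Metropolis-Hastings chain over three made-up answers, proposing from a hypothetical base distribution p and targeting p(x)^alpha, which sharpens the distribution toward high-likelihood outputs. All numbers and names are assumptions for illustration.

```python
import random
from collections import Counter


def power_sample(probs, alpha, steps, rng):
    """Metropolis-Hastings chain whose stationary distribution is proportional to p(x)**alpha.

    Proposals are drawn from the base distribution p itself (an independence
    sampler), so the acceptance ratio simplifies to (p(new) / p(old)) ** (alpha - 1).
    """
    items = list(probs)
    weights = [probs[x] for x in items]
    current = rng.choices(items, weights=weights)[0]
    visits = Counter()
    for _ in range(steps):
        proposal = rng.choices(items, weights=weights)[0]
        accept = min(1.0, (probs[proposal] / probs[current]) ** (alpha - 1))
        if rng.random() < accept:
            current = proposal
        visits[current] += 1
    return {x: visits[x] / steps for x in items}


# Toy "base model" over three candidate answers (hypothetical numbers).
base = {"A": 0.5, "B": 0.3, "C": 0.2}
alpha = 4.0

# Exact power distribution, computed directly for comparison.
z = sum(p ** alpha for p in base.values())
exact = {x: p ** alpha / z for x, p in base.items()}

empirical = power_sample(base, alpha, steps=50_000, rng=random.Random(0))
print("exact:    ", {x: round(v, 3) for x, v in exact.items()})
print("empirical:", {x: round(v, 3) for x, v in empirical.items()})
```

The chain only ever needs samples and likelihoods from the base distribution, which is why this family of methods requires no additional training.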

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning. This paper presents a memory‑efficient, highly parallel evolution strategies implementation that directly searches billions of model parameters for LLM fine‑tuning, showing better sample efficiency, robustness, and stability than reinforcement learning on outcome‑only reasoning tasks.
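As a reminder of how evolution strategies differ from gradient-based RL, here is a minimal sketch on a three-parameter toy problem. It is a textbook-style ES loop, not the paper's implementation; the reward function, target values, and hyperparameters are all hypothetical.

```python
import random


def evolution_strategy(reward, theta, sigma=0.1, lr=0.005,
                       population=50, iterations=300, seed=0):
    """Minimal evolution-strategies loop (illustrative, not the paper's code)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        # Sample one Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
        # Score each perturbed parameter vector using only its final reward.
        rewards = [reward([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5
        if std == 0.0:
            std = 1.0  # all rewards identical: avoid dividing by zero
        advantages = [(r - mean) / std for r in rewards]
        # The reward-weighted average of the noise approximates the gradient.
        for i in range(len(theta)):
            grad_i = sum(a * eps[i] for a, eps in zip(advantages, noises))
            theta[i] += lr * grad_i / (population * sigma)
    return theta


TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum for the toy problem


def toy_reward(params):
    """Outcome-only reward: negative squared distance to the target."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


theta_hat = evolution_strategy(toy_reward, [0.0, 0.0, 0.0])
print([round(x, 2) for x in theta_hat])
```

Note that the loop never backpropagates: each population member only needs a forward evaluation, which is what makes the approach easy to parallelize.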

Base Models Know How to Reason, Thinking Models Learn When. The authors argue that much of “thinking model” advantage comes from learning when to activate reasoning behaviors that base models already possess, enabling steered base models to recover most of the benchmark performance gap via a small fraction of targeted activation edits.

The Art of Scaling Reinforcement Learning Compute for LLMs. A predictive sigmoid‑like scaling framework and an RL recipe called ScaleRL—validated across hundreds of thousands of GPU‑hours—let researchers extrapolate RL performance from small runs, identify scalable methods, and improve asymptotic performance and compute efficiency.
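To illustrate what a “sigmoid-like” compute-performance curve looks like, here is a small sketch. The functional form below is one common saturating sigmoid in compute and is an assumption for illustration; the parameter names and values are hypothetical and not taken from the paper.

```python
def sigmoid_scaling(compute, floor, ceiling, midpoint, slope):
    """Saturating curve: starts near `floor`, approaches `ceiling` as compute grows.

    `midpoint` is the compute at which half of the total gain is reached;
    `slope` controls how quickly the curve transitions.
    """
    return floor + (ceiling - floor) / (1.0 + (midpoint / compute) ** slope)


# Hypothetical parameters, as if fitted on small runs.
params = dict(floor=0.2, ceiling=0.6, midpoint=1_000.0, slope=1.0)

for gpu_hours in (100, 1_000, 10_000, 100_000):
    score = sigmoid_scaling(gpu_hours, **params)
    print(f"{gpu_hours:>7} GPU-hours -> predicted score {score:.3f}")
```

The practical appeal is that the ceiling and midpoint can be estimated from cheap runs and then used to compare recipes before committing large budgets.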

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks. MemAct treats memory curation as explicit editing actions and pairs it with a Dynamic Context Policy Optimization algorithm so agents can autonomously manage and optimize working memory for long‑horizon tasks while controlling token and latency costs.

Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity. Verbalized Sampling is a prompting technique that asks models to output multiple responses with probabilities, countering typicality bias in preference data and recovering pretrained diversity without retraining.
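The idea is simple enough to sketch in a few lines. The prompt wording, the JSON output format, and the mocked model reply below are all assumptions for illustration; the paper's exact prompts may differ.

```python
import json
import random


def build_prompt(task, k=5):
    """Ask for several candidate responses with verbalized probabilities."""
    return (
        f"{task}\n"
        f"Generate {k} different responses. Return a JSON list of objects "
        'with the keys "text" and "probability".'
    )


def parse_and_sample(model_output, rng):
    """Parse the model's JSON reply and sample one response by its stated probability."""
    candidates = json.loads(model_output)
    texts = [c["text"] for c in candidates]
    weights = [float(c["probability"]) for c in candidates]
    return rng.choices(texts, weights=weights)[0]


prompt = build_prompt("Tell me a joke about coffee.", k=3)

# Mocked reply standing in for a real model call (hypothetical content).
fake_reply = json.dumps([
    {"text": "joke one", "probability": 0.5},
    {"text": "joke two", "probability": 0.3},
    {"text": "joke three", "probability": 0.2},
])

print(prompt)
print(parse_and_sample(fake_reply, random.Random(0)))
```

Sampling from the verbalized distribution, rather than always taking the single most typical answer, is what restores diversity.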

Can GenAI Improve Academic Performance? Evidence from the Social and Behavioral Sciences. Using author‑level panel data and a difference‑in‑differences design, the study finds that researchers who began using GenAI after ChatGPT’s release increased publication output—especially early‑career and non‑English‑speaking authors—and saw a modest rise in average journal impact.
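For readers unfamiliar with difference-in-differences, here is a toy worked example. The publication counts are made up purely for illustration and are not from the study.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus the change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical papers-per-year for three authors in each group.
adopters_before = [2, 3, 1]
adopters_after = [4, 5, 3]
non_adopters_before = [2, 2, 2]
non_adopters_after = [3, 2, 4]

effect = diff_in_diff(adopters_before, adopters_after,
                      non_adopters_before, non_adopters_after)
print(f"estimated effect: {effect:+.1f} papers per year")
```

Subtracting the control group's change removes trends that affect everyone, so only the extra change among adopters is attributed to GenAI use, assuming both groups would otherwise have followed parallel trends.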

Concerns

OpenAI’s internal Slack messages could cost it billions in copyright suit. Internal Slack and email discussions about deleting a pirated LibGen training dataset—and whether lawyers advised that deletion—are now key evidence as plaintiffs seek to show intentional destruction of evidence and secure access to privileged communications, potentially increasing damages dramatically.

AI users sue Microsoft in antitrust class action over OpenAI deal. A proposed class action alleges Microsoft’s investment and arrangements with OpenAI violate antitrust laws, seeking remedies over competitive harm.

Policy

California becomes first state to regulate AI companion chatbots. The new law requires age verification, content warnings, suicide‑prevention protocols, and clear disclosures that interactions are AI‑generated, and bans chatbots from portraying themselves as healthcare professionals. Violations can carry penalties, including fines for illegal deepfakes.

Analysis

How ByteDance Made China’s Most Popular AI Chatbot. ByteDance’s Doubao combines chat, image and short‑video generation, multimodal voice and video interaction, customizable AI agents, and deep integration with Douyin to reach a broad, nontechnical user base—amassing over 157 million monthly active users.

Over 50 Percent of the Internet Is Now AI Slop, New Data Finds. An analysis by Graphite of 65,000 English‑language articles using the Surfer detector finds AI‑written content rose sharply after ChatGPT’s 2022 debut and now sits at about 52% of new articles, though detector accuracy and sample biases could mean the true share of human content is higher.

OpenAI Inks Deal With Broadcom to Design Its Own Chips for A.I.