Last Week in AI: August 28
Google Upgrades Gemini's AI Image Capabilities; Anthropic Settles with Book Authors

Google has launched Gemini 2.5 Flash Image, bringing native image generation and editing to Gemini app users and developers. The tool enables fine-grained instruction-driven edits, preserves identity and scene consistency, and supports blending multiple references. Meanwhile, Anthropic reached a preliminary settlement with authors, averting potentially enormous damages. Anthropic also released "Claude for Chrome," a Chrome extension that offers AI-assisted browsing with an emphasis on safety safeguards. Other AI news includes Apple's integration of GPT-5, new progress in AI training for Boston Dynamics' Atlas, Nvidia's release of a new model, and model updates and open-source releases from several other companies.

🚀 **Google's Gemini image capabilities get a full upgrade**: Google has rolled out Gemini 2.5 Flash Image to all Gemini app users and to developers via the API, AI Studio, and Vertex AI, providing native image generation and editing. The feature emphasizes fine-grained instruction following, such as changing an object's color without distorting faces or backgrounds, and can blend multiple reference subjects (for example, a dog and a person) while preserving their original features. The upgraded model performs strongly on benchmarks such as LMArena and supports multi-turn editing, stronger world knowledge, and multi-reference composition, aiming to improve visual quality and editing fluency for consumer tasks such as home and garden visualization.

⚖️ **Anthropic settles AI copyright lawsuit brought by authors**: Anthropic has reached a preliminary settlement in a class action brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, averting a December trial that could have produced catastrophic statutory damages. Although the judge had largely sided with Anthropic on "fair use" for model training, he found the company likely "pirated" works through channels such as LibGen, allowing the class claim to proceed. The settlement's terms will be closely watched, as they could influence other ongoing AI copyright cases, including suits by record labels over training data and downloaded content.

🌐 **Anthropic launches the Claude for Chrome browser extension**: Anthropic released a research preview of "Claude for Chrome," an AI assistant that runs as a Chrome extension with access to full page content. The feature is available to 1,000 Anthropic Max subscribers, with a waitlist now open. Users can authorize the agent to navigate, fill forms, and complete tasks, and the extension maintains state across tabs. The move fits the broader trend toward AI-powered browsers, including Perplexity's Comet, OpenAI's browser project, and Google's Gemini integrations, and Anthropic emphasized its safety measures against browser AI risks, including a reduced prompt-injection success rate and user-control options.

💡 **AI news roundup**: Apple plans to integrate OpenAI's GPT-5 into iOS and macOS for Siri and system-level AI features; Boston Dynamics and TRI are training the Atlas humanoid robot with large behavior models; Nvidia released Nemotron-Nano-9B-v2, a new small open-source model; China's DeepSeek released the V3.1 model, extending the context window and recall for long conversations; Elon Musk says xAI has open-sourced Grok 2.5; Google released Imagen 4 Fast and the Imagen 4 family and launched a lightweight Gemma model; Google is also testing new Gemini modes including Agent Mode, Gemini Go, and Immersive View; Microsoft released the VibeVoice-1.5B text-to-speech model; OpenAI launched a lower-priced ChatGPT subscription plan in India; Meta's Superintelligence Lab is facing researcher departures; the head of Amazon's AGI Labs defended his "reverse acqui-hire"; Silicon Valley AI deals are spawning "zombie" startups; and Anthropic bundled Claude Code into its enterprise plans.

🧠 **AI research frontiers and safety concerns**: Researchers proposed "Deep Think with Confidence," a method that filters out low-quality reasoning paths to reduce AI token usage while maintaining or improving accuracy. Other work proves that multi-head Transformer models can learn symbolic multi-step reasoning via gradient descent in certain formal settings. On the safety front, parents sued OpenAI, alleging that ChatGPT played a role in their son's suicide; experts warn that AI's "sycophantic" replies may be a "dark pattern" designed to profit from emotional dependency; plagiarism risks in AI-generated papers are drawing scrutiny; Google published energy-use data for AI prompts for the first time; Tesla faces a class action over misleading self-driving marketing; and a bank was ordered to rehire employees after misrepresenting chatbot productivity. In addition, Anthropic tightened its AI safety rules, banning assistance with activities such as making high-yield explosives and conducting cyberattacks.

Top News

Google Gemini’s AI image model gets a ‘bananas’ upgrade

Google is rolling out Gemini 2.5 Flash Image, a native image generation and editing capability inside its Gemini 2.5 Flash model, to all Gemini app users and to developers via the Gemini API, Google AI Studio, and Vertex AI. The tool emphasizes fine-grained, instruction-following edits that preserve identity and scene consistency—such as changing a shirt color without distorting faces or backgrounds—and can blend multiple references (e.g., a dog and a person) while keeping likenesses.

Google says the model is state-of-the-art on LMArena and other benchmarks; it previously appeared there under the pseudonym “nano-banana,” sparking social media buzz. Multi-turn editing, stronger “world knowledge,” and multi-reference composition (like merging a sofa photo, a living room image, and a color palette) are supported.

Positioned against OpenAI’s GPT-4o image tools and Meta’s licensed Midjourney offerings—amid strong benchmark leaders like Black Forest Labs’ FLUX—the update aims to improve visual quality, instruction adherence, and edit seamlessness for consumer tasks such as home and garden visualization, per product lead Nicole Brichtova. Safeguards include TOS restrictions on non-consensual intimate imagery and visible watermarks plus metadata identifiers, following prior issues with historically inaccurate people images.

Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors

Anthropic reached a preliminary settlement in a class action brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, averting a December trial and potentially catastrophic statutory damages. Judge William Alsup had largely sided with Anthropic on “fair use” for model training at summary judgment, but found the company likely “pirated” works by acquiring them from shadow libraries like LibGen, allowing the class claim to proceed. With statutory damages starting at $750 per infringed work and an alleged corpus of about 7 million books, Anthropic faced theoretical exposure in the billions to over $1 trillion. The settlement is expected to be finalized September 3; plaintiffs’ counsel called it “historic,” while Anthropic declined comment.
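The exposure range follows from simple arithmetic. A minimal sketch, assuming the standard US statutory-damages bounds of $750 (minimum) to $150,000 (willful maximum) per work; only the $750 floor and the ~7 million figure appear in the article:

```python
# Back-of-the-envelope statutory-damages exposure.
WORKS = 7_000_000        # alleged corpus size (approximate)
MIN_PER_WORK = 750       # statutory minimum, USD
MAX_PER_WORK = 150_000   # willful-infringement maximum, USD (assumed bound)

low = WORKS * MIN_PER_WORK    # lower bound: billions
high = WORKS * MAX_PER_WORK   # upper bound: over a trillion
print(f"${low:,} to ${high:,}")  # → $5,250,000,000 to $1,050,000,000,000
```

This reproduces the "billions to over $1 trillion" range cited in the reporting.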

The class notification process had only just begun, with the Authors Guild alerting writers and a “list of affected works” due September 1—meaning many potential class members were not part of negotiations. Legal scholars noted Anthropic had “few defenses at trial” on the acquisition issue after Alsup’s ruling, prompting a rapid strategic shift even as the company brought in a new trial team. While the outcome sets no legal precedent, the terms will be closely scrutinized amid parallel suits, including a major case by record labels over training on copyrighted lyrics and BitTorrent-based song downloads.

Anthropic launches a Claude AI agent that lives in Chrome

Anthropic debuted a research preview of “Claude for Chrome,” a sidecar AI agent that runs as a Chrome extension with full page context. It’s rolling out to 1,000 Anthropic Max subscribers ($100–$200/month), with a waitlist open. Users can optionally grant the agent permission to navigate, fill forms, and complete tasks, and the extension maintains state across tabs. The move joins a broader push into AI-powered browsing, alongside Perplexity’s Comet, reported OpenAI browser efforts, and Google’s Gemini integrations.

Safety is a headline focus: Anthropic highlights new risks from browser-accessible agents, especially indirect prompt injection via hidden webpage instructions—recently flagged and patched in Comet. The company reports defenses that cut prompt injection success from 23.6% to 11.2%, default blocks for categories like financial services, adult content, and piracy, and user controls to restrict site access. High‑risk actions (publishing, purchasing, sharing personal data) require explicit permission.

Other News

Tools

Apple brings OpenAI’s GPT-5 to iOS and macOS. Apple plans to adopt GPT-5 for Siri and systemwide AI features in iOS 26, iPadOS 26, and macOS Tahoe 26, though timing and whether users can select a reasoning-optimized mode remain unclear.

Boston Dynamics and TRI use large behavior models to train Atlas humanoid. By training language-conditioned neural policies on teleoperated real-robot and simulated demos, Atlas can perform long-horizon, whole-body manipulation and recovery behaviors across varied tasks and embodiments.

Nvidia releases a new small, open model Nemotron-Nano-9B-v2 with toggle on/off reasoning. This 9B hybrid Mamba‑Transformer fits on a single A10 GPU, supports multiple languages and code tasks, and lets developers budget internal reasoning tokens to trade off accuracy and latency under a permissive commercial license.

China’s DeepSeek Releases V3.1, Boosting AI Model’s Capabilities. The V3.1 update extends the context window for longer conversations with improved recall, though detailed documentation is still pending.

Elon Musk says xAI has open sourced Grok 2.5. Grok 2.5 weights are now on Hugging Face under a custom license with some anti-competitive terms, amid controversy over problematic outputs and plans to open-source Grok 3 in roughly six months.

Announcing Imagen 4 Fast and the general availability of the Imagen 4 family in the Gemini API. The lineup adds a faster, lower-cost model for high-volume generation, while Imagen 4 and 4 Ultra improve text rendering and support up to 2K output for higher-detail images.

Google releases pint-size Gemma open AI model. This 270M-parameter variant is designed to run locally on devices—including phones and browsers—offering efficient battery use and solid instruction-following despite its small size.

Google tests new Gemini modes. Three experimental modes—Agent Mode for autonomous multi-step tasks, Gemini Go for collaborative ideation, and Immersive View for visual answers—are being trialed as modular additions to the core assistant.

Google develops Projects feature for Gemini. A leaked UI suggests project workspaces for file management, project-specific instructions, and a research button so Gemini can reference documents in chats and generate new content.

Google is building a Duolingo rival into the Translate app. A beta uses Gemini to generate personalized lessons and practice exercises (currently English↔Spanish/French and English practice for Spanish, French, and Portuguese speakers) and adds a live translation mode for real-time conversations in 70+ languages.

Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing. Enhancements include dual-image encoding, bilingual-accurate text editing, and frame-aware positional encoding, achieving top benchmark scores with deployment via Hugging Face and Alibaba Cloud.

Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers. The MIT-licensed model generates up to 90 minutes of uninterrupted, expressive multi-speaker TTS (up to four speakers), supports cross-lingual and singing synthesis, and targets streaming long-form audio with a 1.5B-parameter LLM backbone and lightweight diffusion decoder.

Business

OpenAI launches a sub-$5 ChatGPT plan in India. Priced at ₹399/month, the plan offers 10x higher message, image-generation, and file-upload limits plus twice the memory of the free tier, supports UPI payments, and is initially limited to India as OpenAI gauges expansion.

The power shift inside OpenAI. Fidji Simo will run OpenAI’s consumer-facing division and day-to-day operations, turning ChatGPT into a monetized suite while Sam Altman focuses on large-scale compute, research, and experimental projects.

Researchers Are Already Leaving Meta’s New Superintelligence Lab. Several high-profile researchers and a product director departed Meta’s Superintelligence Lab within months, with at least two returning to OpenAI; sources cite recruitment, organizational, and location challenges.

Read the Full Memo Alexandr Wang Sent About Meta's Massive AI Reorg. A major consolidation creates four teams—research (FAIR and TBD Lab), training, products (led by Nat Friedman), and infrastructure—placing most MSL division heads under Alexandr Wang, dissolving the AGI Foundations unit, elevating FAIR, and naming Shengjia Zhao as chief scientist.

Amazon AGI Labs chief defends his reverse acqui-hire. He argues joining Amazon provides the talent and multi–billion-dollar compute clusters necessary to tackle remaining AGI research challenges that his startup couldn’t support.

Silicon Valley's AI deals are creating zombie startups: 'You hollowed out the organization'. These deals often hire founders and key researchers and buy limited tech rights, leaving companies understaffed, uncertain about their future, and operating as near-“zombie” firms while most upside accrues to founders and acquirers.

Anthropic bundles Claude Code into enterprise plans. The bundle lets businesses include Claude Code in enterprise suites with admin controls, granular spending limits, deeper integrations with Claude.ai and internal data, and tools for combined prompts and workflows.

Research

Deep Think with Confidence. A method that uses local token-level confidence to filter or early-stop low-quality reasoning traces during generation, cutting token use by up to ~85% while maintaining or improving accuracy across multiple reasoning benchmarks and LLMs.
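The filtering idea can be sketched as a sliding-window check over per-token log-probabilities: abort a trace once local confidence drops. This is an illustrative toy, not the paper's implementation; the window size and threshold are invented for the example.

```python
def trace_confident(token_logprobs, window=16, threshold=-1.5):
    """Early-stop check in the spirit of confidence-filtered reasoning:
    abort a trace as soon as the mean token log-probability over the
    last `window` tokens falls below `threshold` (values illustrative).
    Returns (keep_trace, tokens_generated)."""
    for i in range(window, len(token_logprobs) + 1):
        mean_lp = sum(token_logprobs[i - window:i]) / window
        if mean_lp < threshold:
            return False, i  # low-confidence: stop and discard
    return True, len(token_logprobs)

# Toy traces: one consistently confident, one that degrades midway.
good = [-0.1] * 32
bad = [-0.1] * 16 + [-3.0] * 16
print(trace_confident(good))  # → (True, 32)
print(trace_confident(bad))   # early-stopped partway, saving tokens
```

Stopping a degrading trace early (rather than generating all 32 tokens and filtering afterwards) is where the token savings come from.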

Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent. The authors prove that multi-head transformer architectures trained with gradient descent can learn algorithms for symbolic, multi-step reasoning under certain formal settings.

Concerns

Parents sue OpenAI over ChatGPT’s role in son’s suicide. The lawsuit alleges ChatGPT’s safeguards failed during prolonged conversations in which the teen evaded protections by framing suicidal inquiries as fiction, marking the first known wrongful-death claim against OpenAI tied to chatbot interactions.

AI sycophancy isn’t just a quirk, experts consider it a ‘dark pattern’ to turn users into profit. Experts warn that flattering, first-person responses and long, memory-rich sessions can encourage delusions, create emotional dependency, and act as a manipulative “dark pattern” that companies may keep for engagement and profit.

What counts as plagiarism? AI-generated papers pose new risks. Researchers caution that AI systems can produce papers reusing others’ methods or ideas without clear attribution—sometimes so closely that experts rate them as near-direct methodological overlap—raising questions about defining and detecting “idea plagiarism.”

In a first, Google has released data on how much energy an AI prompt uses. A new report estimates 0.24 watt-hours per median prompt and shows only 58% goes to TPUs, with the rest consumed by CPUs and memory (25%), idle backup machines (10%), and data-center overhead like cooling (8%).
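The reported breakdown is easy to convert into absolute figures; a quick sketch using only the numbers above (note the published shares sum to 101% due to rounding):

```python
TOTAL_WH = 0.24  # median prompt, per Google's report
shares = {
    "TPUs": 0.58,
    "CPUs and memory": 0.25,
    "idle backup machines": 0.10,
    "overhead (e.g., cooling)": 0.08,
}
for part, frac in shares.items():
    print(f"{part}: {TOTAL_WH * frac:.4f} Wh")
# TPUs alone account for roughly 0.14 Wh of the 0.24 Wh total.
```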

Tesla loses bid to kill class action over misleading customers on self-driving capabilities for years. A judge ruled Tesla’s claims about hardware and software capable of full self-driving were sufficiently misleading to allow two subclasses of owners to pursue class-action claims seeking damages and an injunction.

Bank forced to rehire workers after lying about chatbot productivity, union says. According to the union, the bank falsely claimed a new voice chatbot reduced weekly call volumes, leading to wrongful redundancies for 45 long‑serving staff who have now been offered reinstatement or compensation after a tribunal found the roles weren’t redundant.

Policy

Anthropic has new rules for a more dangerous AI landscape. Anthropic tightened safety rules by explicitly banning help on high-yield explosives and CBRN weapons, adding prohibitions on cyberattacks and malware, and narrowing its political-content ban to deceptive or disruptive campaign activities, while clarifying requirements for high-risk use cases.

Analysis

Open weight LLMs exhibit inconsistent performance across providers. Benchmarks show the same open-weight model (gpt-oss-120b) varies widely—from 36.7% to 93.3% on a 2025 AIME run—depending on hosting provider, serving stack, and configuration (e.g., vLLM version and quantization), underscoring the need for standardization and conformance testing.

How WIRED Got Rolled by an AI Freelancer. WIRED published and later retracted a fabricated story after an AI-generated pitch and articles fooled editors and AI-detection tools, revealing gaps in contributor verification and editorial checks.
