VentureBeat, October 3, 20:43
Google keeps refining its Gemini large language models


🔹 Google continues to improve its Gemini family of large language models and its audio offshoot Gemini Live, even between major releases. This week the company announced updates to Gemini 2.5 Flash and 2.5 Flash Lite, two models designed for speed and efficiency, as well as to the Gemini Live API, its AI voice generation model for enterprise functions.

🔹 Gemini 2.5 Flash Lite is now the fastest model in Artificial Analysis's benchmarks at 887 output tokens per second, a 40% improvement over the previous version. The new versions are available through Google AI Studio and Vertex AI.

🔹 Both models improve in performance and capability, including reduced output tokens, better instruction following, and stronger multimodal abilities.

🔹 Gemini 2.5 Flash improves in agentic reasoning and tool use, capabilities crucial for handling multi-step and more autonomous workflows.

Google continues to improve its Gemini family of large language models (LLMs) and its audio offshoot Gemini Live even between big numerical releases.

Case in point, this week, the company announced updates to Gemini 2.5 Flash and 2.5 Flash Lite, its LLMs designed for speed and efficiency, and for the application programming interface (API) to Gemini Live, its AI voice generation model for enterprise functions like customer support calls.

According to independent third-party analysis firm Artificial Analysis, Gemini 2.5 Flash Lite is now "the fastest proprietary model we have benchmarked on the Artificial Analysis website" at 887 output tokens per second, a 40% increase over the previous version of the model and faster than the multi-hundred-token-per-second speeds of GPT-5 and Grok 4 Fast.

It still falls below the new K2 Think open source model from MBZUAI and G42 AI, with its impressive 2,000 output tokens per second, but it is incredibly fast.

Tokens are the numerical representations of the word fragments and clauses that an LLM operates on, its native "language," and the number of tokens output per second is a good indicator of how quickly a model can deliver information to users.
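Those throughput figures translate directly into user-facing latency. As a rough illustration, the sketch below uses the rates cited in this article; the 500-token response length is an arbitrary example, not a measured workload:

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate the wall-clock seconds needed to stream num_tokens at a given output rate."""
    return num_tokens / tokens_per_second

# Output-token rates cited in the article.
rates = {
    "Gemini 2.5 Flash Lite (09-2025)": 887,
    "K2 Think": 2000,
}

# Time to emit a ~500-token answer at each rate.
for name, rate in rates.items():
    print(f"{name}: {generation_time_s(500, rate):.2f} s")
```

At 887 tokens per second, a typical answer streams out in well under a second, which is why output speed matters so much for interactive applications.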

"Flash has long been our most popular model and put Gemini squarely on the map a year and a half ago, it's wild to see the progress continue non-stop," said Logan Kilpatrick, the product lead for Google's AI Studio and Gemini, posting on the social network X.

The new versions are now available via Google AI Studio and Vertex AI.

Here's what's changed for all three of these models and what it means for developers building AI applications atop them:

Performance and Capability Improvements

Both Gemini 2.5 Flash and Flash-Lite receive notable improvements in output quality and cost-efficiency, particularly in token usage and response speed.

Gemini 2.5 Flash shows improvements in agentic reasoning and tool use — capabilities crucial for handling multi-step and more autonomous workflows.

According to Google, this version scored 54% on the SWE-Bench Verified benchmark, up from 48.9% in the previous release.

The model also demonstrates enhanced cost-efficiency by generating higher-quality outputs using fewer tokens, leading to reduced latency and cost.

Gemini 2.5 Flash-Lite, meanwhile, focuses on reducing verbosity, improving instruction adherence, and strengthening its multimodal capabilities.

These upgrades result in a 50% reduction in output tokens compared to its earlier version, significantly lowering the cost of deployment in high-throughput applications.
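To see why halving output tokens matters at scale, here is a back-of-envelope cost sketch; the request volume and per-million-token price are hypothetical placeholders for illustration, not Google's actual rates:

```python
def monthly_output_cost(requests_per_day: int, avg_output_tokens: float,
                        price_per_million_tokens: float, days: int = 30) -> float:
    """Output-token spend over a billing period, in the same currency as the price."""
    total_tokens = requests_per_day * avg_output_tokens * days
    return total_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical workload: 100k requests/day at a placeholder $0.40 per 1M output tokens.
before = monthly_output_cost(100_000, 400, 0.40)  # older, more verbose model
after = monthly_output_cost(100_000, 200, 0.40)   # 50% fewer output tokens
print(before, after)  # output-token spend halves with the token reduction
```

Because output-token billing is linear in token count, the 50% verbosity reduction flows straight through to a 50% cut in that line item, before any pricing changes.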

It also now includes improved capabilities in image understanding, translation quality, and audio transcription.

Independent benchmarking by Artificial Analysis confirms these gains. The Gemini 2.5 Flash Preview 09-2025 model scored 54 in reasoning mode and 47 in non-reasoning mode on the Artificial Analysis Intelligence Index — increases of 3 and 8 points, respectively, over the previous stable release.

Flash-Lite Preview 09-2025 saw even greater improvements, with scores of 48 (reasoning) and 42 (non-reasoning), representing gains of 8 and 12 points.

As stated above, Flash-Lite is also approximately 40% faster than its July 2025 release.

The updated models are accessible through new aliases — gemini-flash-latest and gemini-flash-lite-latest — which allow developers to integrate the most recent versions without updating model string names manually. Google emphasizes that these previews are not intended to replace the current stable versions but to help shape future releases through testing and feedback.
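In practice, the choice between the rolling aliases and the pinned stable names can be isolated in one place. A minimal sketch using only the model strings named in this article (the helper function itself is illustrative, not part of any Google SDK):

```python
# Model strings from the article: "-latest" aliases track the newest preview,
# while the pinned names stay on the stable releases.
MODEL_STRINGS = {
    ("flash", "latest"): "gemini-flash-latest",
    ("flash", "stable"): "gemini-2.5-flash",
    ("flash-lite", "latest"): "gemini-flash-lite-latest",
    ("flash-lite", "stable"): "gemini-2.5-flash-lite",
}

def pick_model(family: str, track: str = "stable") -> str:
    """Resolve a model family and release track to the model string passed to the API."""
    try:
        return MODEL_STRINGS[(family, track)]
    except KeyError:
        raise ValueError(f"unknown model/track: {family!r}/{track!r}")

# Production workloads pin stable; experiments ride the alias.
print(pick_model("flash"))                 # gemini-2.5-flash
print(pick_model("flash-lite", "latest"))  # gemini-flash-lite-latest
```

Centralizing the string choice means switching a workload between preview and stable is a one-line configuration change rather than a search-and-replace.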

Benchmark Results and External Validation

Third-party evaluations further reinforce the technical progress of the updated models. According to another third-party LLM evaluation firm, Vals AI, Gemini 2.5 Flash showed some of its largest improvements in benchmarks like TerminalBench (+5%), GPQA (+17.2%), and a proprietary CorpFin benchmark (+4.4%).

The model also placed third out of 38 on the MMMU benchmark and sixth out of 20 on SWE-Bench, while maintaining a cost that is roughly half that of similarly performing models.

While Flash and Flash-Lite perform similarly on many public benchmarks, Vals AI notes that Flash outperforms Flash-Lite by roughly 10% on private legal and financial benchmarks such as CaseLaw, TaxEval, and MortgageTax.

These results suggest that, despite the massive speed gains of Flash Lite, the regular 2.5 Flash may remain better suited for more complex reasoning tasks and enterprise-grade applications.

Yichao "Peak" Ji, Co-Founder and Chief Scientist at autonomous AI agent company Manus, reported a 15% performance increase on internal long-horizon agentic benchmarks, according to Google's blog post announcing the updates.

He noted that the cost-efficiency of the new Flash model enables scaling at levels that were previously impractical for their workloads.

Pricing and Access

Pricing for the updated models remains consistent with Google's value-focused positioning.

Developers can access these models using the new aliases (gemini-flash-latest, gemini-flash-lite-latest) in Vertex and Google AI Studio, which always point to the latest preview versions.

Google also confirmed that it will provide at least two weeks’ notice before any updates or deprecations to models behind these aliases.

For users prioritizing stability over early access to features, Google recommends continuing to use the existing stable versions: gemini-2.5-flash and gemini-2.5-flash-lite.

Expanded Gemini Live API Capabilities

Alongside the updates to Flash and Flash-Lite, Google also introduced an update to Gemini Live, its real-time, audio-first model designed for voice applications.

The new version includes native audio capabilities and enhancements to both function calling reliability and natural conversation handling.

The Live API now enables developers to build more responsive voice agents that can interact more seamlessly with users in dynamic, real-world settings. Improvements focus on two key areas:

More reliable function calling: The model is now significantly more accurate in identifying the correct function to call, knowing when not to call a function, and adhering to tool schemas.

Internal testing showed function call success rates doubled in single-call scenarios and increased 1.5x in multi-function contexts involving five to ten active calls. These improvements are especially important for voice use cases, where retrying failed function calls in real time is not an option.
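Function calling in the Gemini API is driven by JSON-schema-style tool declarations. The sketch below shows the general shape with a made-up `lookup_order` tool and a toy validator; the tool name, its fields, and the validator are illustrative assumptions, not taken from Google's documentation:

```python
# A hypothetical tool declaration in the JSON-schema style used for
# LLM function calling; the function name and fields are invented here.
lookup_order_tool = {
    "name": "lookup_order",
    "description": "Fetch the status of a customer order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order identifier."},
        },
        "required": ["order_id"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Minimal schema check: every required parameter must be present in the call."""
    required = tool["parameters"].get("required", [])
    return all(key in args for key in required)

print(validate_call(lookup_order_tool, {"order_id": "A-1042"}))  # True
print(validate_call(lookup_order_tool, {}))                      # False
```

The tighter the model adheres to declarations like this, the less defensive validation and retry logic a voice agent needs, which is exactly where real-time audio leaves no room for a second attempt.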

More natural audio interactions: The model is now able to handle interruptions, background chatter, and natural pauses more gracefully. It can pause when the user is momentarily distracted and resume without losing context. This makes conversations feel more human-like and less brittle.

Google reports a significant drop in incorrect user interruptions and better handling of background conversations in its internal benchmarks.

In a practical deployment, Ava — an AI-powered family operations platform — has used the Live API to build a voice agent that acts as a digital household chief operating officer.

According to Ava CTO Joe Alicata, quoted in Google AI Studio's post on X, the new model's improvements in handling noisy inputs and delivering reliable function calls have accelerated development and deployment.

Next week, Google plans to extend the Live API’s functionality further by rolling out “thinking” capabilities, similar to those available in Gemini Flash and Pro models.

This will allow developers to set a thinking budget, giving the model additional time to process more complex queries before responding. During this delay, the model will provide a text summary of its thought process.
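Gemini 2.5 Flash and Pro already express this idea as a thinking budget measured in tokens. The configuration sketch below mirrors that pattern; the exact field names the Live API will use have not been published, so treat these keys and budget values as assumptions:

```python
# Sketch of a request config with a per-query thinking budget, modeled on the
# thinking-budget pattern Gemini 2.5 Flash/Pro already expose. Field names and
# budget tiers are placeholders pending the actual Live API release.
def make_config(query_complexity: str) -> dict:
    """Pick a larger thinking budget (in tokens) for harder queries."""
    budgets = {"simple": 0, "moderate": 512, "complex": 2048}
    if query_complexity not in budgets:
        raise ValueError(f"unknown complexity tier: {query_complexity!r}")
    return {
        "response_modalities": ["AUDIO"],
        "thinking_config": {"thinking_budget": budgets[query_complexity]},
    }

print(make_config("complex")["thinking_config"]["thinking_budget"])  # 2048
```

Tying the budget to an upstream complexity estimate lets simple voice queries stay instant while harder ones trade a short, summarized pause for a better answer.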

Developers can begin using the updated Gemini Live model today through the preview version named gemini-2.5-flash-native-audio-preview-09-2025. It supports real-time audio input and returns audio responses without the need for additional configuration.

A Continued Push Toward Developer-Centric AI

Google’s latest updates to the Gemini model family reflect a broader strategy: improving performance and usability through targeted updates informed by developer feedback.

With expanded capabilities, cost-efficient performance, and faster response times, Gemini Flash and Flash-Lite now offer developers more flexibility and performance headroom for a wide range of applications.

Whether used in agentic reasoning systems, real-time voice assistants, or high-throughput customer applications, the new preview versions mark another step in Google’s broader Gemini roadmap — with more updates planned in the months ahead.
