Groq Blog · 02:03, two days ago
Groq Lowers GPT-OSS Model Prices and Rolls Out Prompt Caching

Groq has announced two updates to its GPT-OSS models aimed at improving the price-performance of AI inference. First, prices for GPT-OSS models on GroqCloud have been lowered to improve cost efficiency, applying retroactively to unpaid invoices for October 2025. Second, Groq is rolling out prompt caching for GPT-OSS models, which offers up to a 50% discount on tokens in identical input prefixes, significantly reduces latency, and makes rate limits stretch further. The feature requires no configuration, applies automatically to API requests, is already live on GPT-OSS-20B, and is coming soon to GPT-OSS-120B. These updates benefit use cases such as RAG platforms, agentic applications, eval pipelines, and chatbots, further empowering developers.

💰 **Lower GPT-OSS model prices**: Groq has significantly reduced prices for its GPT-OSS models on GroqCloud, giving developers a more cost-effective AI inference option. The new prices take effect immediately and apply retroactively to unpaid invoices for October 2025, so all users benefit from the savings.

🚀 **Prompt caching launch**: Groq is introducing prompt caching for its GPT-OSS models. By recognizing and caching identical input token prefixes, the feature offers up to a 50% discount on those tokens, significantly cutting inference cost while also sharply reducing request latency.

⚡ **Better developer efficiency and availability**: Cached tokens do not count toward GroqCloud rate limits, so developers' rate-limit quotas stretch further. The feature requires no code changes and applies automatically to API requests, greatly simplifying integration and letting developers focus on building.

💡 **Broad applicability**: Prompt caching is especially suited to workloads with stable, reusable prompt components, such as RAG platforms, agentic applications, eval pipelines, and chatbots. Their repetitive prompt structures (system prompts, tool definitions, few-shot examples, and so on) can exploit the cache for faster responses at lower cost.

At Groq, we’re relentless about fueling developers with the best price‑performance for AI inference. Today we’re starting to roll out two updates for GPT-OSS models that make building at scale faster, cheaper, and simpler.

New, Lower Prices for GPT‑OSS Models

We’ve reduced the price for gpt-oss models on GroqCloud to ensure all developers can ignite their applications with increased cost efficiency. These new prices are effective today and will apply retroactively to all unpaid invoices for the month of October 2025.

Prompt Caching on GPT-OSS Models

We’re rolling out prompt caching on our GPT-OSS models. Last month we quietly rolled out prompt caching on GPT-OSS-20B, and we’ll be rolling it out on GPT-OSS-120B over the next few weeks. What this means for developers using these models:

- **Up to 50% discount on cached tokens** – All input tokens in the identical prefix get a 50% discount; tokens after the first difference between prompts are charged at full price.
- **Significantly lower latency** – Latency drops for any request that shares an identical token prefix with a recent request.
- **Rate limits go farther** – Cached tokens don't count toward GroqCloud rate limits.
- **Zero configuration** – Pricing updates and prompt caching work automatically on every API request; no code changes are required (see the sketch below).
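As a concrete illustration of the zero-configuration point, here is a minimal sketch using the official `groq` Python SDK. The system prompt, questions, and support-assistant framing are placeholder assumptions; the point is simply that two requests sharing an identical leading message sequence form a cacheable prefix with no extra parameters.

```python
# Minimal sketch: prompt caching requires no client-side configuration.
# Assumes the official `groq` Python SDK; prompt contents are hypothetical.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# A long, stable system prompt like this is a typical cacheable prefix.
SYSTEM_PROMPT = "You are a support assistant for ExampleCorp. ..."

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            # Identical leading messages form the cacheable prefix.
            {"role": "system", "content": SYSTEM_PROMPT},
            # Only the user turn varies between requests.
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# The first call primes the cache; the second shares the system-prompt
# prefix, so its cached input tokens are discounted automatically.
print(ask("How do I reset my password?"))
print(ask("How do I export my data?"))
```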

The 50% discount on cached input tokens, alongside the reduced pricing for GPT-OSS models, improves cost efficiency for inference workloads.

Note: there is no extra fee for the caching feature itself; the discount applies only when a cache hit occurs.
| Model | Input Price for Uncached Tokens (per 1M tokens) | Input Price for Cached Tokens (per 1M tokens) |
| --- | --- | --- |
| openai/gpt-oss-120b | $0.15 | $0.075 |
| openai/gpt-oss-20b | $0.075 | $0.0375 |
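To make the table concrete, here is a quick back-of-the-envelope check. The token counts are hypothetical; only the per-million prices come from the table above.

```python
# Worked cost example using the gpt-oss-120b prices from the table.
UNCACHED_PER_M = 0.15   # $ per 1M uncached input tokens
CACHED_PER_M = 0.075    # $ per 1M cached input tokens (50% off)

prompt_tokens = 10_000  # hypothetical request size
cached_tokens = 8_000   # hypothetical shared-prefix cache hit

cost = (cached_tokens * CACHED_PER_M
        + (prompt_tokens - cached_tokens) * UNCACHED_PER_M) / 1_000_000
print(f"${cost:.6f} per request")  # $0.000900 vs $0.001500 fully uncached
```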

At Cluely, we specialize in real-time AI, where latency is critical. We already leverage Groq for our most time-sensitive generations, and implementing prompt caching will not only accelerate our product but also enable entirely new use cases. With an average of 92% prompt reuse across our generations, prompt caching will be game-changing for both speed and quality.

— Guilherme Garibaldi, Founding Engineer, Cluely

Why Prompt Caching Matters on GPT-OSS Models

Prompt caching makes AI workflows faster and lower cost. It’s ideal for any workflow with stable, reusable prompt components and works automatically on every API request. Here are a few example use cases well suited for GPT-OSS models that can benefit from prompt caching:

- **RAG platforms & data apps**: Long system prompts + retrieval templates are cached and reused across queries (see the prompt-layout sketch below).
- **Agentic applications**: Repetitive tool/function calls and few-shot examples are reused across calls.
- **Eval pipelines**: Identical prompts across large datasets are served from cache.
- **Chatbots**: Persistent brand/style and policy preambles are cached and reused across conversations.
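One way a RAG-style app can lean into this is simply to order prompt components so the stable pieces come first and the per-request pieces come last. The sketch below is illustrative; the helper name, system prompt, and message layout are our own assumptions, not a Groq API.

```python
# Sketch: put stable prompt components first so they form a reusable prefix.
STATIC_SYSTEM = "You answer questions using only the provided context."
FEW_SHOT = [
    {"role": "user", "content": "Example question?"},
    {"role": "assistant", "content": "Example grounded answer."},
]

def build_messages(retrieved_chunks: list[str], question: str) -> list[dict]:
    # Stable prefix first: the system prompt and few-shot examples are
    # identical across requests, so they can be served from cache.
    messages = [{"role": "system", "content": STATIC_SYSTEM}, *FEW_SHOT]
    # Variable content last: retrieved context and the user question differ
    # per request, so they fall after the first difference and are billed
    # at the full rate.
    context = "\n\n".join(retrieved_chunks)
    messages.append(
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    )
    return messages
```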

How Prompt Caching Works

- **Prefix Matching**: The system identifies matching prefixes from recent requests. Prefixes can include system prompts, tool definitions, few-shot examples, and more. Note: prefixes only match up to the first difference, even if later parts of the prompt are the same (illustrated in the sketch below)!
- **Cache Hit**: If a matching prefix is found, cached computation is reused, dramatically reducing latency and cutting token costs by 50% for the cached portion.
- **Cache Miss**: If no match exists, your prompt is processed normally, and the prefix is temporarily cached for potential future matches.
- **Automatic Expiration**: All cached data automatically expires within a few hours.
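The "first difference" rule is worth internalizing: a later match does not resurrect caching. Here is a toy conceptual model of that rule, not Groq's actual cache implementation.

```python
# Toy illustration: only tokens before the first difference are cache-eligible.
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break  # matching stops at the first difference
        n += 1
    return n

cached = ["sys", "tool", "A", "shot", "Q1"]
new =    ["sys", "tool", "B", "shot", "Q2"]  # "shot" matches again, but too late
print(shared_prefix_len(cached, new))  # 2 -> only the first two tokens can hit
```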

Ready to take your GPT-OSS build to the next level?

With built-in tools, Responses API, and instant cloud availability in four global regions, Groq offers the most robust feature support for gpt-oss models. This, alongside lower token costs and prompt caching support, means developers now have even more fuel to power their applications.

Start experimenting with GPT-OSS models on GroqCloud today. To learn more about prompt caching and best practices, check out our developer documentation.
