Source: https://www.seangoedecke.com/rss.xml (October 2)
The OpenAI Responses API, Explained

Summary: OpenAI recently introduced the Responses API to replace its original chat completions API. The new API is stateful, letting you maintain a conversation without passing the full history on every request. While OpenAI emphasizes performance and cost benefits, the main motivation is that its reasoning models' reasoning traces are kept secret: the traditional API cannot preserve them, so reasoning models like GPT-5-Thinking are handicapped in third-party applications. The Responses API works around this by maintaining the reasoning trace behind the scenes, but OpenAI's framing of it as simply the better solution is contentious.


About six months ago, OpenAI released their Responses API, which replaced their previous /chat/completions API for inference. The old API was very simple: you pass in an array of messages representing a conversation between the model and a user, and get the model’s next response back. The new Responses API is more complicated. It has a lot more features, such as a set of built-in tools, but the main difference is it’s stateful. You don’t have to pass in the entire conversation history with each request. Instead, you can pass around an id representing the state of the conversation, and OpenAI (or your chosen provider) will keep it up-to-date for you.
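To make the contrast concrete, here's a minimal sketch of the two request shapes. The payloads are simplified and illustrative, not the exact wire format - the field names (`messages`, `input`, `previous_response_id`) mirror the published API shapes, but check the official docs before relying on them.

```python
# Sketch: stateless vs. stateful request shapes (simplified, hypothetical
# payloads). With /chat/completions the client owns the conversation
# history; with the Responses API the server does, and the client passes
# an id referring to the stored conversation.

def build_completions_request(history, user_message):
    """Stateless: the full conversation rides along with every request."""
    return {
        "model": "gpt-5",
        "messages": history + [{"role": "user", "content": user_message}],
    }

def build_responses_request(previous_response_id, user_message):
    """Stateful: only the new input, plus an id for the stored state."""
    return {
        "model": "gpt-5",
        "input": user_message,
        "previous_response_id": previous_response_id,
    }

history = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]
stateless = build_completions_request(history, "And times 3?")
stateful = build_responses_request("resp_abc123", "And times 3?")

# The stateless request grows with the conversation; the stateful one
# stays constant-size no matter how long the conversation gets.
print(len(stateless["messages"]))
print(stateful["previous_response_id"])
```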

OpenAI are selling the Responses API hard. Their docs emphasize the performance and cost benefits, and strongly imply that some agentic functionality can only be unlocked via the Responses API. There’s also been a recent Twitter thread from an OpenAI employee that’s almost begging people to use the Responses API.

When I first learned about this, I was a bit confused about why anybody would need it. Despite OpenAI’s claims, there is nothing inherently better about a stateful inference API than a normal stateless /chat/completions one. Prefix caching can be done just as easily in either case, and so can calling multiple tools in parallel. From a user’s perspective, it seems strictly easier to just use /chat/completions and manage the state myself. So why would anyone use the Responses API? The answer is actually very straightforward:

OpenAI still keeps their reasoning traces secret, so the old /chat/completions API can’t preserve reasoning traces within the conversation for OpenAI models.

Most strong current models are reasoning models: before giving a final answer, they talk through the problem privately. If you use Claude, or DeepSeek, or Qwen, the API response contains this chain of thought verbatim. You can include it in the conversation history for /chat/completions to give the model more context about what it’s previously considered.
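In the stateless world, carrying that chain of thought forward is just a matter of folding it back into the history you send next time. This sketch uses an illustrative `reasoning` field - real providers differ in the details (Anthropic, for instance, returns structured thinking blocks rather than a flat string):

```python
# Sketch: preserving a reasoning model's chain of thought across turns
# in a stateless conversation. Field names are illustrative, not any
# particular provider's schema.

def append_turn(history, assistant_reply):
    """Fold an assistant reply - reasoning included, if the provider
    exposes it - back into the history, so the next request gives the
    model access to its earlier working."""
    turn = {"role": "assistant", "content": assistant_reply["content"]}
    if assistant_reply.get("reasoning"):
        turn["reasoning"] = assistant_reply["reasoning"]
    history.append(turn)
    return history

history = [{"role": "user", "content": "Is 9973 prime?"}]
reply = {
    "content": "Yes, 9973 is prime.",
    "reasoning": "Check divisibility by primes up to sqrt(9973), about 99.9 ...",
}
append_turn(history, reply)
# The next /chat/completions-style request would send this history,
# chain of thought and all - which is exactly what you can't do when
# the provider withholds the reasoning.
```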

But you can’t do this with OpenAI’s GPT-5-Thinking, because OpenAI doesn’t expose the chain of thought[1]. It’s a tightly-guarded secret - presumably because OpenAI doesn’t want to guarantee that its chain-of-thought contents are safe, or because it contains private information that could leak implementation details to other labs. This is a big problem for OpenAI! It means that anyone writing code against OpenAI’s reasoning models (like GPT-5) won’t be able to pass around the chain-of-thought, and so GPT-5 will appear less capable than it would in OpenAI’s own products like ChatGPT.

Enter the Responses API. Since this API is stateful[2], OpenAI (or Azure OpenAI) can maintain the chain-of-thought in their backend, plug it into the conversation for you, and then strip it out before sending the response back down to the client. Anyone can thus write a program that uses GPT-5 at its full power, so long as they use the new stateful API instead of the old /chat/completions stateless one.
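Here's a toy server-side model of that trick, purely to illustrate the mechanism: the backend keeps the full history (hidden reasoning included) keyed by a response id, and the client only ever sees the visible answer. Everything here is hypothetical - this is not how OpenAI's backend actually works, just the shape of what a stateful API makes possible.

```python
# Toy sketch of a stateful inference backend that keeps the chain of
# thought server-side and strips it from client responses. The model
# outputs are hardcoded; only the bookkeeping is the point.

import uuid

STORE = {}  # response_id -> full history, hidden reasoning included

def serve_turn(previous_response_id, user_message):
    # Recover the stored conversation (empty for a fresh one).
    history = list(STORE.get(previous_response_id, []))
    history.append({"role": "user", "content": user_message})
    # A real model would generate these, with the reasoning conditioned
    # on all prior reasoning in `history`.
    reasoning = "(secret chain of thought)"
    answer = "Here is my answer."
    history.append(
        {"role": "assistant", "content": answer, "reasoning": reasoning}
    )
    response_id = f"resp_{uuid.uuid4().hex[:8]}"
    STORE[response_id] = history
    # The client response carries the id and the answer - the reasoning
    # never leaves the backend.
    return {"id": response_id, "output": answer}

first = serve_turn(None, "Hello")
second = serve_turn(first["id"], "A follow-up question")
```

The key asymmetry: the client could never have reconstructed the second request's full context itself, because the reasoning it would need to include was never sent to it.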

I don’t think there’s necessarily anything wrong with OpenAI wanting to keep their chains-of-thought secret. But it is a bit underhanded to push the Responses API as a “faster, more flexible and easier” approach to AI inference, when it is really OpenAI’s attempt to work around their own awkward decision to conceal their reasoning traces.

I would have a lot more respect for OpenAI here if they simply said “our reasoning models don’t expose their reasoning traces, so if you try and use them with /chat/completions they’re going to kind of suck - here’s the workaround API we offer”[3]. I don’t like how the Responses API is being presented as a simpler, more flexible approach when it clearly isn’t. The old /chat/completions API is much simpler! It’s just not designed to be used with an inference provider that keeps secrets from the user.


  1. Gemini doesn’t either, as of fairly recently. I don’t really understand why Anthropic isn’t hammering this point - if I were them, I’d be shouting from the rooftops that they’re the only big Western AI lab that still works just fine with the stateless completions-style API.

  2. You don’t technically have to use the Responses API in a stateful way. If you really can’t allow OpenAI to store your customer’s messages, OpenAI will send the chain-of-thought to you as an encrypted message, so you can include it in the conversation without being able to read it yourself. This is how Gemini does it. I guess OpenAI thought it was too awkward to offer this interface as the main way to interact with their models.
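The encrypted variant described in footnote 2 amounts to an opaque round-trip: the client stores a ciphertext blob it cannot read and replays it verbatim on the next request. A sketch, with purely illustrative field names and a placeholder blob (not the real wire format):

```python
# Sketch: the "encrypted reasoning" round-trip. The client treats the
# blob as opaque bytes - only the provider can decrypt it back into a
# chain of thought. All names and values here are made up.

def make_reply():
    """Stand-in for a provider response carrying encrypted reasoning."""
    return {
        "content": "42.",
        "encrypted_reasoning": "b64:opaque-ciphertext-the-client-cannot-read",
    }

def next_request(history, reply, user_message):
    """Replay the ciphertext unchanged so the provider can restore the
    chain of thought server-side, without the client ever seeing it."""
    return history + [
        {
            "role": "assistant",
            "content": reply["content"],
            "encrypted_reasoning": reply["encrypted_reasoning"],
        },
        {"role": "user", "content": user_message},
    ]

req = next_request(
    [{"role": "user", "content": "The answer?"}], make_reply(), "Why?"
)
```

This keeps the API stateless from the provider's point of view while still denying the client the reasoning - which is why it works even when you can't let the provider store your messages.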

  3. Maybe it’s obvious to everybody else! But I don’t see this point being made in other people’s writing on the Responses API, so I’m making it here in case anyone else is looking at the Responses API and scratching their head like I was.
