An Introduction to the OpenAI Codex CLI Development Tool

📚 OpenAI Codex is an open-source command-line interface (CLI) tool that works with OpenAI's o3/o4-mini models to enable "chat-driven development", letting developers invoke AI models directly from the terminal to perform coding tasks.

🛠️ Beyond basic text interaction, it can read and write project files (via patches), execute shell commands (usually in a sandboxed environment), and iterate based on command results and user feedback, substantially improving programming efficiency.

🗣️ Through system prompts, user instructions, and conversation history, Codex supplies the model with rich context, allowing it to understand the project background and user intent and to generate more accurate code and suggestions.

🔄 Its workflow covers initialization, API calls, tool-call handling (e.g., reading files, applying patches), and user interaction, forming a complete development-assistance loop that lets developers focus on creativity and logic.

OpenAI Codex is an open-source CLI released alongside OpenAI o3/o4-mini as a "chat-driven development" tool. It allows developers to use AI models via the API directly in their terminal to perform coding tasks. Unlike a simple chatbot, it can read files, write files (via patches), execute shell commands (often sandboxed), and iterate based on the results and user feedback.

Note: This overview was generated with Gemini 2.5 Pro and then collaboratively iterated on by Gemini 2.5 Pro and myself.

Core Components & Workflow

User Interface (UI)

Agent Loop

    The core logic resides in src/utils/agent/agent-loop.ts:

    - The AgentLoop class manages the interaction cycle with the OpenAI API.
    - It takes the user's input, combines it with conversation history and instructions, and sends it to the model.
    - It uses the openai Node.js library (v4+) and specifically calls openai.responses.create, indicating use of the /responses endpoint, which supports streaming and tool use. A minimal sketch of this loop follows below.
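To make the cycle concrete, here is a minimal sketch of one turn of the loop. This is not the actual AgentLoop implementation: the streaming handling is simplified, and shellTool is a hypothetical stand-in for the tool definition discussed in the next section.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical stand-in for the "shell" tool definition (sketched in the
// Model Interaction section below).
declare const shellTool: any;

// Minimal sketch of one turn of the loop: send the accumulated context,
// stream the response, and append completed output items to the history.
// The real AgentLoop adds cancellation, retries, approvals, and UI state.
async function runTurn(history: any[], instructions: string) {
  const stream = await openai.responses.create({
    model: "o4-mini",
    instructions, // system prompt plus merged user/project instructions
    input: history,
    tools: [shellTool],
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "response.output_item.done") {
      // Could be an assistant message, a reasoning item, or a function_call
      // that must be executed before the next turn.
      history.push(event.item);
    }
  }
  return history;
}
```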

Model Interaction

    - The AgentLoop sends the context (history, instructions, user input) to the specified model (default o4-mini, configurable via --model or the config file).
    - It requests a streaming response.
    - It handles different response item types (message, function_call, function_call_output, reasoning).
    - src/utils/model-utils.ts handles fetching available models and checking compatibility.
    - The primary "tool" defined is shell (or container.exec), allowing the model to request shell command execution. See the tools array in src/utils/agent/agent-loop.ts; a sketch of its shape follows this list.
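A function tool for shell execution might be declared roughly like this. This is a sketch of the shape in Responses API format, not the repo's exact schema; the description text and the workdir/timeout parameters are assumptions for illustration.

```typescript
// Sketch of what a "shell" function tool could look like. The exact name,
// description, and JSON schema in agent-loop.ts may differ.
const shellTool = {
  type: "function" as const,
  name: "shell",
  description: "Run a shell command and return its stdout, stderr, and exit code.",
  strict: false,
  parameters: {
    type: "object",
    properties: {
      cmd: {
        type: "array",
        items: { type: "string" },
        description: 'Command and arguments, e.g. ["cat", "utils.ts"]',
      },
      workdir: { type: "string", description: "Optional working directory" },
      timeout: { type: "number", description: "Optional timeout in milliseconds" },
    },
    required: ["cmd"],
  },
};
```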

Command Execution

Sandboxing

    The execution logic in handleExecCommand decides how to run a command based on the approval policy and safety assessment. full-auto mode implies sandboxing. src/utils/agent/sandbox/ contains the sandboxing implementations (a platform-dispatch sketch follows this list):

    - macos-seatbelt.ts: Uses macOS's sandbox-exec to restrict file system access and block network calls (READ_ONLY_SEATBELT_POLICY). Writable paths are whitelisted.
    - raw-exec.ts: Executes commands directly without sandboxing (used when sandboxing isn't needed or available).
    - Linux: The README.md, Dockerfile, and scripts/ indicate a Docker-based approach. The CLI runs inside a minimal container where scripts/init_firewall.sh uses iptables/ipset to restrict network access to the OpenAI API only. The user's project directory is mounted into the container.
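The platform dispatch can be pictured with this minimal sketch. The Seatbelt profile below is illustrative only, not the repo's READ_ONLY_SEATBELT_POLICY, and the Linux/Docker path is reduced to a raw spawn.

```typescript
import { spawn } from "node:child_process";

// Sketch: pick an execution strategy per platform. On macOS, wrap the
// command in sandbox-exec with a restrictive Seatbelt profile; elsewhere,
// fall back to raw execution (the repo uses a Docker container on Linux).
function execSandboxed(cmd: string[], writableRoots: string[]) {
  if (process.platform === "darwin") {
    // Illustrative profile: deny by default, allow exec and reads, and
    // allow writes only under whitelisted roots.
    const policy = [
      "(version 1)",
      "(deny default)",
      "(allow process-exec)",
      "(allow process-fork)",
      "(allow file-read*)",
      ...writableRoots.map((p) => `(allow file-write* (subpath "${p}"))`),
    ].join("\n");
    return spawn("sandbox-exec", ["-p", policy, ...cmd]);
  }
  // raw-exec.ts path: no OS-level sandbox.
  return spawn(cmd[0], cmd.slice(1));
}
```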

File Patching (apply_patch)

Prompts & Context Awareness

    - System Prompt: A long, detailed system prompt is hardcoded as prefix within src/utils/agent/agent-loop.ts. It tells the model about its role as the Codex CLI, its capabilities (shell, patching), constraints (sandboxing), and coding guidelines.
    - User Instructions: Instructions are gathered from both global (~/.codex/instructions.md) and project-specific (codex.md or similar, discovered via logic in src/utils/config.ts) files. These combined instructions are prepended to the conversation history sent to the model; a merging sketch follows this list.
    - Conversation History: The items array (containing ResponseItem objects such as user messages, assistant messages, tool calls, and tool outputs) is passed back to the model on each turn, providing conversational context. src/utils/approximate-tokens-used.ts estimates context window usage.
    - File Context (Standard Mode): The agent doesn't automatically read project files. It gains file context only when the model explicitly requests to read a file (e.g., via cat) or when file content appears in the output of a previous command (e.g., git diff).
    - File Context (Experimental --full-context Mode): This mode uses a distinct flow (see src/cli_singlepass.tsx, src/utils/singlepass/).
    - Configuration: Stores the default model, approval mode settings, etc. Managed by src/utils/config.ts; loads from ~/.codex/config.yaml (or .yml/.json), which is not in the repo.
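A sketch of how the layered instructions could be assembled. The real discovery logic in src/utils/config.ts handles more file names and config formats; the separator string used here is purely illustrative.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Sketch: merge global and project-level instructions into a single block
// that is prepended to every conversation sent to the model.
function loadMergedInstructions(projectRoot: string): string {
  const parts: string[] = [];

  const globalPath = join(homedir(), ".codex", "instructions.md");
  if (existsSync(globalPath)) parts.push(readFileSync(globalPath, "utf8"));

  const projectDoc = join(projectRoot, "codex.md");
  if (existsSync(projectDoc)) parts.push(readFileSync(projectDoc, "utf8"));

  // Hypothetical separator; the real format may differ.
  return parts.join("\n\n--- project-doc ---\n\n");
}
```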

Step-by-Step Manual Walkthrough (Simulating the CLI)

Let's imagine the user runs: codex "Refactor utils.ts to use arrow functions" in a directory /home/user/myproject.

    Initialization (cli.tsx, app.tsx):

      - Parse arguments: the prompt is "Refactor...", the model is the default (o4-mini), and the approval mode is the default (suggest).
      - Load config (loadConfig in src/utils/config.ts): read ~/.codex/config.yaml and ~/.codex/instructions.md.
      - Discover and load project docs (loadProjectDoc in src/utils/config.ts): find /home/user/myproject/codex.md and read its content.
      - Combine instructions: merge user instructions and project docs.
      - Check Git status (checkInGit in src/utils/check-in-git.ts): confirm /home/user/myproject is a Git repo (a sketch follows this list).
      - Render the main UI (TerminalChat).
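The Git check can be as simple as this sketch; the actual checkInGit helper may be implemented differently.

```typescript
import { execSync } from "node:child_process";

// Sketch: detect whether the working directory is inside a Git repository,
// so the CLI can warn before the agent starts modifying files.
function checkInGit(cwd: string): boolean {
  try {
    execSync("git rev-parse --is-inside-work-tree", {
      cwd,
      stdio: "ignore", // only the exit code matters
    });
    return true;
  } catch {
    return false;
  }
}
```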

    First API Call (AgentLoop.run in src/utils/agent/agent-loop.ts):

      - Create the initial input: [{ role: "user", content: [{ type: "input_text", text: "Refactor..." }] }].
      - Construct the API request payload: include the system prompt (from prefix), the combined instructions, and the user input message. Set model: "o4-mini", stream: true, tools: [...]. No previous_response_id. (A sketch of this payload follows this list.)
      - Send the request: call openai.responses.create(...) (using the openai library). The UI shows "Thinking...".
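The first request might look roughly like this sketch. Exactly how the system prompt and instructions are threaded into the payload is simplified; the declared constants are hypothetical stand-ins for values produced during initialization.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical stand-ins for values from step 1.
declare const systemPrompt: string;       // the hardcoded `prefix` from agent-loop.ts
declare const mergedInstructions: string; // global + project instructions
declare const shellTool: any;             // the "shell" function tool

// Sketch of the first /responses request for this walkthrough.
async function firstCall() {
  return openai.responses.create({
    model: "o4-mini",
    instructions: `${systemPrompt}\n\n${mergedInstructions}`,
    input: [
      {
        role: "user",
        content: [
          { type: "input_text", text: "Refactor utils.ts to use arrow functions" },
        ],
      },
    ],
    tools: [shellTool],
    stream: true,
    // no previous_response_id on the first turn
  });
}
```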

    Model Response (Stream):

      - Assume the model decides it needs to read the file first.
      - Stream event 1: response.output_item.done with item: { type: "function_call", name: "shell", arguments: '{"cmd": ["cat", "utils.ts"]}', call_id: "call_1" }.
      - Stream event 2: response.completed with output: [...] containing the same function call, id: "resp_1".
      - The agent receives the function call. onLastResponseId is called with "resp_1". (A sketch of consuming these events follows this list.)
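Consuming those two events could look like this sketch; the event names match the ones above, and error handling is omitted.

```typescript
// Hypothetical stand-in: the streamed response from the call above.
declare const stream: AsyncIterable<any>;

// Sketch: collect tool calls from completed output items and remember the
// final response id, which becomes previous_response_id on the next turn.
async function consume() {
  let lastResponseId: string | undefined;
  const pendingToolCalls: Array<{ name: string; args: string; callId: string }> = [];

  for await (const event of stream) {
    if (event.type === "response.output_item.done" && event.item.type === "function_call") {
      pendingToolCalls.push({
        name: event.item.name,       // "shell"
        args: event.item.arguments,  // '{"cmd": ["cat", "utils.ts"]}'
        callId: event.item.call_id,  // "call_1"
      });
    } else if (event.type === "response.completed") {
      lastResponseId = event.response.id; // "resp_1" -> onLastResponseId
    }
  }
  return { lastResponseId, pendingToolCalls };
}
```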

    Tool Call Handling (handleExecCommand in src/utils/agent/handle-exec-command.ts):

      - Parse arguments: cmd = ["cat", "utils.ts"].
      - Check approval: canAutoApprove(["cat", "utils.ts"], "suggest", ["/home/user/myproject"]) (in src/approvals.ts) returns { type: "auto-approve", reason: "View file contents", group: "Reading files", runInSandbox: false }. (The decision shape is sketched after this list.)
      - Execute the command (execCommand in src/utils/agent/handle-exec-command.ts): run cat utils.ts directly (no sandbox needed for safe commands). Note: this example assumes utils.ts exists at the project root; in reality, the model might need to specify a path like src/utils.ts.
      - Simulated result: stdout = "/* content of utils.ts */", stderr = "", exitCode = 0.
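The shape of that decision can be sketched as a discriminated union mirroring the values shown in this walkthrough. The real table of safe commands and the writable-root checks in src/approvals.ts are far more complete; the "reject" variant and the command set here are assumptions.

```typescript
// Sketch of the approval decision and a tiny auto-approval check.
type SafetyAssessment =
  | { type: "auto-approve"; reason: string; group: string; runInSandbox: boolean }
  | { type: "ask-user"; applyPatch?: { patch: string } }
  | { type: "reject"; reason: string };

const READ_ONLY_COMMANDS = new Set(["cat", "ls", "pwd"]); // illustrative subset

function canAutoApprove(
  cmd: string[],
  policy: "suggest" | "auto-edit" | "full-auto",
  _writableRoots: string[], // used for path checks in the real implementation
): SafetyAssessment {
  if (READ_ONLY_COMMANDS.has(cmd[0])) {
    return {
      type: "auto-approve",
      reason: "View file contents",
      group: "Reading files",
      runInSandbox: false,
    };
  }
  if (policy === "full-auto") {
    return { type: "auto-approve", reason: "Full-auto mode", group: "Running commands", runInSandbox: true };
  }
  return { type: "ask-user" };
}
```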

    Second API Call (AgentLoop.run continues):

      - Format the tool result: create a function_call_output item like { type: "function_call_output", call_id: "call_1", output: '{"output": "/* content ... */", "metadata": {"exit_code": 0, ...}}' }.
      - Construct the API request payload: include the system prompt, the combined instructions, and the entire history so far (user message, assistant function-call request, function-call output), and set previous_response_id: "resp_1".
      - Send the request. The UI shows "Thinking...".

    Model Response (Stream):

      - Assume the model generates the refactored code and decides to apply it.
      - Stream event 1: response.output_item.done with item: { type: "function_call", name: "shell", arguments: '{"cmd": ["apply_patch", "*** Begin Patch\n*** Update File: utils.ts\n@@ ... -old +new ...\n*** End Patch"]}', call_id: "call_2" }.
      - Stream event 2: response.completed with output: [...] containing the patch function call, id: "resp_2".
      - The agent receives the patch function call. onLastResponseId is called with "resp_2".

    Tool Call Handling (Patch):

      - Parse arguments: identify apply_patch and extract the patch text. (The envelope format is sketched after this list.)
      - Check approval: canAutoApprove(["apply_patch", "..."], "suggest", ["/home/user/myproject"]). Since the policy is suggest, this returns { type: "ask-user", applyPatch: { patch: "..." } }.
      - Request confirmation (requestConfirmation): the UI (TerminalChatCommandReview) displays the patch diff and asks "Allow command? [y/N/e/a]".
      - The user reviews and presses 'y'. submitConfirmation is called with { decision: ReviewDecision.YES }.
      - Execute the patch (execApplyPatch in src/utils/agent/exec.ts -> process_patch in src/utils/agent/apply-patch.ts): read utils.ts, apply the diff logic, and write the modified content back using Node.js fs.writeFileSync.
      - Simulated result: stdout = "Done!", stderr = "", exitCode = 0.
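The patch envelope is a custom text format. This sketch parses only the "Update File" header shown above; the full grammar handled by process_patch in apply-patch.ts also covers Add File, Delete File, and actual hunk application, and the applyToFile helper here is hypothetical.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Sketch: extract the target file from the envelope shown above. Real hunk
// application (matching @@ context lines, applying -old/+new edits) lives
// in process_patch in src/utils/agent/apply-patch.ts.
function targetOfPatch(patch: string): string | null {
  const match = patch.match(/^\*\*\* Update File: (.+)$/m);
  return match ? match[1].trim() : null;
}

// Hypothetical application step: read the file, transform it, write it back
// (the real code ends in the same fs.writeFileSync call mentioned above).
function applyToFile(patch: string, transform: (src: string) => string): void {
  const file = targetOfPatch(patch);
  if (!file) throw new Error("Patch has no '*** Update File:' header");
  writeFileSync(file, transform(readFileSync(file, "utf8")));
}
```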

    Third API Call:

      - Format the tool result: create a function_call_output item for the patch, { call_id: "call_2", output: '{"output": "Done!", ...}' }.
      - Construct the API request: include the history plus the patch result, with previous_response_id: "resp_2".
      - Send the request.

    Model Response (Final):

      - Assume the model confirms the refactoring is done.
      - Stream event 1: response.output_item.done with item: { type: "message", role: "assistant", content: [{ type: "output_text", text: "OK, I've refactored utils.ts to use arrow functions." }] }.
      - Stream event 2: response.completed, id: "resp_3".
      - The agent receives the message. onLastResponseId is called with "resp_3".
      - No more tool calls. The loop finishes for this turn. The UI stops showing "Thinking...".

    User Interaction:

      The user sees the final message and the updated prompt, ready for the next command. The file utils.ts on their disk has been modified.
