MarkTechPost@AI

Qualifire AI Open-Sources Rogue: An End-to-End Testing Framework for AI Agents

Qualifire AI has released Rogue, an open-source Python framework for evaluating AI agents. Traditional testing methods struggle to uncover vulnerabilities in multi-turn interactions and leave weak audit trails. Rogue converts business policies into executable scenarios, tests a target agent through protocol-accurate multi-turn conversations, and produces deterministic reports suitable for CI/CD and compliance reviews. The framework offers multiple client interfaces, including a TUI, a Web UI, and a CLI, letting developers perform comprehensive testing, compliance validation, and regression monitoring to ensure an agent's performance, compliance, and reliability before release.

🛡️ The Rogue framework addresses the shortcomings of traditional QA methods in exposing multi-turn vulnerabilities in AI agents. By converting business policies into executable scenarios and testing agents through protocol-accurate conversations, it provides stronger audit trails and more reliable evaluation results.

🚀 The framework offers multiple client interfaces, including a TUI, a Web UI, and a CLI, so developers can choose interactive testing or automated integration as needed. The CLI mode is particularly suited to CI/CD pipelines, enabling automated evaluation and compliance review.

📋 Rogue supports explicitly converting business policies into executable test scenarios and generates machine-readable evidence, allowing development teams to ship releases with confidence. It can validate security and compliance requirements such as PII/PHI handling, refusal behavior, and secret-leakage prevention.

📈 Rogue enables regression and drift monitoring of AI agents: nightly runs of the test suite detect behavioral drift introduced by new model versions or prompt changes, and policy-critical pass criteria can be enforced before release.

⚙️ At its core, Rogue uses a client-server architecture: the server handles the core evaluation logic, while multiple clients (TUI, Web UI, CLI) connect to it to provide different interfaces and interaction styles, making the framework flexible to deploy and use.

Agentic systems are stochastic, context-dependent, and policy-bounded. Conventional QA—unit tests, static prompts, or scalar “LLM-as-a-judge” scores—fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence that can gate releases with confidence.

Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and outputs deterministic reports suitable for CI/CD and compliance reviews.

Quick Start

Prerequisites

Installation

Option 1: Quick Install (Recommended)

Use uvx to get up and running quickly:

    # TUI
    uvx rogue-ai

    # Web UI
    uvx rogue-ai ui

    # CLI / CI/CD
    uvx rogue-ai cli

Option 2: Manual Installation

(a) Clone the repository:

    git clone https://github.com/qualifire-dev/rogue.git
    cd rogue

(b) Install dependencies:

If you are using uv:

    uv sync

Or, if you are using pip:

    pip install -e .

(c) Optionally, set up your environment variables: create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.

    OPENAI_API_KEY="sk-..."
    ANTHROPIC_API_KEY="sk-..."
    GOOGLE_API_KEY="..."

Running Rogue

Rogue operates on a client-server architecture where the core evaluation logic runs in a backend server, and various clients connect to it for different interfaces.
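Because the server and clients are decoupled, you can also start them separately, for example from two terminals. A minimal sketch using only the modes documented below:

    # Terminal 1: start the Rogue server
    uvx rogue-ai server

    # Terminal 2: attach whichever client you prefer
    uvx rogue-ai tui    # terminal UI
    uvx rogue-ai ui     # web UI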

Default Behavior

When you run uvx rogue-ai without any mode specified, it:

1. Starts the Rogue server in the background
2. Launches the TUI (Terminal User Interface) client

    uvx rogue-ai

Available Modes

Server Mode

    uvx rogue-ai server [OPTIONS]

TUI Mode

    uvx rogue-ai tui [OPTIONS]

Web UI Mode

    uvx rogue-ai ui [OPTIONS]
Example: Testing the T-Shirt Store Agent

This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.

Install example dependencies:

If you are using uv:

    uv sync --group examples

Or, if you are using pip:

    pip install -e .[examples]

(a) Start the example agent server in a separate terminal:

If you are using uv:

    uv run examples/tshirt_store_agent

If not:

    python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.

(b) Configure Rogue in the UI to point to the example agent at http://localhost:10001.

(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies! You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui).
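Putting the steps together, a typical end-to-end session (using the uv variant) looks like this:

    # Terminal 1: start the example agent (serves on http://localhost:10001)
    uv run examples/tshirt_store_agent

    # Terminal 2: launch Rogue (TUI by default) and point it at the agent's URL
    uvx rogue-ai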

Where Rogue Fits: Practical Use Cases

- Pre-release testing: exercise an agent's behavior across multi-turn scenarios before shipping.
- Compliance validation: verify PII/PHI handling, refusal behavior, and secret-leakage prevention.
- Regression and drift monitoring: run the suite nightly to catch behavioral drift from new model versions or prompt changes.
- CI/CD gating: enforce policy-critical pass criteria automatically in pipelines.

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. It synthesizes business context and risk into structured tests with clear objectives, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in fast single-turn or deep multi-turn adversarial modes. You can bring your own model, or let Rogue use Qualifire’s bespoke SLM judges to drive the tests. Evaluation produces streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.

Under the Hood: How Rogue Is Built

Rogue operates on a client-server architecture:

- Rogue Server: runs the core evaluation logic.
- Clients (TUI, Web UI, CLI): connect to the server and provide different interfaces and interaction styles.

This architecture allows for flexible deployment and usage patterns: the server can run independently, and multiple clients can connect to it simultaneously.
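Schematically, the moving parts described above fit together roughly like this (a sketch based on the description, not an official diagram):

    TUI -----+
    Web UI --+--> Rogue Server ---(A2A)---> target agent under test
    CLI -----+    (core evaluation logic,
                   deterministic reports)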

Summary

Rogue helps developer teams test agent behavior the way it actually runs in production. It turns written policies into concrete scenarios, exercises those scenarios over A2A, and records what happened with transcripts you can audit. The result is a clear, repeatable signal you can use in CI/CD to catch policy breaks and regressions before they ship.

Find Rogue on GitHub: https://github.com/qualifire-dev/rogue

