MarkTechPost@AI

Qualifire AI Open-Sources Rogue: An End-to-End Testing Framework for AI Agents

Qualifire AI has released Rogue, an open-source Python framework for evaluating AI agents. Traditional testing methods struggle to uncover vulnerabilities in multi-turn interactions and leave weak audit trails. Rogue converts business policies into executable scenarios, tests a target agent through protocol-accurate multi-turn conversations, and produces deterministic reports suitable for CI/CD and compliance reviews. The framework offers multiple client interfaces, including a TUI, a Web UI, and a CLI, letting developers perform comprehensive testing, compliance validation, and regression monitoring to ensure an agent's performance, compliance, and reliability before release.

🛡️ The Rogue framework addresses the shortcomings of traditional QA methods in exposing multi-turn vulnerabilities in AI agents. By converting business policies into executable scenarios and testing agents through protocol-accurate conversations, it provides stronger audit trails and more reliable evaluation results.

🚀 The framework offers multiple client interfaces, including a TUI, a Web UI, and a CLI, so developers can choose interactive testing or automated integration as needed. The CLI mode is particularly suited to CI/CD pipelines, enabling automated evaluation and compliance review.

📋 Rogue supports explicitly converting business policies into executable test scenarios and generates machine-readable evidence, allowing development teams to ship releases with confidence. It can validate security and compliance requirements such as PII/PHI handling, refusal behavior, and secret-leakage prevention.

📈 Rogue enables regression and drift monitoring of AI agents: nightly runs of the test suite detect behavioral drift introduced by new model versions or prompt changes, and policy-critical pass criteria can be enforced before release.

⚙️ At its core, Rogue uses a client-server architecture: the server handles the core evaluation logic, while multiple clients (TUI, Web UI, CLI) connect to it to provide different interfaces and interaction styles, making the framework flexible to deploy and use.

Agentic systems are stochastic, context-dependent, and policy-bounded. Conventional QA—unit tests, static prompts, or scalar “LLM-as-a-judge” scores—fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence that can gate releases with confidence.

Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and outputs deterministic reports suitable for CI/CD and compliance reviews.

Quick Start

Prerequisites

Installation

Option 1: Quick Install (Recommended)

Use uvx to get up and running quickly:

    # TUI
    uvx rogue-ai

    # Web UI
    uvx rogue-ai ui

    # CLI / CI/CD
    uvx rogue-ai cli

Option 2: Manual Installation

(a) Clone the repository:

    git clone https://github.com/qualifire-dev/rogue.git
    cd rogue

(b) Install dependencies:

If you are using uv:

    uv sync

Or, if you are using pip:

    pip install -e .

(c) Optionally, set up your environment variables: create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.

    OPENAI_API_KEY="sk-..."
    ANTHROPIC_API_KEY="sk-..."
    GOOGLE_API_KEY="..."

Running Rogue

Rogue operates on a client-server architecture where the core evaluation logic runs in a backend server, and various clients connect to it for different interfaces.
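Because the server and clients are decoupled, you can also start them separately, for example from two terminals. A minimal sketch using only the modes documented below:

    # Terminal 1: start the Rogue server
    uvx rogue-ai server

    # Terminal 2: attach whichever client you prefer
    uvx rogue-ai tui    # terminal UI
    uvx rogue-ai ui     # web UI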

Default Behavior

When you run uvx rogue-ai without any mode specified, it:

1. Starts the Rogue server in the background
2. Launches the TUI (Terminal User Interface) client

    uvx rogue-ai

Available Modes

Server Mode

    uvx rogue-ai server [OPTIONS]

TUI Mode

    uvx rogue-ai tui [OPTIONS]

Web UI Mode

    uvx rogue-ai ui [OPTIONS]
Example: Testing the T-Shirt Store Agent

This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.

Install example dependencies:

If you are using uv:

    uv sync --group examples

Or, if you are using pip:

    pip install -e .[examples]

(a) Start the example agent server in a separate terminal:

If you are using uv:

    uv run examples/tshirt_store_agent

If not:

    python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.

(b) Configure Rogue in the UI to point to the example agent at http://localhost:10001.

(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies! You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui).
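Putting the steps together, a typical end-to-end session (using the uv variant) looks like this:

    # Terminal 1: start the example agent (serves on http://localhost:10001)
    uv run examples/tshirt_store_agent

    # Terminal 2: launch Rogue (TUI by default) and point it at the agent's URL
    uvx rogue-ai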

Where Rogue Fits: Practical Use Cases

- Pre-release testing: exercise an agent's behavior across multi-turn scenarios before shipping.
- Compliance validation: verify PII/PHI handling, refusal behavior, and secret-leakage prevention.
- Regression and drift monitoring: run the suite nightly to catch behavioral drift from new model versions or prompt changes.
- CI/CD gating: enforce policy-critical pass criteria automatically in pipelines.

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. It synthesizes business context and risk into structured tests with clear objectives, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in fast single-turn or deep multi-turn adversarial modes. You can bring your own model, or let Rogue use Qualifire’s bespoke SLM judges to drive the tests. Evaluation produces streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.

Under the Hood: How Rogue Is Built

Rogue operates on a client-server architecture:

- Rogue Server: runs the core evaluation logic.
- Clients (TUI, Web UI, CLI): connect to the server and provide different interfaces and interaction styles.

This architecture allows for flexible deployment and usage patterns: the server can run independently, and multiple clients can connect to it simultaneously.
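Schematically, the moving parts described above fit together roughly like this (a sketch based on the description, not an official diagram):

    TUI -----+
    Web UI --+--> Rogue Server ---(A2A)---> target agent under test
    CLI -----+    (core evaluation logic,
                   deterministic reports)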

Summary

Rogue helps developer teams test agent behavior the way it actually runs in production. It turns written policies into concrete scenarios, exercises those scenarios over A2A, and records what happened with transcripts you can audit. The result is a clear, repeatable signal you can use in CI/CD to catch policy breaks and regressions before they ship.

Find Rogue on GitHub: https://github.com/qualifire-dev/rogue

