MarkTechPost@AI, October 1, 17:11
The Role of Model Context Protocol (MCP) in Generative AI Security

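Model Context Protocol (MCP) is an open JSON-RPC standard that specifies how AI clients connect, over defined transports, to servers that expose tools, resources, and prompts. MCP is valuable for security work because it makes agent/tool interactions explicit and auditable, and because it imposes normative authorization requirements that teams can verify in code and in tests. This keeps the blast radius of tool use tightly controlled, enables repeatable red-team exercises at clear trust boundaries, and makes policy enforcement measurable. The article examines how MCP standardizes interactions between AI components, provides normative authorization controls, and supports practical security engineering, including clear trust boundaries, least privilege, and deterministic attack surfaces for red teaming. It also uses the case of a malicious MCP server to stress the importance of vetting and version-pinning MCP servers, and it offers a security hardening checklist along with examples of current MCP adoption that can be used for testing.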

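⚖️ **Standardized AI component interaction and security controls**: By defining how AI clients and servers interact through tools, resources, and prompts, MCP makes agent and tool use explicit and auditable. It mandates normative authorization controls, such as prohibiting token passthrough and requiring audience binding and validation. Servers therefore act as independent first-class principals with their own credentials and logs, which closes off "confused deputy" attack paths and preserves upstream audit and limit controls.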

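🛡️ **Stronger practical security engineering**: MCP gives security engineers several practical points of leverage. It establishes a clear client-server trust boundary where consent UIs, scope prompts, and structured logging can be attached. This supports least privilege: for example, a secrets-broker server can issue short-lived credentials and expose only constrained tools instead of handing a broad Vault token to the model. In addition, MCP's typed tool schemas and replayable transports give red teams a deterministic attack surface for reproducible tests.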

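🚨 **MCP server security risks and hardening measures**: Through the case of a malicious MCP server (the trojanized postmark-mcp npm package), the article shows the threats MCP servers can pose: the server secretly exfiltrated every email sent through it to the attacker via BCC. This underscores the importance of security review, version pinning, and supply-chain vetting for MCP servers. Recommendations include maintaining an allowlist of approved servers, pinning versions and hashes, requiring code provenance attestations, monitoring for anomalous network egress patterns, and rotating credentials regularly.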

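🚀 **MCP's application and value within current AI security frameworks**: MCP aligns with security frameworks such as NIST's AI RMF and OWASP's LLM Top-10 on access control, logging, and red-team evaluation. The article lists Anthropic Claude, Google Data Commons MCP, and Delinea MCP as current adopters, indicating that MCP has become a practical foundation for building secure agentic systems and running reliable red-team evaluations: it can constrain agent capabilities, observe agent behavior, and replay adversarial scenarios reliably.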

Overview

Model Context Protocol (MCP) is an open, JSON-RPC–based standard that formalizes how AI clients (assistants, IDEs, web apps) connect to servers exposing three primitives—tools, resources, and prompts—over defined transports (primarily stdio for local and Streamable HTTP for remote). MCP’s value for security work is that it renders agent/tool interactions explicit and auditable, with normative requirements around authorization that teams can verify in code and in tests. In practice, this enables tight blast-radius control for tool use, repeatable red-team scenarios at clear trust boundaries, and measurable policy enforcement—provided organizations treat MCP servers as privileged connectors subject to supply-chain scrutiny.

What MCP standardizes

An MCP server publishes: (1) tools (schema-typed actions callable by the model), (2) resources (readable data objects the client can fetch and inject as context), and (3) prompts (reusable, parameterized message templates, typically user-initiated). Distinguishing these surfaces clarifies who is “in control” at each edge: model-driven for tools, application-driven for resources, and user-driven for prompts. Those roles matter in threat modeling, e.g., prompt injection often targets model-controlled paths, while unsafe output handling often occurs at application-controlled joins.
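To make the three surfaces concrete, here is a minimal sketch of the JSON-RPC 2.0 requests a client might send for each primitive. The method names (`tools/call`, `resources/read`, `prompts/get`) follow the MCP specification; the tool name `send_email`, the resource URI, the prompt name, and the `CONTROLLER` mapping are illustrative assumptions, not part of the article.

```python
import itertools
import json

# Who decides when each surface is used, per the roles described above.
CONTROLLER = {
    "tools/call": "model",
    "resources/read": "application",
    "prompts/get": "user",
}

_ids = itertools.count(1)


def make_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope for an MCP method."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


# Hypothetical names, for illustration only.
call = make_request("tools/call", {"name": "send_email", "arguments": {"to": "a@example.com"}})
read = make_request("resources/read", {"uri": "file:///notes/todo.txt"})
prompt = make_request("prompts/get", {"name": "summarize", "arguments": {"style": "brief"}})

for msg in (call, read, prompt):
    # Tag each message with the party in control, which is what a threat model keys on.
    print(CONTROLLER[msg["method"]], json.dumps(msg))
```

Tagging each message with its controlling party is one way to decide where a given test belongs: injection tests on model-controlled calls, output-handling tests on application-controlled reads.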

Transports. The spec defines two standard transports—stdio (Standard Input/Output) and Streamable HTTP—and leaves room for pluggable alternatives. Local stdio reduces network exposure; Streamable HTTP fits multi-client or web deployments and supports resumable streams. Treat the transport choice as a security control: constrain network egress for local servers, and apply standard web authN/Z and logging for remote ones.

Client/server lifecycle and discovery. MCP formalizes how clients discover server capabilities (tools/resources/prompts), negotiate sessions, and exchange messages. That uniformity is what lets security teams instrument call flows, capture structured logs, and assert pre/postconditions without bespoke adapters per integration.
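As a rough illustration of instrumenting call flows, the sketch below wraps a transport's send function so that each exchange is checked against a precondition and a postcondition and then recorded as a structured log entry. The `audited` wrapper and the `fake_send` stand-in transport are hypothetical names; a real client would wrap whatever send function its MCP SDK exposes.

```python
import json
import time
from typing import Callable


def audited(send: Callable[[dict], dict], log: list) -> Callable[[dict], dict]:
    """Wrap a send function so every request/response pair is checked and logged."""
    def wrapper(request: dict) -> dict:
        # Precondition: only well-formed JSON-RPC 2.0 requests leave the client.
        if request.get("jsonrpc") != "2.0" or "method" not in request:
            raise ValueError("malformed JSON-RPC request")
        started = time.time()
        response = send(request)
        # Postcondition: the response must answer the request we sent.
        if response.get("id") != request.get("id"):
            raise RuntimeError("response id does not match request id")
        log.append({
            "ts": started,
            "method": request["method"],
            "id": request.get("id"),
            "ok": "error" not in response,
        })
        return response
    return wrapper


def fake_send(request: dict) -> dict:
    """Stand-in transport (assumption): returns an empty result for any request."""
    return {"jsonrpc": "2.0", "id": request["id"], "result": {}}


records: list = []
send = audited(fake_send, records)
send({"jsonrpc": "2.0", "id": 7, "method": "tools/list"})
print(json.dumps(records[0]))
```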

Normative authorization controls

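The Authorization approach is unusually prescriptive for an integration protocol. Two requirements stand out: a server must not pass a client's token through to upstream APIs, and every token must be audience-bound to the server and validated by it before use.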

This is the core of MCP’s security structure: model-side capabilities are powerful, but the protocol insists that servers be first-class principals with their own credentials, scopes, and logs—rather than opaque pass-throughs for a user’s global token.
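A server-side audience check might look like the following minimal sketch. It assumes the token's signature has already been verified by a JWT library and that the decoded claims are available as a dictionary; `validate_claims`, `TokenRejected`, and the example URLs are hypothetical names chosen for illustration.

```python
import time
from typing import Optional


class TokenRejected(Exception):
    """Raised when a token must not be accepted by this server."""


def validate_claims(claims: dict, expected_audience: str, now: Optional[float] = None) -> None:
    """Reject tokens that were not issued for this server or that have expired.

    Assumes signature verification already happened elsewhere.
    """
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # The JWT "aud" claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        raise TokenRejected("token audience does not include this server")
    if claims.get("exp", 0) <= now:
        raise TokenRejected("token expired")


# Accepted: the token was minted for this server and is still valid.
validate_claims({"aud": "https://mcp.example.com", "exp": 2_000}, "https://mcp.example.com", now=1_000)
```

A token minted for a different upstream API fails the first check, which is the behavior that blocks the pass-through pattern described above.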

Where MCP supports security engineering in practice

Clear trust boundaries. The client-server edge is an explicit, inspectable boundary. You can attach consent UIs, scope prompts, and structured logging at that edge. Many client implementations present permission prompts that enumerate a server's tools/resources before enabling them, which is useful for least privilege and audit, even though UX is not specified by the standard.

Containment and least privilege. Because a server is a separate principal, you can enforce minimal upstream scopes. For example, a secrets-broker server can mint short-lived credentials and expose only constrained tools (e.g., “fetch secret by policy label”), rather than handing broad vault tokens to the model. Public MCP servers from security vendors illustrate this model.
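The secrets-broker idea can be sketched as a toy class that exposes one constrained operation and returns short-lived leases. This is an illustration under stated assumptions, not any vendor's implementation: `SecretsBroker`, `Lease`, the label names, and the 60-second default lifetime are all hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    token: str
    label: str
    expires_at: float


class SecretsBroker:
    """Toy broker: issues short-lived leases for allowlisted policy labels only."""

    def __init__(self, allowed_labels: set, ttl_seconds: int = 60):
        self._allowed = set(allowed_labels)
        self._ttl = ttl_seconds

    def fetch_secret_by_policy_label(self, label: str, now: Optional[float] = None) -> Lease:
        # The model never sees a broad vault token; it can only name a label.
        if label not in self._allowed:
            raise PermissionError(f"label {label!r} is not exposed by this server")
        now = time.time() if now is None else now
        return Lease(secrets.token_urlsafe(32), label, now + self._ttl)


broker = SecretsBroker({"ci-readonly"})
lease = broker.fetch_secret_by_policy_label("ci-readonly", now=1_000.0)
print(lease.label, lease.expires_at)
```

Because the only exposed operation takes a policy label, the worst a coerced model can do is request a lease it was already allowed to have, and that lease expires quickly.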

Deterministic attack surfaces for red teaming. With typed tool schemas and replayable transports, red teams can build fixtures that simulate adversarial inputs at tool boundaries and verify post-conditions across models/clients. This yields reproducible tests for classes of failures like prompt injection, insecure output handling, and supply-chain abuse. Pair those tests with recognized taxonomies, such as the OWASP LLM Top-10, so that findings map to shared categories.
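A replayable fixture set could be sketched as follows. The checker covers only a small subset of JSON Schema (required fields and a few primitive types) and, as a deliberate red-team assumption, treats undeclared fields as violations, which is stricter than JSON Schema's default. The schema, fixture contents, and function names are hypothetical.

```python
TYPE_MAP = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def check_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of violations against a small subset of JSON Schema."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        if name not in props:
            problems.append(f"unexpected field: {name}")
            continue
        expected = TYPE_MAP.get(props[name].get("type"))
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"wrong type for {name}")
    return problems


# Hypothetical tool schema and adversarial fixtures.
SCHEMA = {
    "type": "object",
    "properties": {"path": {"type": "string"}, "max_bytes": {"type": "integer"}},
    "required": ["path"],
}
FIXTURES = [
    {"arguments": {"path": "notes.txt"}, "expect_ok": True},
    {"arguments": {"max_bytes": 10}, "expect_ok": False},
    {"arguments": {"path": "notes.txt", "shell": "rm -rf /"}, "expect_ok": False},
    {"arguments": {"path": 42}, "expect_ok": False},
]


def replay(fixtures: list, schema: dict) -> list:
    """Return the fixtures whose outcome differs from the recorded expectation."""
    return [f for f in fixtures if (not check_arguments(schema, f["arguments"])) != f["expect_ok"]]


print(replay(FIXTURES, SCHEMA))
```

An empty result means every fixture behaved as recorded; any entry in the list is a regression to investigate, and the same fixtures can be rerun unchanged against a different model or client.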

Case study: the first malicious MCP server

In late September 2025, researchers disclosed a trojanized postmark-mcp npm package that impersonated a Postmark email MCP server. Beginning with v1.0.16, the malicious build silently BCC-exfiltrated every email sent through it to an attacker-controlled address/domain. The package was subsequently removed, but guidance urged uninstalling the affected version and rotating credentials. This appears to be the first publicly documented malicious MCP server in the wild, and it underscores that MCP servers often run with high trust and should be vetted and version-pinned like any privileged connector.

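Operational takeaways: maintain an allowlist of approved servers, pin versions and hashes, require code provenance attestations, monitor for anomalous network egress patterns, and rotate credentials regularly.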

These are not theoretical controls; the incident impact flowed directly from over-trusted server code in a routine developer workflow.
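Version and hash pinning, one of the controls this incident motivates, can be sketched with the standard library alone. The package name `example-mcp-server`, the `PINS` table, and the artifact bytes are placeholders; in practice the pinned digest would come from a reviewed release, not from the artifact being checked.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


# Placeholder artifact and pin table (assumptions for illustration).
reviewed_artifact = b"example server bundle"
PINS = {("example-mcp-server", "1.0.0"): sha256_hex(reviewed_artifact)}


def is_approved(name: str, version: str, artifact: bytes, pins: dict = PINS) -> bool:
    """Approve only artifacts whose name, version, and digest all match a pin."""
    expected = pins.get((name, version))
    if expected is None:
        return False  # unknown package or unreviewed version
    return hmac.compare_digest(expected, sha256_hex(artifact))


print(is_approved("example-mcp-server", "1.0.0", reviewed_artifact))
print(is_approved("example-mcp-server", "1.0.1", reviewed_artifact))
```

Keying the pin on both version and digest means a new release, even a legitimate one, is rejected until someone reviews it and adds its hash.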

Using MCP to structure red-team exercises

1) Prompt-injection and unsafe-output drills at the tool boundary. Build adversarial corpora that enter via resources (application-controlled context) and attempt to coerce calls to dangerous tools. Assert that the client sanitizes injected outputs and that server post-conditions (e.g., allowed hostnames, file paths) hold. Map findings to LLM01 (Prompt Injection) and LLM02 (Insecure Output Handling).
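One such post-condition, an allowed-hostname check on tool calls that carry a URL, might be sketched like this. The allowlist contents and the assumption that the tool takes a `url` argument are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for a tool that fetches URLs.
ALLOWED_HOSTS = {"api.example.com"}


def violates_postcondition(tool_call: dict) -> bool:
    """True if the call targets a host outside the allowlist (or no host at all)."""
    url = tool_call.get("arguments", {}).get("url", "")
    # urlsplit parses userinfo correctly, so "allowed@evil" tricks resolve to the real host.
    host = urlsplit(url).hostname
    return host not in ALLOWED_HOSTS


benign = {"name": "fetch", "arguments": {"url": "https://api.example.com/v1/items"}}
injected = {"name": "fetch", "arguments": {"url": "https://api.example.com@evil.test/steal"}}

print(violates_postcondition(benign), violates_postcondition(injected))
```

Parsing the URL rather than string-matching it matters here: the injected example contains the allowed hostname as userinfo, yet the request would actually go to `evil.test`.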

2) Confused-deputy probes for token misuse. Craft tasks that try to induce a server to use a client-issued token or to call an unintended upstream audience. A compliant server must reject foreign-audience tokens per the authorization spec; clients must request audience-correct tokens using the RFC 8707 resource parameter. Treat any success here as a P1.
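On the client side, requesting an audience-correct token means including the `resource` parameter defined by RFC 8707 in the authorization request. The sketch below builds such a URL; the endpoint, client ID, and redirect URI are placeholders, and it omits other parameters a real flow needs (PKCE, for example).

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, resource: str, state: str) -> str:
    """Build an OAuth authorization URL that names the target MCP server as the resource."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "resource": resource,  # RFC 8707: the server the token is intended for
        "state": state,
    })
    return f"{authorize_endpoint}?{query}"


# Placeholder values, for illustration only.
url = build_authorization_url(
    "https://auth.example.com/authorize",
    "demo-client",
    "http://localhost:8765/callback",
    "https://mcp.example.com",
    "xyz123",
)
print(parse_qs(urlsplit(url).query)["resource"])
```

A probe for this drill can then present the resulting token to a different server and assert that it is refused.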

3) Session/stream resilience. For remote transports, exercise reconnection/resumption flows and multi-client concurrency for session fixation/hijack risks. Validate non-deterministic session IDs and rapid expiry/rotation in load-balanced deployments. (Streamable HTTP supports resumable connections; use it to stress your session model.)
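The session properties being validated here (unpredictable IDs, expiry, rotation) can be modeled with a small in-memory store. This is a sketch for building test expectations, not a production session layer; `SessionStore` and its 300-second default lifetime are assumptions.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """In-memory sessions with random IDs, expiry, and rotation."""

    def __init__(self, ttl_seconds: int = 300):
        self._ttl = ttl_seconds
        self._sessions: dict = {}

    def create(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        sid = secrets.token_urlsafe(32)  # drawn from the OS CSPRNG, not a counter
        self._sessions[sid] = now + self._ttl
        return sid

    def is_valid(self, sid: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expires = self._sessions.get(sid)
        if expires is None or expires <= now:
            self._sessions.pop(sid, None)  # drop expired entries eagerly
            return False
        return True

    def rotate(self, sid: str, now: Optional[float] = None) -> str:
        """Replace a valid session ID with a fresh one; the old ID stops working."""
        if not self.is_valid(sid, now):
            raise KeyError("unknown or expired session")
        del self._sessions[sid]
        return self.create(now)


store = SessionStore(ttl_seconds=300)
first = store.create(now=0.0)
second = store.rotate(first, now=10.0)
print(store.is_valid(first, now=11.0), store.is_valid(second, now=11.0))
```

In a load-balanced deployment the dictionary would be a shared store, and the same three assertions (old ID rejected, new ID accepted, expired ID rejected) still apply.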

4) Supply-chain kill-chain drills. In a lab, insert a trojaned server (with benign markers) and verify whether your allowlists, signature checks, and egress detection catch it—mirroring the Postmark incident TTPs. Measure time to detection and credential rotation MTTR.
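The two measurements named in this drill reduce to simple timestamp arithmetic. The sketch below records one drill and averages rotation time across several; the function names and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def drill_metrics(inserted_at: datetime, detected_at: datetime, rotated_at: datetime) -> dict:
    """Time from trojan insertion to detection, and from detection to credential rotation."""
    if not inserted_at <= detected_at <= rotated_at:
        raise ValueError("timestamps must be in order: inserted, detected, rotated")
    return {
        "time_to_detect": detected_at - inserted_at,
        "time_to_rotate": rotated_at - detected_at,
    }


def mean_time_to_rotate(drills: list) -> timedelta:
    """Mean rotation time across several drills."""
    return timedelta(seconds=mean(d["time_to_rotate"].total_seconds() for d in drills))


# Made-up timestamps for two lab drills.
t0 = datetime(2025, 1, 1, 12, 0)
drill_a = drill_metrics(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=50))
drill_b = drill_metrics(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=65))

print(drill_a["time_to_detect"], mean_time_to_rotate([drill_a, drill_b]))
```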

5) Baseline with trusted public servers. Use vetted servers to construct deterministic tasks. Two practical examples: Google’s Data Commons MCP exposes public datasets under a stable schema (good for fact-based tasks/replays), and Delinea’s MCP demonstrates least-privilege secrets brokering for agent workflows. These are ideal substrates for repeatable jailbreak and policy-enforcement tests.

Implementation-Focused Security Hardening Checklist

Client side

Server side

Detection & response

Governance alignment

MCP’s separation of concerns—clients as orchestrators, servers as scoped principals with typed capabilities—aligns directly with NIST’s AI RMF guidance for access control, logging, and red-team evaluation of generative systems, and with OWASP’s LLM Top-10 emphasis on mitigating prompt injection, unsafe output handling, and supply-chain vulnerabilities. Use those frameworks to justify controls in security reviews and to anchor acceptance criteria for MCP integrations.

Current adoption you can test against

Summary

MCP is not a silver-bullet "security product." It is a protocol that gives security and red-team practitioners stable, enforceable levers: audience-bound tokens, explicit client-server boundaries, typed tool schemas, and transports you can instrument. Use those levers to (1) constrain what agents can do, (2) observe what they actually did, and (3) replay adversarial scenarios reliably. Treat MCP servers as privileged connectors (vet, pin, and monitor them), because adversaries already do. With those practices in place, MCP becomes a practical foundation for secure agentic systems and a reliable substrate for red-team evaluation.



The post The Role of Model Context Protocol (MCP) in Generative AI Security and Red Teaming appeared first on MarkTechPost.
