cs.AI updates on arXiv.org, September 30
Agency Control and Risk Mitigation for LLM Agents

This paper proposes a method for directly measuring and controlling the agency of AI systems built on large language models (LLMs), aiming to address the potential harms that LLM agents may cause. Treating agency as a system property independent of intelligence-related measures, and drawing on the interdisciplinary literature, it operationalizes agency along the dimensions of preference rigidity, independent operation, and goal persistence. It further proposes a representation engineering approach to measure and control the agency of LLM agents, and on this basis offers a set of regulatory tools, including mandated testing protocols, domain-specific agency limits, agency-based risk assessment frameworks, and agency ceilings to prevent societal-scale risks.

arXiv:2509.22735v1 Announce Type: cross Abstract: As increasingly capable large language model (LLM)-based agents are developed, the potential harms caused by misalignment and loss of control grow correspondingly severe. To address these risks, we propose an approach that directly measures and controls the agency of these AI systems. We conceptualize the agency of LLM-based agents as a property independent of intelligence-related measures and consistent with the interdisciplinary literature on the concept of agency. We offer (1) agency as a system property operationalized along the dimensions of preference rigidity, independent operation, and goal persistence, (2) a representation engineering approach to the measurement and control of the agency of an LLM-based agent, and (3) regulatory tools enabled by this approach: mandated testing protocols, domain-specific agency limits, insurance frameworks that price risk based on agency, and agency ceilings to prevent societal-scale risks. We view our approach as a step toward reducing the risks that motivate the "Scientist AI" paradigm, while still capturing some of the benefits from limited agentic behavior.
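The abstract does not give implementation details, but representation engineering typically works by finding a direction in a model's activation space and projecting or steering activations along it. The sketch below is a hypothetical minimal illustration of that pattern for an "agency" axis: all names (agency_axis, agency_score, cap_agency) and the toy activation data are this note's assumptions, not the paper's actual method.

```python
# Hypothetical difference-of-means sketch in the spirit of representation
# engineering: estimate an "agency direction" from labeled activations,
# score new activations by projection, and enforce an agency ceiling by
# removing the excess component. Illustrative only, not the paper's code.
import numpy as np

def agency_axis(high_acts: np.ndarray, low_acts: np.ndarray) -> np.ndarray:
    """Unit vector from mean low-agency to mean high-agency activations."""
    v = high_acts.mean(axis=0) - low_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def agency_score(act: np.ndarray, axis: np.ndarray) -> float:
    """Projection onto the agency axis: the 'measurement' step."""
    return float(act @ axis)

def cap_agency(act: np.ndarray, axis: np.ndarray, ceiling: float) -> np.ndarray:
    """Subtract any excess component along the axis: the 'control' step."""
    excess = agency_score(act, axis) - ceiling
    return act - excess * axis if excess > 0 else act

if __name__ == "__main__":
    # Toy activations: two clusters separated along a random direction.
    rng = np.random.default_rng(0)
    direction = rng.normal(size=16)
    high = rng.normal(size=(50, 16)) + 2.0 * direction  # "high-agency" runs
    low = rng.normal(size=(50, 16)) - 2.0 * direction   # "low-agency" runs
    axis = agency_axis(high, low)
    capped = cap_agency(high[0], axis, ceiling=1.0)
    print(agency_score(high[0], axis), agency_score(capped, axis))
```

In an actual LLM, the activations would be hidden states collected from contrastive prompts, and cap_agency would be applied as a hook during generation; an "agency ceiling" in the paper's regulatory sense would then correspond to an upper bound on this kind of score.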


Related tags

Large language models · Agency control · Risk mitigation · Representation engineering · Regulatory tools