cs.AI updates on arXiv.org, October 3
A Microsaccade-Inspired Method for Detecting LLM Misbehaviour

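Inspired by microsaccades, this paper proposes a method for detecting latent misbehaviour in large language models (LLMs). The method uses lightweight position encoding perturbations to elicit internal model signals and, without fine-tuning or task-specific supervision, detects model failures across settings including factuality, safety, toxicity, and backdoor attacks. Experiments show that it is effective and computationally efficient on multiple state-of-the-art LLMs.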

arXiv:2510.01288v1 Announce Type: cross Abstract: We draw inspiration from microsaccades, tiny involuntary eye movements that reveal hidden dynamics of human perception, to propose an analogous probing method for large language models (LLMs). Just as microsaccades expose subtle but informative shifts in vision, we show that lightweight position encoding perturbations elicit latent signals that indicate model misbehaviour. Our method requires no fine-tuning or task-specific supervision, yet detects failures across diverse settings including factuality, safety, toxicity, and backdoor attacks. Experiments on multiple state-of-the-art LLMs demonstrate that these perturbation-based probes surface misbehaviours while remaining computationally efficient. These findings suggest that pretrained LLMs already encode the internal evidence needed to flag their own failures, and that microsaccade-inspired interventions provide a pathway for detecting and mitigating undesirable behaviours.
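To make the idea of a position encoding perturbation concrete, the following is a minimal sketch and not the paper's method. It assumes a toy single-pass self-attention layer with sinusoidal position encodings as a stand-in for an LLM, jitters the token positions slightly, and measures how far the output moves. The names `toy_attention` and `perturbation_sensitivity`, the jitter size, and the L2 distance score are all hypothetical choices made for illustration; the abstract does not specify how the perturbation is applied or how the elicited signal is scored.

```python
import math
import random


def sinusoidal_encoding(position, dim):
    """Sinusoidal position encoding at a (possibly fractional) position."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc[:dim]


def toy_attention(embeddings, positions):
    """Hypothetical stand-in for a model: one self-attention pass.

    Adds position encodings to the token embeddings and returns the
    attention output for the last token.
    """
    dim = len(embeddings[0])
    xs = [
        [e + p for e, p in zip(emb, sinusoidal_encoding(pos, dim))]
        for emb, pos in zip(embeddings, positions)
    ]
    query = xs[-1]
    scores = [sum(q * k for q, k in zip(query, x)) / math.sqrt(dim) for x in xs]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * x[d] for w, x in zip(weights, xs)) / total for d in range(dim)]


def perturbation_sensitivity(embeddings, shift=0.1, trials=8, seed=0):
    """Mean L2 shift of the output under small random position jitters.

    Assumption: the probe signal is simply the distance between the
    perturbed and unperturbed outputs. The paper may use something else.
    """
    rng = random.Random(seed)
    base_positions = list(range(len(embeddings)))
    baseline = toy_attention(embeddings, base_positions)
    total = 0.0
    for _ in range(trials):
        jittered = [p + rng.uniform(-shift, shift) for p in base_positions]
        out = toy_attention(embeddings, jittered)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(out, baseline)))
    return total / trials


# Usage: five random 8-dimensional "token embeddings".
rng = random.Random(1)
tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
score = perturbation_sensitivity(tokens)
print(score)
```

In a real setting the toy layer would be replaced by an actual model's forward pass with modified position indices, and the resulting score would be compared across inputs known to be benign and inputs known to trigger failures. How that comparison is done is left open here, since the abstract gives no details.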


Tags: microsaccades, LLM, behaviour detection, large language models, fine-tuning