Articles related to "Deception Detection"
Iterated Development and Study of Schemers (IDSS)
LessWrong
2025-10-10T14:22:17.000000Z
Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably?
LessWrong
2025-10-09T01:33:06.000000Z
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
cs.AI updates on arXiv.org
2025-10-01T05:59:07.000000Z
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
cs.AI updates on arXiv.org
2025-09-30T04:00:57.000000Z
Towards mitigating information leakage when evaluating safety monitors
cs.AI updates on arXiv.org
2025-09-29T04:07:08.000000Z
Research Areas in Interpretability (The Alignment Project by UK AISI)
LessWrong
2025-08-01T10:43:06.000000Z
Detecting Strategic Deception Using Linear Probes
LessWrong
2025-02-06T15:51:44.000000Z
Finding Deception in Language Models
LessWrong
2024-08-20T09:52:00.000000Z