Articles related to "High-risk scenarios"
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions
cs.AI updates on arXiv.org
2025-10-10T04:17:02.000000Z
Trustworthy Summarization via Uncertainty Quantification and Risk Awareness in Large Language Models
cs.AI updates on arXiv.org
2025-10-03T04:10:57.000000Z
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
cs.AI updates on arXiv.org
2025-09-12T04:19:13.000000Z