Safety Training_Fishai


Articles related to "Safety Training"

[CS 2881r] Can We Prompt Our Way to Safety? Comparing System Prompt Styles and Post-Training Effects on Safety Benchmarks

LessWrong 2025-10-28T07:07:49.000000Z

Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning

cs.AI updates on arXiv.org 2025-09-16T05:43:18.000000Z

From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training

cs.AI updates on arXiv.org 2025-08-14T04:19:07.000000Z

New Anthropic study shows AI really doesn’t want to be forced to change its views

TechCrunch News 2024-12-18T22:19:20.000000Z

Current safety training techniques do not fully transfer to the agent setting

LessWrong 2024-11-03T19:38:15.000000Z

Evaluating the Vulnerabilities of Unlearning Techniques in Large Language Models: A Comprehensive White-Box Analysis

MarkTechPost@AI 2024-10-03T07:21:38.000000Z

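OpenAI's strongest model, o1, still can't tell "which is larger, 9.11 or 9.8"

Huxiu 2024-09-13T03:38:23.000000Z

OpenAI releases its strongest model, o1, breaking through the AI bottleneck and opening a new era; GPT-5 may never arrive

36kr 2024-09-13T02:04:08.000000Z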

Iterative Refinement Stages of Lying in LLMs

LessWrong 2024-08-22T09:06:58.000000Z

Copyright © 2019 FISHAI. All Rights Reserved