Trending
Articles related to "Robustness Evaluation"
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
cs.AI updates on arXiv.org 2025-11-05T05:30:44.000000Z
MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration
cs.AI updates on arXiv.org 2025-10-23T04:10:49.000000Z
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
cs.AI updates on arXiv.org 2025-10-21T04:22:36.000000Z
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
cs.AI updates on arXiv.org 2025-10-20T04:15:08.000000Z
Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs
cs.AI updates on arXiv.org 2025-10-15T05:00:02.000000Z
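OpenAI, Anthropic, and DeepMind Publish Joint Paper: Existing LLM Safety Defenses Are Easily Broken
36kr Tech 2025-10-14T10:09:30.000000Z
OpenAI, Anthropic, and DeepMind Publish Joint Paper: Existing LLM Safety Defenses Are Easily Broken
机器之心 2025-10-14T06:54:22.000000Z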
OpenAI, Anthropic, and DeepMind Publish Joint Paper: Existing LLM Safety Defenses Are Easily Broken
机器之心 2025-10-14T06:53:03.000000Z
Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
cs.AI updates on arXiv.org 2025-10-07T04:12:30.000000Z
WAREX: Web Agent Reliability Evaluation on Existing Benchmarks
cs.AI updates on arXiv.org 2025-10-07T04:03:17.000000Z
MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts
cs.AI updates on arXiv.org 2025-10-02T04:18:23.000000Z
Multi-level Diagnosis and Evaluation for Robust Tabular Feature Engineering with Large Language Models
cs.AI updates on arXiv.org 2025-10-01T05:59:44.000000Z
Variance-Bounded Evaluation without Ground Truth: VB-Score
cs.AI updates on arXiv.org 2025-09-30T04:03:21.000000Z
Benchmarking and Improving LLM Robustness for Personalized Generation
cs.AI updates on arXiv.org 2025-09-25T05:36:44.000000Z
VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
cs.AI updates on arXiv.org 2025-09-19T04:37:50.000000Z
Geometric Red-Teaming for Robotic Manipulation
cs.AI updates on arXiv.org 2025-09-17T05:05:01.000000Z
GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research
cs.AI updates on arXiv.org 2025-09-16T05:18:17.000000Z
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension
cs.AI updates on arXiv.org 2025-09-11T15:51:52.000000Z