LLM-Assisted Accessibility: Evaluation and Improvement

cs.AI updates on arXiv.org, September 3


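This paper presents a benchmark of human-validated, general-purpose accessibility questions for evaluating how broadly and how deeply large language models cover accessibility needs. The study finds that LLMs cover vision, hearing, and mobility disabilities well, but still fall short on speech, genetic/developmental, sensory-cognitive, and mental health disabilities.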

arXiv:2509.00963v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used for accessibility guidance, yet many disability groups remain underserved by their advice. To address this gap, we present a taxonomy-aligned benchmark of human-validated, general-purpose accessibility questions, designed to systematically audit inclusivity across disabilities. Our benchmark evaluates models along three dimensions: Question-Level Coverage (breadth within answers), Disability-Level Coverage (balance across nine disability categories), and Depth (specificity of support). Applying this framework to 17 proprietary and open-weight models reveals persistent inclusivity gaps: Vision, Hearing, and Mobility are frequently addressed, while Speech, Genetic/Developmental, Sensory-Cognitive, and Mental Health remain underserved. Depth is similarly concentrated in a few categories but sparse elsewhere. These findings reveal who gets left behind in current LLM accessibility guidance and highlight actionable levers: taxonomy-aware prompting/training and evaluations that jointly audit breadth, balance, and depth.
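The abstract names the coverage dimensions but not their formulas. As a minimal sketch, assuming each model answer has already been tagged with the set of disability categories it addresses, the two coverage dimensions could be computed roughly as follows. The functions `question_level_coverage`, `disability_level_coverage`, and `balance` are hypothetical, and scoring balance with normalized Shannon entropy is an illustrative choice, not the paper's stated metric. Depth is left out because the abstract gives no operational definition of "specificity of support".

```python
import math
from collections import Counter

# The abstract states nine disability categories but names only seven,
# so the count is a parameter rather than a guessed category list.
NUM_CATEGORIES = 9


def question_level_coverage(tagged_answers: list[set[str]],
                            num_categories: int = NUM_CATEGORIES) -> float:
    """Hypothetical breadth score: mean fraction of categories addressed per answer."""
    if not tagged_answers:
        return 0.0
    per_answer = [len(tags) / num_categories for tags in tagged_answers]
    return sum(per_answer) / len(per_answer)


def disability_level_coverage(tagged_answers: list[set[str]],
                              categories: list[str]) -> dict[str, float]:
    """Hypothetical per-category score: fraction of answers that address each category."""
    counts = Counter(tag for tags in tagged_answers for tag in tags)
    n = len(tagged_answers)
    return {c: (counts[c] / n if n else 0.0) for c in categories}


def balance(per_category: dict[str, float]) -> float:
    """Normalized Shannon entropy of category attention (illustrative choice).

    Returns 1.0 when every category gets equal attention. Because the
    normalizer is log(k) over *all* categories, including zero-coverage
    ones, categories that are never addressed pull the score down.
    """
    total = sum(per_category.values())
    k = len(per_category)
    if total == 0 or k < 2:
        return 0.0
    probs = [v / total for v in per_category.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k)


if __name__ == "__main__":
    # Toy tags for three answers; not data from the paper.
    answers = [{"Vision", "Hearing", "Mobility"}, {"Vision", "Mobility"},
               {"Vision", "Mental Health"}]
    cats = ["Vision", "Hearing", "Mobility", "Speech", "Mental Health"]
    per = disability_level_coverage(answers, cats)
    print(question_level_coverage(answers, len(cats)), per, balance(per))
```

Under this sketch, a model that addresses only Vision, Hearing, and Mobility, and does so equally, would score log 3 / log 9 = 0.5 on balance across nine categories, which mirrors the kind of concentration the abstract reports.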
