Comparing Multilingual Reasoning Models on English and Non-English Questions

cs.AI updates on arXiv.org, October 24, 12:49

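This paper systematically compares how a large reasoning model reasons when handling English and non-English questions, and analyzes the cognitive attributes present in its reasoning traces. It finds that the model performs better when reasoning in English, but that this strategy is vulnerable to translation errors.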

arXiv:2510.20647v1. Announce type: cross.

Abstract: Large Reasoning Models (LRMs) achieve strong performance on mathematical, scientific, and other question-answering tasks, but their multilingual reasoning abilities remain underexplored. When presented with non-English questions, LRMs often default to reasoning in English, raising concerns about interpretability and the handling of linguistic and cultural nuances. We systematically compare an LRM's reasoning in English versus the language of the question. Our evaluation spans two tasks: MGSM and GPQA Diamond. Beyond measuring answer accuracy, we also analyze cognitive attributes in the reasoning traces. We find that English reasoning traces exhibit a substantially higher presence of these cognitive behaviors, and that reasoning in English generally yields higher final-answer accuracy, with the performance gap increasing as tasks become more complex. However, this English-centric strategy is susceptible to a key failure mode, getting "Lost in Translation," where translation steps introduce errors that would have been avoided by reasoning in the language of the question.
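The comparison described in the abstract, final-answer accuracy when reasoning in English versus in the language of the question, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the record fields (`language`, `mode`, `correct`), the function name `accuracy_gap`, and the toy records are all hypothetical placeholders chosen for illustration, and the numbers carry no empirical meaning.

```python
from collections import defaultdict


def accuracy_gap(records):
    """Compute per-language accuracy for each reasoning mode and their gap.

    Each record is a dict with hypothetical keys:
      'language': language of the question, e.g. 'de'
      'mode':     'en' (reasoning in English) or 'native' (reasoning in
                  the language of the question)
      'correct':  True if the final answer was correct
    Returns {language: {'en': acc, 'native': acc, 'gap': en - native}}.
    """
    # counts[language][mode] = [number correct, number attempted]
    counts = defaultdict(lambda: {"en": [0, 0], "native": [0, 0]})
    for r in records:
        bucket = counts[r["language"]][r["mode"]]
        bucket[0] += int(r["correct"])
        bucket[1] += 1

    result = {}
    for lang, modes in counts.items():
        accs = {
            mode: (hits / total if total else float("nan"))
            for mode, (hits, total) in modes.items()
        }
        result[lang] = {
            "en": accs["en"],
            "native": accs["native"],
            "gap": accs["en"] - accs["native"],
        }
    return result


# Made-up placeholder records, NOT results from the paper.
toy_records = [
    {"language": "de", "mode": "en", "correct": True},
    {"language": "de", "mode": "en", "correct": True},
    {"language": "de", "mode": "native", "correct": True},
    {"language": "de", "mode": "native", "correct": False},
    {"language": "ja", "mode": "en", "correct": True},
    {"language": "ja", "mode": "en", "correct": False},
    {"language": "ja", "mode": "native", "correct": False},
    {"language": "ja", "mode": "native", "correct": True},
]

gaps = accuracy_gap(toy_records)
```

A positive `gap` for a language means English reasoning scored higher on that language's questions. Computing the same table separately for each task (for example MGSM and GPQA Diamond) would show whether the gap grows with task complexity, as the abstract reports.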

Related tags

Large reasoning models, multilingual processing, cognitive attributes, translation errors, math problems
