cs.AI updates on arXiv.org, October 22, 12:17
Improving the Trustworthiness and Safety of Large Language Models by Enhancing Pre-training Data

Hallucination and credibility concerns in large language models (LLMs) are global challenges that the industry is actively addressing. Despite substantial progress in post-training and inference techniques, the unsafe behavior and hallucinations of LLMs are rooted in the pre-training data and the learning mechanism. This paper proposes enhancing the pre-training data to improve the trustworthiness and safety of LLMs. Because the corpus is so vast, factual errors and biases can never be removed entirely, and the data lack grounding in real-world knowledge. To address this, the work introduces Data with World Context (DWC), which incorporates real-world contextual information so that pre-training data are better anchored in real-world scenarios, reducing uncertainty during training and improving safety and trustworthiness. Experiments show that after continuing pre-training of JT-35B-Base on 1.5 trillion DWC tokens and activating DWC's potential during post-training, JT-Safe-35B achieves an average improvement of 1.79% on safety and trustworthiness benchmarks relative to a Qwen model of similar scale, while using only 6.2 trillion pre-training tokens in total.

💡 **Data enhancement is the key:** The core of the paper is to tackle the hallucination and credibility problems that plague large language models (LLMs) by improving the pre-training data, which the authors regard as the root cause of model unsafety. They propose Data with World Context (DWC), a method that enriches pre-training data by incorporating real-world information.

🌍 **Introducing world context:** At the heart of DWC is attaching to each piece of pre-training data its spatial-temporal context in the real world. Data are then no longer just isolated token sequences; they are treated as representations of part of the world, which helps the model understand and apply knowledge more reliably and reduces uncertainty.
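
To make the idea concrete, here is a minimal sketch in Python of how a raw document could be wrapped with world-context metadata before tokenization. The field names (`source`, `published`, `location`, `purpose`), the tag-style serialization, and the `to_dwc_example` helper are illustrative assumptions; the paper's actual DWC format is not specified in this summary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorldContext:
    """Hypothetical spatial-temporal context for a training document."""
    source: str                     # e.g. a news outlet or technical manual
    published: str                  # date the text appeared in the world
    location: Optional[str] = None  # where the described events took place
    purpose: Optional[str] = None   # why the author wrote the text

def to_dwc_example(text: str, ctx: WorldContext) -> str:
    """Serialize a document together with its world context.

    This mirrors the general idea of DWC (anchoring a token sequence to the
    real-world situation it came from), but the exact layout used for
    JT-Safe-35B is an assumption here.
    """
    header = (
        f"[SOURCE] {ctx.source}\n"
        f"[DATE] {ctx.published}\n"
        + (f"[LOCATION] {ctx.location}\n" if ctx.location else "")
        + (f"[PURPOSE] {ctx.purpose}\n" if ctx.purpose else "")
    )
    return header + "[DOCUMENT]\n" + text

# Example usage
doc = "The bridge reopened after a six-month retrofit."
ctx = WorldContext(source="city press release", published="2024-05-14",
                   location="Rotterdam", purpose="public notice")
print(to_dwc_example(doc, ctx))
```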

🚀 **Clear empirical gains:** After applying DWC to JT-35B-Base through continued pre-training and a subsequent post-training stage, the resulting JT-Safe-35B achieves an average improvement of 1.79% on safety and trustworthiness benchmarks over a Qwen model of similar scale. Notably, the gain comes with a comparatively modest pre-training budget of 6.2 trillion tokens, pointing to both the effectiveness and the efficiency of DWC.

🎯 **Coping with data limitations:** Pre-training corpora are so large that factual errors, logical inconsistencies, and distributional biases can never be removed entirely. DWC offers a practical way to compensate for these shortcomings, letting the model capture information more accurately during training and produce more reliable, safer outputs.

arXiv:2510.17918v1 Announce Type: cross Abstract: The hallucination and credibility concerns of large language models (LLMs) are global challenges that the industry is collectively addressing. Recently, significant advances have been made in post-training and inference techniques to mitigate these challenges. However, it is widely agreed that the unsafe behaviors and hallucinations of LLMs intrinsically originate from pre-training, involving the pre-training data and the next-token prediction learning mechanism. In this paper, we focus on enhancing pre-training data to improve the trustworthiness and safety of LLMs. Since the data is vast, it is almost impossible to entirely purge the data of factual errors, logical inconsistencies, or distributional biases. Moreover, the pre-training data lack grounding in real-world knowledge: each piece of data is treated as a sequence of tokens rather than as a representation of a part of the world. To overcome these issues, we propose approaches to enhancing our pre-training data with its context in the world and adding a substantial amount of data reflecting industrial scenarios. We argue that most source data are created by their authors for specific purposes in a certain spatial-temporal context; they have played a role in the real world. By incorporating related world-context information, we aim to better anchor pre-training data within real-world scenarios, thereby reducing uncertainty in model training and enhancing the model's safety and trustworthiness. We refer to our Data with World Context as DWC. We continue pre-training an earlier checkpoint of JT-35B-Base with 1.5 trillion DWC tokens, and we introduce post-training procedures to activate the potential of DWC. Compared with the Qwen model of a similar scale, JT-Safe-35B achieves an average performance improvement of 1.79% on the Safety and Trustworthy evaluation benchmarks, while being pretrained with only 6.2 trillion tokens.
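
For readers who want to experiment with the same general recipe, the sketch below shows one way to continue pre-training an existing causal-language-model checkpoint on DWC-style text using the Hugging Face `transformers` Trainer. The checkpoint name, corpus file, and hyperparameters are placeholders, and this is not the training stack used for JT-35B-Base, which the abstract does not describe.

```python
# Minimal continued pre-training sketch with Hugging Face transformers.
# "my-base-checkpoint" and "dwc_corpus.txt" are placeholders, not artifacts
# released with the paper.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("my-base-checkpoint")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("my-base-checkpoint")

# Each line of the corpus is assumed to be one DWC-style example
# (world-context header followed by the document text).
dataset = load_dataset("text", data_files={"train": "dwc_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

# Causal LM objective (next-token prediction), matching the pre-training setup.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="dwc-continued-pretraining",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    learning_rate=1e-5,   # a lower LR is typical when resuming from a checkpoint
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```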


Related tags

Large Language Models (LLMs), Pre-training, Trustworthiness, Safety, Data Augmentation, World Context (DWC)