cs.AI updates on arXiv.org 10月08日
新资源:SynCED-EnDe助力机器翻译错误检测
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了SynCED-EnDe,一个包含1,000金标签和8,000银标签的句子对资源,旨在提升机器翻译错误检测的准确性和全面性,并推动其在信息检索和对话助手等领域的应用。

arXiv:2510.05144v1 Announce Type: cross Abstract: Critical Error Detection (CED) in machine translation aims to determine whether a translation is safe to use or contains unacceptable deviations in meaning. While the WMT21 English-German CED dataset provided the first benchmark, it is limited in scale, label balance, domain coverage, and temporal freshness. We present SynCED-EnDe, a new resource consisting of 1,000 gold-labeled and 8,000 silver-labeled sentence pairs, balanced 50/50 between error and non-error cases. SynCED-EnDe draws from diverse 2024-2025 sources (StackExchange, GOV.UK) and introduces explicit error subclasses, structured trigger flags, and fine-grained auxiliary judgments (obviousness, severity, localization complexity, contextual dependency, adequacy deviation). These enrichments enable systematic analyses of error risk and intricacy beyond binary detection. The dataset is permanently hosted on GitHub and Hugging Face, accompanied by documentation, annotation guidelines, and baseline scripts. Benchmark experiments with XLM-R and related encoders show substantial performance gains over WMT21 due to balanced labels and refined annotations. We envision SynCED-EnDe as a community resource to advance safe deployment of MT in information retrieval and conversational assistants, particularly in emerging contexts such as wearable AI devices.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

机器翻译 错误检测 SynCED-EnDe 信息检索 对话助手
相关文章