CRACQ: A Multi-Dimensional Text Evaluation Framework

cs.AI updates on arXiv.org, October 6


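This paper presents CRACQ, a multi-dimensional evaluation framework covering five traits: Coherence, Rigor, Appropriateness, Completeness, and Quality. It extends trait-based Automated Essay Scoring (AES) to a broad range of machine-generated text and supports interpretable automated evaluation.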

arXiv:2510.02337v1 Announce Type: cross Abstract: This paper presents CRACQ, a multi-dimensional evaluation framework tailored to evaluate documents across five specific traits: Coherence, Rigor, Appropriateness, Completeness, and Quality. Building on insights from trait-based Automated Essay Scoring (AES), CRACQ expands its focus beyond essays to encompass diverse forms of machine-generated text, providing a rubric-driven and interpretable methodology for automated evaluation. Unlike single-score approaches, CRACQ integrates linguistic, semantic, and structural signals into a cumulative assessment, enabling both holistic and trait-level analysis. Trained on 500 synthetic grant proposals, CRACQ was benchmarked against an LLM-as-a-judge and further tested on both strong and weak real applications. Preliminary results indicate that CRACQ produces more stable and interpretable trait-level judgments than direct LLM evaluation, though challenges in reliability and domain scope remain.
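The abstract does not say how CRACQ combines its trait-level signals into a cumulative assessment. As a rough illustration only, the sketch below assumes each of the five traits has already been scored on a 0 to 1 scale and combines them with a weighted mean. The `TraitScores` class, the `holistic_score` and `weakest_trait` helpers, the score scale, and the default equal weights are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class TraitScores:
    """Hypothetical per-trait scores on an assumed 0-1 scale."""
    coherence: float
    rigor: float
    appropriateness: float
    completeness: float
    quality: float


def holistic_score(scores: TraitScores,
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Combine trait scores with a weighted mean.

    The weighted mean is an assumption made for this sketch; the paper's
    actual aggregation rule is not given in the abstract.
    """
    names = [f.name for f in fields(scores)]
    if weights is None:
        # Assumed default: every trait counts equally.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(getattr(scores, name) * weights[name] for name in names)
    return weighted_sum / total_weight


def weakest_trait(scores: TraitScores) -> str:
    """Return the name of the lowest-scoring trait (trait-level view)."""
    names = [f.name for f in fields(scores)]
    return min(names, key=lambda name: getattr(scores, name))


if __name__ == "__main__":
    example = TraitScores(coherence=0.8, rigor=0.6, appropriateness=0.9,
                          completeness=0.5, quality=0.7)
    print("holistic:", round(holistic_score(example), 3))
    print("weakest trait:", weakest_trait(example))
```

Keeping the per-trait values alongside the single combined number is what allows both views the abstract mentions: the holistic score summarizes a document, while the trait breakdown shows where it falls short.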


Related tags

CRACQ, text evaluation, automated essay scoring, machine-generated text, multi-dimensional evaluation
