cs.AI updates on arXiv.org, October 20, 12:09
Structured Information Extraction from Invoice Documents and Evaluation Methods

 

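This paper proposes a method for extracting structured information from invoice documents and establishes a set of evaluation metrics for assessing the accuracy of the extracted data. Invoices are pre-processed, Docling and LlamaCloud services are applied to identify and extract key fields, and a precise evaluation framework is used to ensure the reliability of the extraction process.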

arXiv:2510.15727v1 Announce Type: new Abstract: This paper presents methods for extracting structured information from invoice documents and proposes a set of evaluation metrics (EM) to assess the accuracy of the extracted data against annotated ground truth. The approach involves pre-processing scanned or digital invoices, applying Docling and LlamaCloud Services to identify and extract key fields such as invoice number, date, total amount, and vendor details. To ensure the reliability of the extraction process, we establish a robust evaluation framework comprising field-level precision, consistency check failures, and exact match accuracy. The proposed metrics provide a standardized way to compare different extraction methods and highlight strengths and weaknesses in field-specific performance.
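The abstract names field-level precision and exact match accuracy but does not give their formulas, so the following is only a minimal sketch under assumed definitions: precision for a field is the share of extracted (non-empty) values that equal the annotated ground truth, and exact match accuracy is the share of invoices in which every listed field matches. The function names, the field names, the normalization rule, and the sample records are all hypothetical and are not taken from the paper. Consistency checks are left out because the abstract does not say which rules they use.

```python
from typing import Dict, List, Optional

# One invoice record: field name -> extracted string, or None if nothing was extracted.
Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Assumed normalization: collapse whitespace and ignore case."""
    if value is None:
        return None
    return " ".join(value.split()).lower()


def field_level_precision(
    predictions: List[Record], ground_truth: List[Record], fields: List[str]
) -> Dict[str, float]:
    """Per field: correct extractions divided by all non-empty extractions."""
    scores: Dict[str, float] = {}
    for field in fields:
        extracted = 0
        correct = 0
        for pred, gold in zip(predictions, ground_truth):
            value = normalize(pred.get(field))
            if value is None:
                continue  # nothing extracted, so it does not count toward precision
            extracted += 1
            if value == normalize(gold.get(field)):
                correct += 1
        scores[field] = correct / extracted if extracted else 0.0
    return scores


def exact_match_accuracy(
    predictions: List[Record], ground_truth: List[Record], fields: List[str]
) -> float:
    """Share of invoices whose listed fields all match the ground truth."""
    if not predictions:
        return 0.0
    matches = sum(
        all(normalize(p.get(f)) == normalize(g.get(f)) for f in fields)
        for p, g in zip(predictions, ground_truth)
    )
    return matches / len(predictions)


# Hypothetical sample data, for illustration only.
FIELDS = ["invoice_number", "date", "total_amount", "vendor"]
predicted = [
    {"invoice_number": "INV-001", "date": "2024-01-05",
     "total_amount": "100.00", "vendor": "Acme Corp"},
    {"invoice_number": "INV-002", "date": "2024-02-10",
     "total_amount": "250.00", "vendor": None},
]
annotated = [
    {"invoice_number": "INV-001", "date": "2024-01-05",
     "total_amount": "100.00", "vendor": "ACME Corp"},
    {"invoice_number": "INV-002", "date": "2024-02-11",
     "total_amount": "250.00", "vendor": "Globex"},
]

precision = field_level_precision(predicted, annotated, FIELDS)
accuracy = exact_match_accuracy(predicted, annotated, FIELDS)
print(precision)
print(accuracy)
```

Reporting precision per field, rather than as a single pooled number, is what would let two extraction methods be compared on, say, dates versus vendor names, which is the field-specific comparison the abstract describes.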


Related tags

invoice documents, structured information extraction, evaluation metrics, Docling, LlamaCloud