cs.AI updates on arXiv.org, July 14
Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates

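This study evaluates the Document Attention Network (DAN) model for extracting key-value information from Uruguayan birth certificates. DAN is fine-tuned under two annotation strategies, using little training data and annotation effort. The experiments show that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs better for fields that cannot be standardized, such as names and surnames.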

arXiv:2507.08636v1 Announce Type: cross Abstract: This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from Uruguayan birth certificates, handwritten in Spanish. We investigate two annotation strategies for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets containing the same images (201 scans of birth certificates written by more than 15 different writers) but with different annotation methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.
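To make the distinction between the two annotation strategies concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not specify their normalization rules, so the field names, the ISO date format, the month table, and the helpers `normalize_date` and `build_targets` are all hypothetical choices made for illustration. The idea it shows is that a diplomatic target reproduces the text as written, while a normalized target maps a standardizable field (here, a date) to one canonical form and leaves non-standardizable fields such as names untouched.

```python
import re
from typing import Optional

# Hypothetical month lookup (assumption): the paper's real normalization
# rules are not given in the abstract. "setiembre" is a common variant spelling.
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "setiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Matches dates written as "<day> de <month name> de <year>".
DATE_RE = re.compile(r"(\d{1,2})\s+de\s+([a-záéíóú]+)\s+de\s+(\d{4})", re.IGNORECASE)


def normalize_date(written: str) -> Optional[str]:
    """Map a date as written to ISO format (YYYY-MM-DD), or None if unparseable."""
    match = DATE_RE.search(written)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return f"{int(year):04d}-{month:02d}-{int(day):02d}"


def build_targets(field: str, written: str) -> dict:
    """Return the transcription target for one field under both strategies."""
    diplomatic = written  # diplomatic: keep the text exactly as written
    if field == "date":
        # normalized: canonical form when possible, else fall back to the text
        normalized = normalize_date(written) or written
    else:
        # names and surnames have no canonical form, so nothing changes
        normalized = written
    return {"diplomatic": diplomatic, "normalized": normalized}


if __name__ == "__main__":
    # Invented example strings, not taken from the dataset.
    print(build_targets("date", "5 de Setiembre de 1923"))
    print(build_targets("name", "María Josefa"))
```

Under this sketch, two writers who spell the same date differently would produce different diplomatic targets but the same normalized target, which is one plausible reason a model could find standardizable fields easier to learn with normalized labels.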

Related tags

DAN network, handwritten document information extraction, annotation strategies