AWS Machine Learning Blog

Fine-tune Amazon Nova Lite to optimize document processing

This post details how to use multimodal fine-tuning of Amazon Nova Lite to improve the accuracy and efficiency of document processing. It focuses on the challenges of processing specialized documents such as invoices, purchase orders, and forms, and provides a complete hands-on guide from data preparation to model deployment. Through fine-tuning, the model better understands complex document layouts and handles variations in data quality and language, significantly improving its ability to extract structured information. The post also compares approaches such as zero-shot prompting, few-shot prompting, and fine-tuning, and shows that the fine-tuned model achieves substantial performance gains when extracting key information (such as employee and employer information, earnings, and benefits), particularly in precision and recall. In addition, it covers best practices for dataset preparation, model training configuration, and inference options, and analyzes cost-effectiveness, emphasizing that fine-tuning Amazon Nova Lite delivers high-accuracy document processing while remaining cost-effective and scalable.

🎯 **Document processing challenges and why fine-tuning matters**: General-purpose large language models (LLMs) often underperform on specialized documents such as invoices, tax forms, and loan applications because of complex layouts, diverse formats, varying data quality, and strict accuracy requirements. Fine-tuning addresses this by training the model on a task-specific dataset, so it learns document-specific layouts and field relationships, adapts to data variability, and produces consistent structured output, significantly improving performance on document processing tasks.

🔧 **Two main techniques for customizing Amazon Nova Lite**: The post covers two customization techniques for Amazon Nova models. The first is task-specific fine-tuning, which adjusts model weights through supervised fine-tuning (SFT), with a choice of parameter-efficient fine-tuning (PEFT) or full-model fine-tuning. The second is knowledge distillation, which transfers the capabilities of a larger model into a smaller, more efficient one. Both techniques are straightforward to run on Amazon Bedrock without complex infrastructure management.

📈 **Significant performance gains from fine-tuning**: On a W2 tax form data extraction task, fine-tuned Amazon Nova Lite substantially improved accuracy, precision, and F1 scores across key field categories, including employee information, employer information, earnings, and benefits. For example, accuracy on employer information rose from 58.67% to 92.67%, and precision on multi-state employment information improved by nearly 40% while maintaining 100% recall, demonstrating the effectiveness of fine-tuning for document information extraction.

📊 **Key steps in data preparation and model deployment**: Successful fine-tuning depends on high-quality training data. The post highlights the importance of dataset analysis, baseline model evaluation, prompt optimization, and data formatting (JSONL). For deployment, it describes two inference options: on-demand model inference (ODI), which is flexible and cost-effective for workloads with fluctuating usage, and Provisioned Throughput endpoints, which suit steady, high-volume traffic. Cleaning up unused models and deployment resources to avoid unnecessary costs is also an important final step.

Multimodal fine-tuning represents a powerful approach for customizing vision large language models (LLMs) to excel at specific tasks that involve both visual and textual information. Although base multimodal models offer impressive general capabilities, they often fall short when faced with specialized visual tasks, domain-specific content, or output formatting requirements. Fine-tuning addresses these limitations by adapting models to your specific data and use cases, dramatically improving performance on tasks that matter to your business.

A common use case is document processing: extracting structured information from complex layouts such as invoices, purchase orders, forms, tables, or technical diagrams. Although off-the-shelf LLMs often struggle with specialized documents like tax forms, invoices, and loan applications, fine-tuned models can learn from high variation in the data and deliver significantly higher accuracy while reducing processing costs.

This post provides a comprehensive hands-on guide to fine-tuning Amazon Nova Lite for document processing tasks, with a focus on tax form data extraction. Using our open-source GitHub repository code sample, we demonstrate the complete workflow from data preparation to model deployment. Because Amazon Bedrock provides on-demand inference with pay-per-token pricing for Amazon Nova, we benefit from the accuracy improvements of model customization while maintaining a pay-as-you-go cost structure.

The document processing challenge

Given a single or multi-page document, the goal is to extract or derive specific structured information from the document so that it can be used for downstream systems or additional insights. The following diagram shows how a vision LLM can be used to derive the structured information based on a combination of text and vision capabilities.

The key challenges for enterprises in workflow automation when processing documents, like invoices or W2 tax forms, are the following:

    Complex and varied layouts: key fields appear in different positions and formats across document templates.
    Varying data quality: scanned documents can be noisy, skewed, or low resolution.
    Language and terminology differences across documents.
    Strict accuracy requirements, because the extracted values feed downstream systems.

Approaches for intelligent document processing that use LLMs or vision LLMs fall into three main categories:

    Zero-shot prompting: instruct the base model to extract the target fields without examples.
    Few-shot prompting: include a small number of annotated example documents in the prompt.
    Fine-tuning: train the model on an annotated dataset so it adapts to your documents.

For the first two approaches, refer to the amazon-nova-samples repository, which contains sample code showing how to use the Amazon Bedrock Converse API for structured output with tool calling.
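
The following is a minimal sketch of that pattern, assuming a boto3 environment with Amazon Bedrock access; the tool name, schema fields, file name, and model ID are illustrative, not the exact values from the repository:

import boto3

# Runtime client for model invocation
client = boto3.client("bedrock-runtime")

# Describe the target JSON structure as a tool, so the model must respond
# with arguments that match the schema (illustrative W2 fields)
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "extract_w2_fields",
            "description": "Return structured fields extracted from a W2 form.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "employee_name": {"type": "string"},
                    "employer_name": {"type": "string"},
                    "wages": {"type": "number"},
                },
                "required": ["employee_name", "employer_name", "wages"],
            }},
        }
    }],
    # Force a call to the extraction tool instead of free-form text
    "toolChoice": {"tool": {"name": "extract_w2_fields"}},
}

with open("w2_sample.png", "rb") as f:
    image_bytes = f.read()

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "Extract the W2 fields from this form."},
        ],
    }],
    toolConfig=tool_config,
)

# The structured output arrives as the tool call's input arguments
tool_use = next(
    block["toolUse"] for block in response["output"]["message"]["content"]
    if "toolUse" in block
)
print(tool_use["input"])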

Off-the-shelf LLMs excel at general document understanding, but they might not optimally handle domain-specific challenges. A fine-tuned Nova model can enhance performance by:

    Learning document-specific layouts and field relationships
    Adapting to variability in data quality and formats
    Providing consistent structured output that downstream systems can consume

Creating the annotated dataset and selecting the customization technique

While there are various methods for customizing Amazon Nova models, the most relevant for document processing are the following:

    Task-specific fine-tuning: supervised fine-tuning (SFT) that adjusts model weights, with a choice of parameter-efficient fine-tuning (PEFT) or full-model fine-tuning
    Knowledge distillation: transferring the capabilities of a larger teacher model into a smaller, more efficient student model

To learn from previous examples, you need either an annotated dataset from which the model can learn, or a model that is already good enough at your task to serve as a teacher. There are three options for obtaining such a dataset:

    Automated dataset annotation with historic data from Enterprise Resource Planning (ERP) systems, such as SAP: Many customers already have historic documents that were manually processed and consumed by downstream systems, like ERP or customer relationship management (CRM) systems. Explore existing downstream systems like SAP and the data they contain. This data can often be mapped back to the original source document it was derived from, and it helps you bootstrap an annotated dataset very quickly.
    Manual dataset annotation: Identify the most relevant documents and formats, and annotate them using human annotators, so that you have document/JSON pairs where the JSON contains the target information that you want to extract or derive from your source documents.
    Annotate with the teacher model: Explore whether a larger model like Nova Premier can provide accurate enough results using prompt engineering. If so, you can also use distillation.

For the first and second options, we recommend supervised model fine-tuning. For the third, model distillation is the right approach.

Amazon Bedrock currently provides both fine-tuning and distillation techniques, so anyone with a basic data science skillset can easily submit jobs. The jobs run on compute fully managed by Amazon, so you don't have to worry about instance sizes or capacity limits.

Nova customization is also available in Amazon SageMaker, with more options and controls. For example, if you have sufficient high-quality labeled data and want deeper customization for your use case, full-rank fine-tuning might produce higher accuracy. Full-rank fine-tuning is supported with SageMaker training jobs and SageMaker HyperPod.

Data preparation best practices

The quality and structure of your training data fundamentally determine the success of fine-tuning. Here are key steps and considerations for preparing effective multimodal datasets and configuring your fine-tuning job:

Dataset analysis and base model evaluation

Our demonstration uses a synthetic dataset of W2 tax forms: the Fake W-2 (US Tax Form) Dataset. This public dataset comprises simulated US tax returns (W-2 statements for years 2016-19), including noisy images that mimic low-quality scanned W2 tax forms.

Before fine-tuning, it’s crucial to:

    Analyze dataset characteristics (image quality, field completeness, class distribution), define use-case-specific evaluation metrics, and establish baseline model performance.
    Compare each predicted field value against the ground truth, calculating precision, recall, and F1 scores for individual fields and overall performance (a minimal sketch of this evaluation follows the list).
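
The following is a minimal sketch of such a field-level evaluation, under the assumption that each prediction and ground truth is a flat dict mapping field names to string values; the matching convention (exact match after whitespace stripping) is illustrative:

def field_metrics(predictions, ground_truths):
    """Compute micro precision/recall/F1 over a list of document examples."""
    tp = fp = fn = 0
    for pred, truth in zip(predictions, ground_truths):
        for field, true_value in truth.items():
            pred_value = pred.get(field)
            if pred_value is None or pred_value == "":
                fn += 1  # expected value is missing from the prediction
            elif str(pred_value).strip() == str(true_value).strip():
                tp += 1  # exact match with ground truth
            else:
                fp += 1  # a value was extracted, but it is wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: one document with one wrong field value
print(field_metrics(
    [{"employee_name": "JANE DOE", "wages": "52000.00"}],
    [{"employee_name": "JANE DOE", "wages": "52100.00"}],
))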

Prompt optimization

Crafting an effective prompt is essential for aligning the model with task requirements. Our system comprises two key components:

    System prompt: Defines the task, provides detailed instructions for each field to be extracted, and specifies the output format.
    User prompt: Follows Nova vision understanding best practices, using the {media_file}-then-{text} structure as outlined in the Amazon Nova model user guide.

Iterate on your prompts using the base model to optimize performance before fine-tuning.
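
As a minimal sketch, the two components might be assembled as follows; the instruction text and file name are illustrative, not the exact prompts from the repository:

system = [{"text": (
    "You extract fields from W2 tax forms. Follow the per-field instructions "
    "and return a single JSON object, using null for fields that are absent."
)}]

with open("w2_sample.png", "rb") as f:
    image_bytes = f.read()

# Nova vision best practice: the media block comes before the text block
user_message = {
    "role": "user",
    "content": [
        {"image": {"format": "png", "source": {"bytes": image_bytes}}},  # {media_file}
        {"text": "Extract all W2 fields from the form above as JSON."},  # {text}
    ],
}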

Dataset preparation

Prepare your dataset in JSONL format and split it into training, validation, and test sets (a formatting and splitting sketch follows the list):

    Training set: 70-80% of data
    Validation set: 10-20% of data
    Test set: 10-20% of data
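
The following is a minimal formatting and splitting sketch, assuming the Amazon Bedrock conversation schema (schemaVersion bedrock-conversation-2024) used for Nova customization datasets; the S3 URI, account ID, label fields, and split ratios are illustrative:

import json
import random

def make_record(image_s3_uri, bucket_owner, label, system_text, user_text):
    """One training example in the conversation schema."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": system_text}],
        "messages": [
            {"role": "user", "content": [
                # Image first, then text, matching the prompt structure above
                {"image": {"format": "png", "source": {
                    "s3Location": {"uri": image_s3_uri, "bucketOwner": bucket_owner}}}},
                {"text": user_text},
            ]},
            # Ground-truth JSON the model should learn to produce
            {"role": "assistant", "content": [{"text": json.dumps(label)}]},
        ],
    }

# Illustrative: build one record per (image, label) pair in your annotated dataset
records = [
    make_record(
        "s3://my-bucket/images/w2_0001.png", "123456789012",
        {"employee_name": "JANE DOE", "wages": "52000.00"},
        "You extract fields from W2 tax forms and return a single JSON object.",
        "Extract all W2 fields from the form above as JSON.",
    ),
]

random.seed(42)
random.shuffle(records)
n = len(records)
splits = {
    "train.jsonl": records[: int(0.8 * n)],                    # 80% training
    "validation.jsonl": records[int(0.8 * n): int(0.9 * n)],   # 10% validation
    "test.jsonl": records[int(0.9 * n):],                      # 10% test
}
for name, subset in splits.items():
    with open(name, "w") as f:
        for rec in subset:
            f.write(json.dumps(rec) + "\n")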

Fine-tuning job configuration and monitoring

Once the dataset is prepared and uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, you can configure and submit the fine-tuning job on Amazon Bedrock. Key parameters include the following (a job submission sketch follows the list):

    Epochs: Number of complete passes through the training dataset. Determines how many times the model sees the entire dataset during training.
    Learning rate: Step size for gradient descent optimization. Controls how much model weights are adjusted in response to estimated error.
    Learning rate warmup steps: Number of steps to gradually increase the learning rate. Prevents instability by slowly ramping up the learning rate from a small value to the target rate.
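
As a minimal sketch, a job submission might look like the following; the role ARN, S3 URIs, base model identifier, and hyperparameter names and values are placeholders to adapt to your account and the current Amazon Bedrock documentation:

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_customization_job(
    jobName="nova-lite-w2-ft",
    customModelName="nova-lite-w2-extractor",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.nova-lite-v1:0:300k",  # placeholder base model ID
    customizationType="FINE_TUNING",
    hyperParameters={
        "epochCount": "2",                # passes through the training set
        "learningRate": "0.00001",        # gradient descent step size
        "learningRateWarmupSteps": "10",  # gradual ramp-up to the target rate
    },
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    validationDataConfig={"validators": [{"s3Uri": "s3://my-bucket/validation.jsonl"}]},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
)
print(response["jobArn"])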

Amazon Bedrock customization provides validation loss metrics throughout the training process. Monitor these metrics to confirm that the model is converging and to catch overfitting or underfitting early.

The following graph shows an example metric analysis:

When analyzing the training and validation loss curves, the relative behavior between these metrics provides crucial insights into the model's learning dynamics. In an optimal learning pattern, both losses decrease steadily and converge, with the validation loss closely tracking the training loss; a validation loss that flattens or rises while the training loss keeps falling indicates overfitting.

Model inference options for customized models

Once your custom model has been created in Amazon Bedrock, there are two main ways to run inference against it: on-demand custom model inference (ODI) deployments, or Provisioned Throughput endpoints. Let's talk about why and when to choose one over the other.

On-demand custom model deployments provide a flexible and cost-effective way to leverage your custom Bedrock models. With on-demand deployments, you only pay for the compute resources you use, based on the number of tokens processed during inference. This makes on-demand a great choice for workloads with variable or unpredictable usage patterns, where you want to avoid over-provisioning resources. The on-demand approach also offers automatic scaling, so you don’t have to worry about managing infrastructure capacity. Bedrock will automatically provision the necessary compute power to handle your requests in near real time. This self-service, serverless experience can simplify your operations and deployment workflows.

Alternatively, Provisioned Throughput endpoints are recommended for workloads with steady traffic patterns and consistent high-volume requirements, offering predictable performance and cost benefits over on-demand scaling.

This example uses the ODI option to take advantage of per-token pricing. The following code snippet shows how you can create an ODI deployment for your custom model:

import time

import boto3

# Amazon Bedrock control-plane client used to manage custom model deployments
bedrock = boto3.client("bedrock")

# Function to create on-demand inferencing deployment for custom model
def create_model_deployment(custom_model_arn):
    """
    Create an on-demand inferencing deployment for the custom model

    Parameters:
    -----------
    custom_model_arn : str
        ARN of the custom model to deploy

    Returns:
    --------
    deployment_arn : str
        ARN of the created deployment
    """
    try:
        print(f"Creating on-demand inferencing deployment for model: {custom_model_arn}")

        # Generate a unique name for the deployment
        deployment_name = f"nova-ocr-deployment-{time.strftime('%Y%m%d-%H%M%S')}"

        # Create the deployment
        response = bedrock.create_custom_model_deployment(
            modelArn=custom_model_arn,
            modelDeploymentName=deployment_name,
            description=f"on-demand inferencing deployment for model: {custom_model_arn}",
        )

        # Get the deployment ARN
        deployment_arn = response.get('customModelDeploymentArn')

        print(f"Deployment request submitted. Deployment ARN: {deployment_arn}")
        return deployment_arn

    except Exception as e:
        print(f"Error creating deployment: {e}")
        return None
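
Once the deployment is active, a minimal invocation sketch looks like the following: with on-demand deployments, the deployment ARN is passed as the model ID (the ARN value, file name, and prompt are illustrative):

import boto3

runtime = boto3.client("bedrock-runtime")

# ARN returned by create_model_deployment above (placeholder value)
deployment_arn = "arn:aws:bedrock:us-east-1:123456789012:custom-model-deployment/example"

with open("w2_sample.png", "rb") as f:
    image_bytes = f.read()

response = runtime.converse(
    modelId=deployment_arn,  # the deployment ARN acts as the model ID
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "Extract all W2 fields from the form above as JSON."},
        ],
    }],
)
print(response["output"]["message"]["content"][0]["text"])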

Evaluation: Accuracy improvement with fine-tuning

Our evaluation of the base model and the fine-tuned Nova model shows significant improvements across all field categories. Let’s break down the performance gains:

Field category          Metric      Base model   Fine-tuned model   Improvement
Employee information    Accuracy    58%          82.33%             24.33%
Employee information    Precision   57.05%       82.33%             25.28%
Employee information    Recall      100%         100%               0%
Employee information    F1 score    72.65%       90.31%             17.66%
Employer information    Accuracy    58.67%       92.67%             34%
Employer information    Precision   53.66%       92.67%             39.01%
Employer information    Recall      100%         100%               0%
Employer information    F1 score    69.84%       96.19%             26.35%
Earnings                Accuracy    62.71%       85.57%             22.86%
Earnings                Precision   60.97%       85.57%             24.60%
Earnings                Recall      99.55%       100%               0.45%
Earnings                F1 score    75.62%       92.22%             16.60%
Benefits                Accuracy    45.50%       60%                14.50%
Benefits                Precision   45.50%       60%                14.50%
Benefits                Recall      93.81%       100%               6.19%
Benefits                F1 score    61.28%       75%                13.72%
Multi-state employment  Accuracy    58.29%       94.19%             35.90%
Multi-state employment  Precision   52.14%       91.83%             39.69%
Multi-state employment  Recall      99.42%       100%               0.58%
Multi-state employment  F1 score    68.41%       95.74%             27.33%

The following graphic shows a bar chart comparing the F1 scores of the base model and fine-tuned model for each field category, with the improvement percentage shown in the previous table:

Key observations:

    Precision gains are largest in the categories where the base model was weakest: multi-state employment (+39.69%) and employer information (+39.01%).
    Recall is 100% across all field categories after fine-tuning, so the model does not miss ground-truth values.
    Benefits remains the most difficult category, improving from a 61.28% to a 75% F1 score but still trailing the other fields.

Clean up

To avoid incurring unnecessary costs when you're no longer using your custom model, it's important to properly clean up the resources. Follow these steps to remove both the deployment and the custom model (a sketch of the corresponding API calls follows the list):

    Delete the custom model deployment.
    Delete the custom model.
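
As a minimal sketch, the two calls might look like the following, where the ARN values are placeholders for the ones returned by the earlier deployment and customization steps:

import boto3

bedrock = boto3.client("bedrock")

# Placeholders: use the ARNs from your own deployment and customization steps
deployment_arn = "arn:aws:bedrock:us-east-1:123456789012:custom-model-deployment/example"
custom_model_arn = "arn:aws:bedrock:us-east-1:123456789012:custom-model/example"

# 1. Delete the on-demand deployment first, because it references the model
bedrock.delete_custom_model_deployment(
    customModelDeploymentIdentifier=deployment_arn
)

# 2. Then delete the custom model itself
bedrock.delete_custom_model(modelIdentifier=custom_model_arn)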

Cost analysis

In our example, we chose an Amazon Bedrock fine-tuning job, which uses PEFT and supports ODI. PEFT fine-tuning of Nova Lite paired with on-demand inference offers a cost-effective and scalable solution for enhanced document processing. The cost structure is straightforward and transparent:

One-time cost:

    The fine-tuning job itself, billed for the tokens processed during training

Ongoing costs:

    On-demand inference, billed per input and output token processed
    Monthly storage for the custom model

On-demand inference allows you to run your custom Nova models without maintaining provisioned endpoints, enabling pay-as-you-go pricing based on actual token usage. This approach eliminates the need for capacity planning while ensuring cost-efficient scaling.

Conclusion

In this post, we’ve demonstrated how fine-tuning Amazon Nova Lite can transform document processing accuracy while maintaining cost efficiency. Our evaluation shows significant performance gains, with up to 39% improvement in precision for critical fields and perfect recall across key document categories. While our implementation did not require constrained decoding, tool calling with Nova can provide additional reliability for more complex structured outputs, especially when working with intricate JSON schemas. Please refer to the resource on structured output with tool calling for further information.

The flexible deployment options, including on-demand inference with pay-per-use pricing, eliminate infrastructure overhead while maintaining the same inference costs as the base model. With the dataset we used for this example, the runtime inference cost per page was $0.00021, making it a cost-effective solution. Through practical examples and step-by-step guides, we've shown how to prepare training data, fine-tune models, and evaluate performance with clear metrics.

To get started with your own implementation, visit our GitHub repository for complete code samples and detailed documentation.


About the authors

Sharon Li is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS cloud platform.

Arlind Nocaj is a GTM Specialist Solutions Architect for AI/ML and generative AI for Europe Central, based in the AWS Zurich office, who guides enterprise customers through their digital transformation journeys. With a PhD in network analytics and visualization (graph drawing) and over a decade of experience as a research scientist and software engineer, he brings a unique blend of academic rigor and practical expertise to his role. His primary focus lies in using the full potential of data, algorithms, and cloud technologies to drive innovation and efficiency. His areas of expertise include machine learning, generative AI, and in particular agentic systems with multimodal LLMs for document processing and structured insights.

Pat Reilly is a Sr. Specialist Solutions Architect on the Amazon Bedrock Go-to-Market team. Pat has spent the last 15 years in analytics and machine learning as a consultant. When he’s not building on AWS, you can find him fumbling around with wood projects.

Malte Reimann is a Solutions Architect based in Zurich, working with customers across Switzerland and Austria on their cloud initiatives. His focus lies in practical machine learning applications, from prompt optimization to fine-tuning vision language models for document processing. The most recent example: working in a small team to provide deployment options for Apertus on AWS. An active member of the ML community, Malte balances his technical work with a disciplined approach to fitness, preferring early morning gym sessions when the gym is empty. During summer weekends, he explores the Swiss Alps on foot and enjoys time in nature. His approach to both technology and life is straightforward: consistent improvement through deliberate practice, whether that's optimizing a customer's cloud deployment or preparing for the next hike in the clouds.
