A Guide to Converting Transformer Models to ONNX and Optimizing Them

This article explains how to convert Hugging Face Transformers models to the ONNX format and optimize them with the Hugging Face Optimum library. It covers the basics of ONNX, what Optimum offers, the supported Transformer architectures, and three ways to export a model: the low-level torch API, the mid-level transformers.onnx API, and the high-level Optimum API. It closes with a look at the optimization and quantization tools available afterwards.

ONNX (Open Neural Network eXchange) is an open standard for representing machine learning models. It defines a common set of operators and a common file format, is supported by frameworks such as PyTorch and TensorFlow, and is used to build a computational graph of a neural network. ONNX itself is not a runtime; it is used together with a runtime such as ONNX Runtime.

Hugging Face Optimum is an extension of Hugging Face Transformers that provides a unified API for model optimization. It supports efficient training and inference on accelerated hardware, including Graphcore IPUs and Habana Gaudi, and offers model conversion, quantization, graph optimization, and accelerated training and inference, including support for transformers pipelines.

The Transformer architectures supported by Optimum include BERT, the GPT family, RoBERTa, T5, ViT, and many other commonly used models. The complete list is available in the official documentation; all of these models can be exported to ONNX and further optimized with Optimum.

There are three ways to export a model to ONNX: 1) the low-level torch.onnx API, which requires manually specifying input/output names, dynamic axes, and other parameters; 2) the mid-level transformers.onnx API, which simplifies the process with configuration objects; and 3) the high-level Optimum API, which performs the conversion automatically via the from_transformers=True argument and lets you run predictions or load the model into a pipeline directly, making it the most convenient option.

After the conversion, you can use Optimum's tooling for quantization (e.g. INT8, FP16) and graph optimization to significantly improve inference efficiency, and deploy the model with ONNX Runtime on a range of hardware platforms such as GPUs, IPUs, and FPGAs.

Hundreds of Transformers experiments and models are uploaded to the Hugging Face Hub every single day. Machine learning engineers and students conducting those experiments use a variety of frameworks like PyTorch, TensorFlow/Keras, or others. These models are already used by thousands of companies and form the foundation of AI-powered products.

If you deploy Transformers models in production environments, we recommend exporting them first into a serialized format that can be loaded, optimized, and executed on specialized runtimes and hardware.

In this guide, you'll learn about:

    1. What is ONNX?
    2. What is Hugging Face Optimum?
    3. What Transformers architectures are supported?
    4. How can I convert a Transformers model (BERT) to ONNX?
    5. What's next?

Let's get started! 🚀


If you are interested in optimizing your models to run with maximum efficiency, check out the 🤗 Optimum library.

1. What is ONNX?

ONNX (Open Neural Network eXchange) is an open standard and format to represent machine learning models. ONNX defines a common set of operators and a common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow.

[Figure: Pseudo ONNX graph, visualized with Netron]

When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an intermediate representation) which represents the flow of data through the neural network.

Important: ONNX is not a runtime. ONNX is only the representation, which can be used with runtimes such as ONNX Runtime. You can find a list of supported accelerators here.
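To make the distinction concrete, here is a small sketch that inspects an exported graph with the onnx package and then runs it with ONNX Runtime. It assumes you already have an exported file such as the torch-model.onnx created in section 4 (the file name is an assumption used for illustration):

import onnx
import onnxruntime as ort
from transformers import AutoTokenizer

# the representation: load the graph and list the operators it contains (no inference happens here)
onnx_model = onnx.load("torch-model.onnx")  # assumed path; see the export in section 4
onnx.checker.check_model(onnx_model)
print({node.op_type for node in onnx_model.graph.node})

# the runtime: ONNX Runtime actually executes the graph
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
session = ort.InferenceSession("torch-model.onnx", providers=["CPUExecutionProvider"])
inputs = tokenizer("This is a sample", return_tensors="np")
logits = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})[0]
print(logits)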

➡️ Learn more about ONNX.

2. What is Hugging Face Optimum?

Hugging Face Optimum is an open-source library and an extension of Hugging Face Transformers that provides a unified API of performance optimization tools to achieve maximum efficiency when training and running models on accelerated hardware, including toolkits for optimized performance on Graphcore IPU and Habana Gaudi.

Optimum can be used for model conversion, quantization, graph optimization, and accelerated training & inference, with support for transformers pipelines.

Below you can see a typical customer journey of how you can leverage Optimum:

[Figure: user-journey.png, https://www.philschmid.de/static/blog/convert-transformers-to-onnx/user-journey.png]

➡️ Learn more about Optimum

3. What Transformers architectures are supported?

A list of all supported Transformers architectures can be found in the ONNX section of the Transformers documentation. Below is an excerpt of the most commonly used architectures which can be converted to ONNX and optimized with Hugging Face Optimum.

    ALBERT, BART, BERT, DistilBERT, ELECTRA, GPT Neo, GPT-J, GPT-2, RoBERTa, T5, ViT, XLM, …

➡️ All supported architectures
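If you want to check programmatically which export "features" (task heads) are registered for a given architecture, the transformers.onnx package exposes a FeaturesManager. A minimal sketch based on the transformers 4.x API (which may change in later releases):

from transformers.onnx.features import FeaturesManager

# list the ONNX export features registered for DistilBERT
distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys())
print(distilbert_features)
# e.g. ['default', 'masked-lm', 'sequence-classification', ...]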

4. How can I convert a Transformers model (BERT) to ONNX?

There are currently three ways to convert your Hugging Face Transformers models to ONNX. In this section, you will learn how to export distilbert-base-uncased-finetuned-sst-2-english for text-classification using all three methods, going from the low-level torch API to the most user-friendly high-level API of Optimum. Each method will do exactly the same thing.

Export with torch.onnx (low-level)

torch.onnx enables you to convert model checkpoints to an ONNX graph with its export method. However, you have to provide many values yourself, such as input_names, dynamic_axes, etc.

You’ll first need to install some dependencies:

pip install transformers torch

Exporting our checkpoint with export:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# load model and tokenizer
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dummy_model_input = tokenizer("This is a sample", return_tensors="pt")

# export
torch.onnx.export(
    model,
    tuple(dummy_model_input.values()),
    f="torch-model.onnx",
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={'input_ids': {0: 'batch_size', 1: 'sequence'},
                  'attention_mask': {0: 'batch_size', 1: 'sequence'},
                  'logits': {0: 'batch_size', 1: 'sequence'}},
    do_constant_folding=True,
    opset_version=13,
)
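A sensible follow-up, not part of the original snippet, is to sanity-check the export by running the graph with ONNX Runtime and comparing it against the PyTorch outputs. This sketch continues from the snippet above and reuses its model and dummy_model_input:

import numpy as np
import onnxruntime as ort

# reference logits from the PyTorch model
with torch.no_grad():
    pt_logits = model(**dummy_model_input).logits.numpy()

# logits from the exported ONNX graph
session = ort.InferenceSession("torch-model.onnx", providers=["CPUExecutionProvider"])
onnx_logits = session.run(
    None,
    {name: tensor.numpy() for name, tensor in dummy_model_input.items()},
)[0]

# the two should agree up to small numerical differences
print(np.allclose(pt_logits, onnx_logits, atol=1e-4))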

Export with transformers.onnx (mid-level)

transformers.onnx enables you to convert model checkpoints to an ONNX graph by leveraging configuration objects. That way you don’t have to provide the complex configuration for dynamic_axes etc.

You’ll first need to install some dependencies:

pip install transformers[onnx] torch

Exporting our checkpoint with transformers.onnx:

from pathlib import Path
import transformers
from transformers.onnx import FeaturesManager
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

# load model and tokenizer
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
feature = "sequence-classification"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load config
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
onnx_config = model_onnx_config(model.config)

# export
onnx_inputs, onnx_outputs = transformers.onnx.export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=13,
    output=Path("trfs-model.onnx"),
)
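The same export is also exposed as a command-line interface, which can be more convenient in scripts. A sketch of the equivalent call, per the transformers.onnx documentation (check the flags against your installed version):

python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification onnx/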

Export with Optimum (high-level)

Optimum Inference includes methods to convert vanilla Transformers models to ONNX using the ORTModelForXxx classes. To convert your Transformers model to ONNX you simply have to pass from_transformers=True to the from_pretrained() method and your model will be loaded and converted to ONNX leveraging the transformers.onnx package under the hood.

You’ll first need to install some dependencies:

pip install optimum[onnxruntime]

Exporting our checkpoint with ORTModelForSequenceClassification:

from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
)

The best part about the conversion with Optimum is that you can immediately use the model to run predictions or load it inside a pipeline.
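For example, the exported model can be dropped straight into a Transformers pipeline. A minimal sketch (the exact scores depend on the checkpoint):

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# the ONNX model behaves like a regular Transformers model inside the pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("I love the new ONNX export in Optimum!"))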

5. What's next?

Since you have successfully converted your Transformers model to ONNX, the whole set of optimization and quantization tools is now open to you. Potential next steps include graph optimization and quantization with 🤗 Optimum.

If you are interested in optimizing your models to run with maximum efficiency, check out the 🤗 Optimum library.
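As one illustration of such a next step, here is a sketch of dynamic INT8 quantization with Optimum's ORTQuantizer. The quantization API has changed between Optimum releases, so treat the exact class names and arguments as assumptions to verify against the version you have installed:

from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# load (and convert) the model, then attach a quantizer to it
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)
quantizer = ORTQuantizer.from_pretrained(model)

# dynamic INT8 quantization targeting AVX512-VNNI CPUs
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-quantized", quantization_config=qconfig)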


Thanks for reading! If you have any questions, feel free to contact me through GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.
