A Guide to Using the Transformers Example Scripts

Each release of Transformers ships with its own set of example scripts, which are tested and maintained. This is important to keep in mind when using examples/: if you try to run an example from, e.g., a newer version than the transformers version you have installed, it might fail. All examples provide documentation in the repository with a README, which describes the features of the example and which arguments are supported. All examples share an identical set of arguments to make it easy for users to switch between tasks. Now, let's get started.

1. Set up the Development Environment

Our first step is to install the Hugging Face libraries, including transformers and datasets. The version of transformers we install determines the version of the examples we are going to use. If you already have transformers installed, you need to check its version.

pip install torch
pip install "transformers==4.25.1" datasets --upgrade
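If transformers was already installed, you can confirm the version programmatically; it must match the tag we check out in step 2. A minimal check:

import transformers

# should print 4.25.1 to match the examples we check out below
print(transformers.__version__)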

2. Download the example scripts

The example scripts are stored in the GitHub repository of transformers. This means we first need to clone the repository and then check out the release matching the transformers version we installed in step 1 (for us, 4.25.1).

git clone https://github.com/huggingface/transformers
cd transformers
git checkout tags/v4.25.1 # change 4.25.1 to your version if different

3. Fine-tune BERT for text-classification

Before we can run our script, we first need to define the arguments we want to use. For text-classification we need at least a model_name_or_path, which can be any supported architecture from the Hugging Face Hub or a local path to a transformers model. Additional parameters we will use are:

    dataset_name: an ID for a dataset hosted on the Hugging Face Hub
    do_train & do_eval: to train and evaluate our model
    num_train_epochs: the number of epochs we use for training
    per_device_train_batch_size: the batch size used during training, per GPU
    output_dir: where our trained model and logs will be saved

You can find the full list of supported parameters in the script. Before we run it, we have to make sure all dependencies needed for the example are installed. Every example script that requires additional dependencies beyond transformers and datasets provides a requirements.txt in its directory, which you can install.

pip install -r examples/pytorch/text-classification/requirements.txt

That's it, now we can run our script from the CLI, which will start training BERT for text-classification on the emotion dataset.

python3 examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased \
  --dataset_name emotion \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 32 \
  --num_train_epochs 3 \
  --output_dir /bert-test
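Once training finishes, the checkpoint written to output_dir can be loaded like any other Transformers model. A minimal sketch, assuming the run above completed and saved to /bert-test (the sample sentence is arbitrary):

from transformers import pipeline

# load the fine-tuned checkpoint from the output_dir used above
classifier = pipeline("text-classification", model="/bert-test")

# returns a list like [{'label': ..., 'score': ...}]
print(classifier("i feel like i am still looking at a blank canvas"))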

4. Fine-tune BART for summarization

In step 3 we learned how easy it is to leverage the example scripts to fine-tune a BERT model for text-classification. In this section we show how easy it is to switch between different tasks. We will now fine-tune BART for summarization on the CNN Dailymail dataset. We provide the same arguments as for text-classification, but extend them with:

    dataset_config_name: to use a specific version of the dataset
    text_column: the field in our dataset which holds the text we want to summarize
    summary_column: the field in our dataset which holds the summary we want to learn
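The two column names map directly onto the dataset's schema. If you want to verify them before training, you can load a small slice with datasets (a quick check, not part of the original walkthrough):

from datasets import load_dataset

# load a tiny slice just to inspect the schema
ds = load_dataset("cnn_dailymail", "3.0.0", split="train[:10]")
print(ds.column_names)  # ['article', 'highlights', 'id']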

As before, the summarization example ships its additional dependencies in a requirements.txt, which we install first.

pip install -r examples/pytorch/summarization/requirements.txt

That's it, now we can run our script from the CLI, which will start training BART for summarization on the CNN Dailymail dataset.

python3 examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path facebook/bart-base \
  --dataset_name cnn_dailymail \
  --dataset_config_name "3.0.0" \
  --text_column "article" \
  --summary_column "highlights" \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 32 \
  --num_train_epochs 3 \
  --output_dir /bert-test
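As with the classification run, the finished summarization model can be loaded from its output_dir. A sketch, assuming the run above saved to /bert-test as configured (the article text is a placeholder):

from transformers import pipeline

summarizer = pipeline("summarization", model="/bert-test")

# any article text works here; generation length is capped via max_length
article = "The tower is 324 metres tall, about the same height as an 81-storey building."
print(summarizer(article, max_length=40, min_length=5))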
