Fine-tuning XLM-RoBERTa for multi-lingual multi-class text classification with Habana Gaudi

This post shows how to fine-tune XLM-RoBERTa-large for multi-lingual multi-class text classification on AWS using a Habana Gaudi DL1 instance together with the Hugging Face Transformers, Optimum Habana, and Datasets libraries. It covers setting up the Habana Gaudi environment, loading and processing the Amazon Science MASSIVE dataset, creating a GaudiTrainer for single-HPU fine-tuning, running distributed data-parallel training with the DistributedRunner, and comparing the cost performance of Habana Gaudi against an AWS p3.8xlarge instance, showing significant speed and cost advantages for Gaudi.

📊 Fine-tune XLM-RoBERTa-large for multi-lingual multi-class text classification efficiently on AWS with the Hugging Face Transformers, Optimum Habana, and Datasets libraries on a Habana Gaudi DL1 instance, taking full advantage of Gaudi's AI acceleration.

🌐 Process the Amazon Science MASSIVE dataset, which contains over 1M annotated utterances for intent classification; the English, German, French, Italian, Portuguese, Spanish, and Dutch splits provide rich multilingual training samples for the model.

🚀 Run efficient distributed data-parallel training with the GaudiTrainer and DistributedRunner to accelerate training across 8 HPUs; the comparison shows Habana Gaudi is 8.4x faster and 7.7x cheaper than a similarly priced NVIDIA V100 GPU instance, significantly improving training efficiency and economics.

🔧 Detailed steps include setting up the Habana Gaudi environment, preprocessing the data, defining a compute_metrics function for evaluation, configuring GaudiTrainingArguments and an AutoModelForSequenceClassification model, and batching data with DataCollatorWithPadding to keep training stable and efficient.

💡 The cost-performance analysis shows that Habana Gaudi instances train faster at a lower cost, offering a highly competitive option for large-scale multilingual model training, especially for companies and research teams that need high-performance AI compute while keeping costs under control.

In this blog, you will learn how to fine-tune XLM-RoBERTa for multi-lingual multi-class text-classification using a Habana Gaudi-based DL1 instance on AWS to take advantage of the cost performance benefits of Gaudi. We will use the Hugging Face Transformers, Optimum Habana, and Datasets libraries to fine-tune a pre-trained transformer for multi-class text classification. In particular, we will fine-tune XLM-RoBERTa-large using the Amazon Science Massive dataset. Before we get started, we need to set up the deep learning environment.

You will learn how to:

    Setup Habana Gaudi instance
    Load and process the dataset
    Create a GaudiTrainer and run single HPU fine-tuning
    Run distributed data parallel training with GaudiTrainer
    Cost performance benefits of Habana Gaudi on AWS

Requirements

Before we can start, make sure you have met the following requirements:

1. Setup Habana Gaudi instance

In this example, we are going to use Habana Gaudi on AWS with the DL1 instance. We have already created a blog post on how to Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS. If you haven't read this blog post yet, please read it first and go through the steps on how to set up the environment. Or, if you feel comfortable, you can use the start_instance.sh in the root of the repository to create your DL1 instance and then continue at step "4. Use Jupyter Notebook/Lab via ssh" in the setup blog post.

    run the Habana Docker container and mount the current directory

docker run -ti --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -v $(pwd):/home/ubuntu/dev --workdir=/home/ubuntu/dev vault.habana.ai/gaudi-docker/1.5.0/ubuntu20.04/habanalabs/pytorch-installer-1.11.0:latest

    install jupyter
    clone the repository

git clone https://github.com/philschmid/deep-learning-habana-huggingface.git
cd fine-tuning

    run jupyter notebook

jupyter notebook --allow-root
# http://localhost:8888/?token=f8d00db29a6adc03023413b7f234d110fe0d24972d7ae65e

    continue here

NOTE: The following steps assume that the code/cells are running on a Gaudi instance with access to HPUs.

First, let's make sure we have access to the HPUs.

import habana_frameworks.torch.core as htcore

print(f"device available:{htcore.is_available()}")
print(f"device_count:{htcore.get_device_count()}")

Next, let's install our Hugging Face dependencies and git-lfs.

!pip install transformers datasets tensorboard matplotlib pandas sklearn
!pip install git+https://github.com/huggingface/optimum-habana.git # workaround until release of optimum-habana

# we will use git-lfs to upload models and artifacts to the hub.
#!sudo apt-get install git-lfs
!apt-get install git-lfs

To finish our setup, let's log into the Hugging Face Hub to push our model artifacts, logs, and metrics to the Hub during and after training.

To be able to push our model to the Hub, you need to register on Hugging Face.

We will use the notebook_login util from the huggingface_hub package to log into our account. You can get your token in the settings at Access Tokens.

from huggingface_hub import notebook_login

notebook_login()

2. Load and process the dataset

As dataset, we will use AmazonScience/massive, a multilingual intent (text) classification dataset. The dataset contains over 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation.

We are going to use the:

    English - United States (en-US)
    German - Germany (de-DE)
    French - France (fr-FR)
    Italian - Italy (it-IT)
    Portuguese - Portugal (pt-PT)
    Spanish - Spain (es-ES)
    Dutch - Netherlands (nl-NL)

splits. The dataset will have ~80,000 data points for training and ~14,000 for evaluation, equally split across the different languages.

The model we will fine-tune is xlm-roberta-large, a multilingual RoBERTa model. It was pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages.

model_id = "xlm-roberta-large"
gaudi_config_id = "Habana/roberta-large" # more here: https://huggingface.co/Habana
dataset_id = "AmazonScience/massive"
dataset_configs = ["en-US","de-DE","fr-FR","it-IT","pt-PT","es-ES","nl-NL"]

seed = 33
repository_id = "habana-xlm-r-large-amazon-massive"

You can change this configuration to your needs, e.g. setting model_id to another BERT-like model for a different language, e.g. BERT-Large.

NOTE: Not all 100+ Transformers architectures are currently supported by optimum-habana. You can find a list of supported architectures in the validated models section.
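For example, a run with an English-only BERT-Large model might use a configuration like the sketch below. The exact gaudi_config_id for BERT-Large is an assumption here; check the Habana organization on the Hugging Face Hub for the config that matches your model.

# hypothetical configuration for fine-tuning BERT-Large instead of XLM-RoBERTa-large
model_id = "bert-large-uncased"
gaudi_config_id = "Habana/bert-large-uncased-whole-word-masking"  # assumption: pick the matching config from https://huggingface.co/Habana
dataset_id = "AmazonScience/massive"
dataset_configs = ["en-US"]  # BERT-Large is English-only, so we keep a single split

seed = 33
repository_id = "habana-bert-large-amazon-massive"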

We use the datasets library to download and preprocess our dataset. First, we will load the 7 different configurations, remove the unnecessary features/columns, and then concatenate them into a single dataset.

from datasets import load_dataset, concatenate_datasets, DatasetDict

# the columns we want to keep in the dataset
keep_columns = ["utt", "scenario"]

# process individual datasets
proc_lan_dataset_list = []
for lang in dataset_configs:
    # load dataset for language
    lang_ds = load_dataset(dataset_id, lang)
    # only keep the 'utt' & 'scenario' columns
    lang_ds = lang_ds.remove_columns([col for col in lang_ds["train"].column_names if col not in keep_columns])
    # rename the columns to match transformers schema
    lang_ds = lang_ds.rename_column("utt", "text")
    lang_ds = lang_ds.rename_column("scenario", "label")
    proc_lan_dataset_list.append(lang_ds)

# concat single splits into one
train_dataset = concatenate_datasets([ds["train"] for ds in proc_lan_dataset_list])
eval_dataset = concatenate_datasets([ds["validation"] for ds in proc_lan_dataset_list])
# create dataset dict for easier processing
dataset = DatasetDict(dict(train=train_dataset, validation=eval_dataset))
print(dataset)

Before we prepare the dataset for training, let's take a quick look at the class distribution of the dataset.

import pandas as pd

df = dataset["train"].to_pandas()

df.hist()
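If you prefer a numeric summary over the histogram, a small sketch like the following uses the ClassLabel feature's int2str helper from the datasets library to print the number of examples per class name (assuming the df and dataset objects created above):

# count examples per class and map the integer labels back to their names
int2str = dataset["train"].features["label"].int2str
label_counts = df["label"].value_counts().sort_index()
for label_id, count in label_counts.items():
    print(f"{int2str(int(label_id))}: {count}")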

To train our model we need to convert our "Natural Language" to token IDs. This is done by a 🤗 Transformers Tokenizer which will tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary). If you are not sure what this means check out chapter 6 of the Hugging Face Course.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)

Additionally, we add truncation=True and padding="max_length" to align the lengths and truncate texts that are longer than the maximum size allowed by the model.

def process(examples):
    tokenized_inputs = tokenizer(
        examples["text"], padding="max_length", truncation=True
    )
    return tokenized_inputs

tokenized_datasets = dataset.map(process, batched=True)
tokenized_datasets["train"].features
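To get a feeling for what the tokenizer produced, a quick sketch (assuming the tokenized_datasets from above) prints one processed example and converts the first token IDs back into readable tokens:

# inspect a single tokenized example
sample = tokenized_datasets["train"][0]
print(sample["text"])
print(sample["input_ids"][:20])  # first 20 token ids, the rest is mostly padding
print(tokenizer.convert_ids_to_tokens(sample["input_ids"][:20]))  # map ids back to tokens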

Now that our dataset is processed, we can download the pre-trained model and fine-tune it.

3. Create a GaudiTrainer and run single HPU fine-tuning

Normally you would use the Trainer and TrainingArguments classes to fine-tune a PyTorch-based transformer model. Since we are using the optimum-habana library, we can use the GaudiTrainer and GaudiTrainingArguments instead. The GaudiTrainer is a wrapper around the Trainer that allows you to fine-tune a transformer model on a Gaudi instance, with an API similar to the Trainer and TrainingArguments classes. Below you can see how easy it is to migrate from the Trainer and TrainingArguments classes to the GaudiTrainer and GaudiTrainingArguments classes.

-from transformers import Trainer, TrainingArguments
+from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# define the training arguments
-training_args = TrainingArguments(
+training_args = GaudiTrainingArguments(
+  use_habana=True,
+  use_lazy_mode=True,
+  gaudi_config_name=path_to_gaudi_config,
  ...
)

# Initialize our Trainer
-trainer = Trainer(
+trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
    ... # other arguments
)

Before we create our GaudiTrainer, we first need to define a compute_metrics function to evaluate our model on the evaluation set. This function will be used during the training process to compute the accuracy & f1 of our model.

from datasets import load_metric
import numpy as np

# define metrics and metrics function
f1_metric = load_metric("f1")
accuracy_metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    acc = accuracy_metric.compute(predictions=predictions, references=labels)
    f1 = f1_metric.compute(predictions=predictions, references=labels, average="micro")
    return {
        "accuracy": acc["accuracy"],
        "f1": f1["f1"],
    }
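As a quick sanity check, you can call compute_metrics on a few dummy logits and labels (hypothetical values, only to verify that the function returns the expected keys):

# dummy logits for 3 examples over 2 classes plus their reference labels
dummy_logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
dummy_labels = np.array([1, 0, 0])
print(compute_metrics((dummy_logits, dummy_labels)))
# expected: accuracy and micro f1 of 2/3, since only the last prediction is wrong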

Hyperparameter Definition, Model Loading

from transformers import AutoModelForSequenceClassification, DataCollatorWithPadding
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from huggingface_hub import HfFolder

# create label2id, id2label dicts for nice outputs for the model
labels = tokenized_datasets["train"].features["label"].names
num_labels = len(labels)
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

# define training args
training_args = GaudiTrainingArguments(
    output_dir=repository_id,
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name=gaudi_config_id,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)

# define model
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
)

# create Trainer
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# start training on 1x HPU
trainer.train()

# evaluate model
trainer.evaluate(eval_dataset=tokenized_datasets["validation"])
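If you want to upload the fine-tuned model to the Hugging Face Hub (which we prepared for earlier with notebook_login and git-lfs), a minimal sketch could look like the following. This is an assumption on top of the example above, not part of the original notebook; it uses the standard push_to_hub helpers from transformers:

# push the fine-tuned model and tokenizer to the Hub under your account
# (assumption: not shown in the original example, uses the standard transformers helpers)
model.push_to_hub(repository_id)
tokenizer.push_to_hub(repository_id)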

4. Run distributed data parallel training with GaudiTrainer

Running the training on only a single HPU core takes way too long (~5h). Luckily, the DL1 instance gives us 8 available HPU cores, meaning we can leverage distributed training. To run our training as distributed training, we need to create a training script which can be used with multiprocessing to run on all HPUs. We have created a scripts/train.py which contains all the previous steps of the example so far. To execute our distributed training we use the DistributedRunner from optimum-habana; alternatively, you could check out the gaudi_spawn.py in the optimum-habana repository.

from optimum.habana.distributed import DistributedRunner
from optimum.utils import logging

world_size = 8  # Number of HPUs to use (1 or 8)

# define distributed runner
distributed_runner = DistributedRunner(
    command_list=["scripts/train.py --use_habana True"],
    world_size=world_size,
    use_mpi=True,
    multi_hls=False,
)

# start job
ret_code = distributed_runner.run()
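For reference, here is a minimal, self-contained sketch of what such a scripts/train.py could look like. It is an assumption based on the steps above, not the exact script from the repository, and only wires up the --use_habana flag that the DistributedRunner passes on the command line:

# scripts/train.py - hypothetical minimal training script; the real script in the
# repository bundles the same preprocessing and training steps shown above
import argparse
import numpy as np
from datasets import load_dataset, concatenate_datasets, DatasetDict, load_metric
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_habana", type=lambda x: str(x).lower() == "true", default=True)
    args = parser.parse_args()

    model_id = "xlm-roberta-large"
    configs = ["en-US", "de-DE", "fr-FR", "it-IT", "pt-PT", "es-ES", "nl-NL"]

    # load, trim and concatenate the language splits (same as section 2)
    parts = []
    for lang in configs:
        ds = load_dataset("AmazonScience/massive", lang)
        ds = ds.remove_columns([c for c in ds["train"].column_names if c not in ["utt", "scenario"]])
        ds = ds.rename_column("utt", "text").rename_column("scenario", "label")
        parts.append(ds)
    dataset = DatasetDict(
        train=concatenate_datasets([d["train"] for d in parts]),
        validation=concatenate_datasets([d["validation"] for d in parts]),
    )

    # tokenize with fixed-length padding, as in section 2
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenized = dataset.map(lambda e: tokenizer(e["text"], padding="max_length", truncation=True), batched=True)

    # model with one output per scenario class
    labels = tokenized["train"].features["label"].names
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=len(labels))

    accuracy_metric = load_metric("accuracy")
    def compute_metrics(eval_pred):
        logits, refs = eval_pred
        return accuracy_metric.compute(predictions=np.argmax(logits, axis=1), references=refs)

    training_args = GaudiTrainingArguments(
        output_dir="habana-xlm-r-large-amazon-massive",
        use_habana=args.use_habana,
        use_lazy_mode=True,
        gaudi_config_name="Habana/roberta-large",
        num_train_epochs=5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
    )
    trainer = GaudiTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    trainer.train()

if __name__ == "__main__":
    main()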

5. Cost performance benefits of Habana Gaudi on AWS

The distributed training on all 8x HPUs took 52 minutes in total. The dl1.24xlarge instance on AWS costs $13.11 per hour, leading to only $11.55 for our training. To provide a cost-performance comparison, we ran the same training on the AWS p3.8xlarge instance, which costs roughly the same at $12.24 per hour but only has 4x accelerators (4x NVIDIA V100). The training on the p3.8xlarge instance took about 439 minutes in total and cost $89.72. This means the Habana Gaudi instance is 8.4x faster and 7.7x cheaper than the price-equivalent NVIDIA-powered instance. Below is a detailed table of results. Additionally, both models are available on the Hugging Face Hub at philschmid/habana-xlm-r-large-amazon-massive and philschmid/gpu-xlm-roberta-large-amazon-massive.
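The speed-up and cost factors can be reproduced directly from these numbers; a quick back-of-the-envelope check:

# back-of-the-envelope check of the speed-up and cost factors quoted above
gaudi_minutes, gaudi_total_cost = 52.6, 11.55
v100_minutes, v100_total_cost = 439.8, 89.72

print(f"speed-up:       {v100_minutes / gaudi_minutes:.2f}x")        # 8.36 -> quoted as 8.4x
print(f"cost reduction: {v100_total_cost / gaudi_total_cost:.2f}x")  # 7.77 -> quoted as 7.7x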

Note: This comparison currently provides a limited view, since the NVIDIA V100 might not be the best GPU available for training such a large transformer model, resulting in an 8x smaller batch size. We plan to run a more detailed cost-performance benchmark including more instances, like NVIDIA A100, and more models, e.g. DistilBERT and GPT-2.

Conclusion

That's it for this tutorial. Now you know how to fine-tune Hugging Face Transformers on Habana Gaudi using Optimum. You learned how easily you can migrate from a Trainer-based script to a GaudiTrainer-based script and how to scale training to multiple HPUs using the DistributedRunner.

Additionally, we ran a simple cost-performance benchmark, achieving 8.4x faster and 7.7x cheaper training on the Habana Gaudi instance than on the price-equivalent NVIDIA-powered instance.

Now it is time for you to migrate your training scripts!!


Thanks for reading. If you have any questions, feel free to contact me, through Github, or on the forum. You can also connect with me on Twitter or LinkedIn.

accelerator        | training time (in minutes) | total cost | total batch size | aws instance type | instance price per hour
Habana Gaudi (HPU) | 52.6                       | $11.55     | 64               | dl1.24xlarge      | $13.11
NVIDIA V100 (GPU)  | 439.8                      | $89.72     | 4                | p3.8xlarge        | $12.24