Few-Shot Learning Helps Machine Learning Models Improve Performance

 

Few-Shot Learning is a technique that guides a machine learning model's predictions by providing a small number of examples at inference time, unlike traditional fine-tuning, which requires large amounts of training data. The technique has mostly been used in computer vision, but recent language models such as GPT-Neo and GPT-3 have brought it to natural language processing (NLP). In NLP, a Few-Shot Learning prompt usually consists of three main components: a task description, examples, and a prompt, allowing the model to generalize to related but unseen tasks from just a few examples. Combining GPT-Neo with the 🤗 Accelerated Inference API makes Few-Shot Learning practical: you can generate your own predictions and control text generation by tuning hyperparameters such as the maximum number of new tokens, the temperature, and the end sequence.

📌 Few-Shot Learning is a technique that guides a machine learning model's predictions with a small number of examples at inference time; unlike traditional fine-tuning, which requires large amounts of training data, it is well suited to natural language processing (NLP).

🔍 In NLP, a Few-Shot Learning prompt usually consists of three main components: a task description, examples, and a prompt, allowing the model to generalize to related but unseen tasks from just a few examples.

🚀 Combining GPT-Neo with the 🤗 Accelerated Inference API makes Few-Shot Learning practical: you can generate your own predictions and control text generation by tuning hyperparameters such as the maximum number of new tokens, the temperature, and the end sequence.

🧮 Creating Few-Shot NLP examples can be challenging, because the examples have to communicate the task the model should perform, and models are very sensitive to how the examples are written.

📈 OpenAI's research shows that few-shot prompting ability improves as the number of language model parameters grows; GPT-Neo, a GPT-style model trained on the Pile dataset, works best on text that matches the distribution of its training data.

Cross post from huggingface.co/blog

In many Machine Learning applications, the amount of available labeled data is a barrier to producing a high-performing model. The latest developments in NLP show that you can overcome this limitation by providing a few examples at inference time with a large language model - a technique known as Few-Shot Learning. In this blog post, we'll explain what Few-Shot Learning is, and explore how a large language model called GPT-Neo, and the 🤗 Accelerated Inference API, can be used to generate your own predictions.

What is Few-Shot Learning?

Few-Shot Learning refers to the practice of feeding a machine learning model with a very small amount of training data to guide its predictions, like a few examples at inference time, as opposed to standard fine-tuning techniques which require a relatively large amount of training data for the pre-trained model to adapt to the desired task with accuracy.

This technique has been mostly used in computer vision, but with some of the latest Language Models, like EleutherAI GPT-Neo and OpenAI GPT-3, we can now use it in Natural Language Processing (NLP).

In NLP, Few-Shot Learning can be used with Large Language Models, which have learned to perform a wide number of tasks implicitly during their pre-training on large text datasets. This enables the model to generalize, that is to understand related but previously unseen tasks, with just a few examples.

Few-Shot NLP examples consist of three main components:

    Task Description: A short description of what the model should do, e.g. "Translate English to French"
    Examples: A few examples showing the model what it is expected to predict, e.g. "sea otter => loutre de mer"
    Prompt: The beginning of a new example, which the model should complete by generating the missing text, e.g. "cheese => "
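
Put together, these three components form a single text prompt. Below is a minimal sketch of how such a prompt can be assembled in Python, using the English-to-French example above (the "peppermint" example is added here purely for illustration):

# A few-shot prompt is plain text: a task description, a few examples,
# and an unfinished example the model should complete.
task_description = "Translate English to French:"
examples = [
    "sea otter => loutre de mer",
    "peppermint => menthe poivrée",  # illustrative extra example
]
prompt = "cheese => "  # the model should generate the missing translation

few_shot_prompt = "\n".join([task_description, *examples, prompt])
print(few_shot_prompt)
# Translate English to French:
# sea otter => loutre de mer
# peppermint => menthe poivrée
# cheese =>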

Image from Language Models are Few-Shot Learners

Creating these few-shot examples can be tricky, since you need to articulate the “task” you want the model to perform through them. A common issue is that models, especially smaller ones, are very sensitive to the way the examples are written.

An approach to optimize Few-Shot Learning in production is to learn a common representation for a task and then train task-specific classifiers on top of this representation.
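
As a rough sketch of that idea (not from the original post; the encoder model, libraries, and toy data below are assumptions), one could freeze a pretrained encoder as the shared representation and train a lightweight task-specific classifier on top of it:

# Hypothetical sketch: shared representation + task-specific classifier head.
from sentence_transformers import SentenceTransformer  # assumed dependency
from sklearn.linear_model import LogisticRegression

# Shared representation: a frozen, pretrained sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# A handful of labeled examples for one specific task (here: sentiment).
texts = ["I loved this movie", "Terrible service", "Great food", "Would not recommend"]
labels = [1, 0, 1, 0]

# Task-specific head trained on top of the shared embeddings.
features = encoder.encode(texts)
classifier = LogisticRegression().fit(features, labels)

print(classifier.predict(encoder.encode(["The food was fantastic"])))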

OpenAI showed in the GPT-3 paper that the few-shot prompting ability improves with the number of language model parameters.

Image from Language Models are Few-Shot Learners

Let's now take a look at how GPT-Neo and the 🤗 Accelerated Inference API can be used to generate your own Few-Shot Learning predictions!


What is GPT-Neo?

GPT-Neo is a family of transformer-based language models from EleutherAI based on the GPT architecture. EleutherAI's primary goal is to train a model that is equivalent in size to GPT-3 and make it available to the public under an open license.

All of the currently available GPT-Neo checkpoints are trained with the Pile dataset, a large text corpus that is extensively documented in (Gao et al., 2021). As such, it is expected to function better on the text that matches the distribution of its training text; we recommend keeping this in mind when designing your examples.


🤗 Accelerated Inference API

The Accelerated Inference API is our hosted service to run inference on any of the 10,000+ models publicly available on the 🤗 Model Hub, or your own private models, via simple API calls. The API includes acceleration on CPU and GPU with up to 100x speedup compared to out of the box deployment of Transformers.

To integrate Few-Shot Learning predictions with GPT-Neo in your own apps, you can use the 🤗 Accelerated Inference API with the code snippet below. You can find your API Token here; if you don't have an account, you can get started here.

import json
import requests

API_TOKEN = ""

def query(payload='', parameters=None, options={'use_cache': False}):
    API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    body = {"inputs": payload, 'parameters': parameters, 'options': options}
    response = requests.request("POST", API_URL, headers=headers, data=json.dumps(body))
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return "Error:" + " ".join(response.json()['error'])
    else:
        return response.json()[0]['generated_text']

parameters = {
    'max_new_tokens': 25,  # number of generated tokens
    'temperature': 0.5,    # controlling the randomness of generations
    'end_sequence': "###"  # stopping sequence for generation
}

prompt = "...."  # few-shot prompt

data = query(prompt, parameters)
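
As a concrete illustration (a hypothetical prompt, not the one elided above), the snippet below builds a small sentiment-classification prompt whose examples are separated by "###", matching the end_sequence parameter, and passes it to the query function defined above:

# Hypothetical few-shot prompt: tweet sentiment classification.
# Each example ends with "###" so generation stops at the end_sequence.
prompt = """Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been great!"
Sentiment: Positive
###
Tweet: "This new music video was incredible."
Sentiment:"""

data = query(prompt, parameters)
print(data)  # the completion should read something like " Positive"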

Practical Insights

Here are some practical insights, which help you get started using GPT-Neo and the 🤗 Accelerated Inference API.

Since GPT-Neo (2.7B) is about 60x smaller than GPT-3 (175B), it does not generalize as well to zero-shot problems and needs 3-4 examples to achieve good results. When you provide more examples, GPT-Neo understands the task and takes the end_sequence into account, which allows us to control the generated text quite well.

Image: benefit of additional examples

The hyperparameters End Sequence, Token Length & Temperature can be used to control the text generation of the model, and you can use this to your advantage to solve the task you need. The Temperature controls the randomness of your generations; a lower temperature results in less random generations, and a higher temperature results in more random generations.

Image: benefit of hyperparameter tuning

In the example, you can see how important it is to define your hyperparameters. These can make the difference between solving your task or failing miserably.
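
To make that concrete, here is a hedged sketch of two settings you might compare with the query function above; the exact values are illustrative, not recommendations:

# Near-deterministic generation that stops at the first "###".
conservative = {
    'max_new_tokens': 25,
    'temperature': 0.1,
    'end_sequence': "###",
}

# More random generation; without an end_sequence the model keeps
# generating until max_new_tokens is reached.
creative = {
    'max_new_tokens': 25,
    'temperature': 1.0,
}

for params in (conservative, creative):
    print(query(prompt, params))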


To use GPT-Neo or any Hugging Face model in your own application, you can start a free trial of the 🤗 Accelerated Inference API. If you need help mitigating bias in models and AI systems, or leveraging Few-Shot Learning, the 🤗 Expert Acceleration Program can offer your team direct premium support from the Hugging Face team.
