philschmid RSS feed, September 30
AutoML tools simplify machine learning development

 

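AutoML aims to help developers train high-quality models even without machine learning expertise. AutoML services such as Google's AutoML and AWS's Sagemaker Autopilot automate repetitive tasks like feature selection and hyperparameter tuning, which makes it possible to build more efficient and more accurate models. This article shows how to build an object detection model with AutoGluon. It also covers AutoML's applications to image classification, text classification, and tabular data, and provides a hands-on tutorial based on Google Colab.

AutoML simplifies the model development workflow by automating repetitive machine learning tasks such as feature selection and hyperparameter tuning. As a result, developers without specialist knowledge can also train high-quality models.

AutoGluon is an open-source AutoML library. It supports image classification, object detection, text classification, and supervised learning on tabular data, and it can build a machine learning model with only 3 lines of code.

The Google Colab tutorial in this article shows how to build a fruit detection model with AutoGluon. It covers data loading, model training (20 epochs, 3 models), evaluation (a test-set mAP of 0.872), and prediction.

AWS's Sagemaker Autopilot is an AutoML service similar to Google AutoML. AutoGluon, the open-source library behind it, further advances the field of MLaaS (machine learning as a service).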

Google CEO Sundar Pichai wrote, “... designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers.” Shortly afterwards, Google launched its AutoML service in early 2018.

AutoML aims to enable developers with limited machine learning expertise to train high-quality models specific to their business needs. The goal of AutoML is to automate all the major repetitive tasks, such as feature selection or hyperparameter tuning. This makes it possible to create more models in less time, with improved quality and accuracy.

A basic two-step approach to machine learning works as follows. First, the model is created by fitting it to the data. Second, the model is used to predict the output for new (previously unseen) data.
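To make the two steps concrete, here is a minimal plain-Python sketch that does not use AutoGluon. A straight line is fitted to a few hypothetical data points by ordinary least squares (the fit step). The line is then used on an input it has not seen (the predict step).

```python
def fit(xs, ys):
    """Step 1: fit a line y = a*x + b to the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Step 2: apply the fitted parameters to new, previously unseen input."""
    a, b = model
    return a * x + b


# Hypothetical training data that follows y = 2x + 1 exactly
model = fit([0, 1, 2, 3], [1, 3, 5, 7])
print(predict(model, 10))  # → 21.0
```

AutoML tools keep this same fit/predict interface. What they automate is everything that happens inside `fit`.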

This blog post demonstrates how to get started quickly with AutoML. It gives you a step-by-step tutorial on how to build an object detection model with top-notch accuracy using AutoGluon. I created a Google Colab notebook with a full example.


AWS is entering the field of AutoML

At re:Invent 2019, AWS launched a number of add-ons for its managed machine learning service Sagemaker, among them "Sagemaker Autopilot". Sagemaker Autopilot is an AutoML service comparable to Google's AutoML service.

In January 2020, Amazon Web Services Inc. (AWS) quietly launched an open-source library called AutoGluon, the library behind Sagemaker Autopilot.

AutoGluon enables developers to write machine learning-based applications that use image, text, or tabular datasets with just a few lines of code.

[Figure: sagemaker-and-autogluon]

With these tools, AWS has entered the field of managed AutoML services (MLaaS) to compete with Google and its AutoML service.


What is AutoGluon?

"AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data. Intended for both ML beginners and experts, AutoGluon enables you to... "

- quickly prototype deep learning solutions
- automatic hyperparameter tuning, model selection / architecture search
- improve existing bespoke models and data pipelines

AutoGluon enables you to build machine learning models with only three lines of code.

    from autogluon import TabularPrediction as task

    predictor = task.fit(train_data=task.Dataset(file_path="TRAIN_DATA.csv"), label="PREDICT_COLUMN")
    predictions = predictor.predict(task.Dataset(file_path="TEST_DATA.csv"))
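Conceptually, a call like `task.fit()` wraps a loop: it trains several candidate models and keeps the one that scores best on held-out data. The sketch below shows only that idea in plain Python. The candidate "models", the error metric and the data are all hypothetical, and this is not how AutoGluon is implemented internally.

```python
from statistics import mean, median


def make_constant(value):
    """A trivial 'model' that always predicts the same value."""
    return lambda x: value


def make_nearest_neighbor(xs, ys):
    """A 1-nearest-neighbor model: copy the label of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical candidates: each trainer maps training data (xs, ys) to a model
CANDIDATES = [
    ("mean", lambda xs, ys: make_constant(mean(ys))),
    ("median", lambda xs, ys: make_constant(median(ys))),
    ("1-nn", make_nearest_neighbor),
]


def mae(model, xs, ys):
    """Mean absolute error of a model on a dataset."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def auto_fit(train_x, train_y, valid_x, valid_y):
    """Train every candidate and return (name, model) with the lowest validation error."""
    scored = []
    for name, trainer in CANDIDATES:
        model = trainer(train_x, train_y)
        scored.append((mae(model, valid_x, valid_y), name, model))
    _, best_name, best_model = min(scored, key=lambda t: t[0])
    return best_name, best_model


# Hypothetical data where the label grows with x, so 1-NN should win
name, model = auto_fit([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [1.1, 4.9], [11, 49])
print(name)  # → 1-nn
```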

Currently, AutoGluon can create models for image classification, object detection, text classification, and supervised learning with tabular datasets.

If you are interested in how AutoGluon does all the magic behind the scenes, take a look at the "Machine learning with AutoGluon, an open source AutoML library" post on the AWS Open Source Blog.


Tutorial

We are going to build an object detection model to detect fruits (apples, oranges, and bananas) in images. I built a small dataset of around 300 images to keep the training process quick. You can find the dataset here.

I am using Google Colab with a GPU runtime for this tutorial. If you are not sure how to use a GPU runtime, take a look here.

Okay, now let's get started with the tutorial.


Installing AutoGluon

AutoGluon offers different installation packages for different hardware setups. For more installation instructions, take a look at the AutoGluon Installation Guide here.

The first step is to install AutoGluon with pip and CUDA support.

    # Here we assume CUDA 10.0 is installed. You should change the number
    # according to your own CUDA version (e.g. mxnet-cu101 for CUDA 10.1).
    !pip install --upgrade mxnet-cu100
    !pip install autogluon
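The comment in the install cell encodes a naming rule. The MXNet package name is `mxnet-cu` followed by the CUDA version with the dot removed: `mxnet-cu100` for CUDA 10.0 and `mxnet-cu101` for CUDA 10.1. The hypothetical helper below applies that rule. It only builds the string and does not check which packages are actually published on PyPI.

```python
def mxnet_package_for_cuda(cuda_version: str) -> str:
    """Build the MXNet pip package name for a CUDA version string such as '10.1'.

    Follows the naming pattern from the install comment (mxnet-cu100, mxnet-cu101).
    It does not verify that a package with this name exists.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"mxnet-cu{int(major)}{int(minor)}"


print(mxnet_package_for_cuda("10.0"))  # → mxnet-cu100
print(mxnet_package_for_cuda("10.1"))  # → mxnet-cu101
```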

For AutoGluon to work in Google Colab, we also have to install ipykernel and restart the runtime.

    !pip install -U ipykernel

After a successful restart of your runtime, you can import autogluon and print out the version.

    import autogluon as ag
    from autogluon import ObjectDetection as task

    print(ag.__version__)
    # >>>> '0.0.6'

Loading data and creating datasets

The next step is to load the dataset for the object detection task. The ObjectDetection task in AutoGluon accepts either the PASCAL VOC format or the COCO format. You select the format by setting the format parameter of Dataset() to either coco or voc. A Pascal VOC dataset contains two directories: Annotations and JPEGImages. A COCO dataset is formatted in JSON and is a collection of "info", "licenses", "images", "annotations", and "categories".
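In the VOC layout, each image in JPEGImages has an XML file in Annotations. That file lists the image's objects by class name and bounding box. The snippet below parses such a file with Python's standard `xml.etree.ElementTree` to show what it contains. The annotation content is made up for illustration and is not taken from the tiny_fruit dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation for one image with two objects
VOC_XML = """
<annotation>
  <filename>mixed_1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>apple</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
  <object>
    <name>banana</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>400</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return the image file name and a list of (class_name, (xmin, ymin, xmax, ymax))."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, coords))
    return root.find("filename").text, objects


filename, objects = parse_voc(VOC_XML)
print(filename, objects)
# → mixed_1.jpg [('apple', (10, 20, 110, 140)), ('banana', (200, 50, 400, 180))]
```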

For training, we are going to use the tiny_fruit_object_detection dataset, which I built. The dataset contains around 300 images of bananas, apples, oranges, or a combination of them.

We are using 240 images for training and 30 for testing.

[Figure: sample images from the dataset]
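A 240/30 split like this one can be made reproducible with a seeded shuffle, sketched below with hypothetical image IDs. VOC-style datasets conventionally store such splits as plain-text index files with one image ID per line (for example train.txt and test.txt under ImageSets/Main). Whether tiny_fruit uses exactly this layout is an assumption.

```python
import random


def split_ids(image_ids, n_train, n_test, seed=42):
    """Shuffle image IDs reproducibly and return disjoint train and test lists."""
    if n_train + n_test > len(image_ids):
        raise ValueError("not enough images for the requested split")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_test]


def index_file_text(ids):
    """Format IDs as a VOC-style index file: one image ID per line."""
    return "\n".join(ids) + "\n"


# Hypothetical IDs for 270 images
all_ids = [f"img_{i:03d}" for i in range(270)]
train_ids, test_ids = split_ids(all_ids, n_train=240, n_test=30)
print(len(train_ids), len(test_ids))        # → 240 30
print(set(train_ids).isdisjoint(test_ids))  # → True
```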

Using the commands below, we can download and unzip this dataset, which is only 29 MB. After that, we create our Dataset objects for training and testing with task.Dataset().

    import os

    # download the data
    root = './'
    filename_zip = ag.download('https://philschmid-datasets.s3.eu-central-1.amazonaws.com/tiny_fruit.zip',
                               path=root)
    # unzip data
    filename = ag.unzip(filename_zip, root=root)

    # create Dataset
    data_root = os.path.join(root, filename)
    # train dataset
    dataset_train = task.Dataset(data_root, classes=('banana', 'apple', 'orange'), format='voc')
    # test dataset
    dataset_test = task.Dataset(data_root, index_file_name='test', classes=('banana', 'apple', 'orange'), format='voc')

Training the Model

The third step is to train our model on the created dataset. In AutoGluon, you assign the trained model to a variable (here detector) and define its parameters in the fit() function at training time. For example, you can set a time_limit, which automatically stops training after a certain time. You can also define a range of values for the learning rate or set the number of epochs. One of the most important parameters is num_trials, which defines the maximum number of hyperparameter configurations to try out. You can find the full documentation of the configurable parameters here.
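Together, num_trials, the learning-rate search space and time_limit describe a hyperparameter search. The sketch below shows the basic idea as plain random search over a categorical learning-rate space, with both a trial budget and a time budget. `train_and_score` is a hypothetical stand-in for "train a detector with this learning rate and return its validation mAP". AutoGluon's own schedulers and searchers are more sophisticated than this.

```python
import random
import time


def train_and_score(lr):
    """Hypothetical stand-in for training a model; this made-up objective peaks at lr = 3e-4."""
    return 1.0 - abs(lr - 3e-4) * 1000


def random_search(space, num_trials, time_limit, seed=0):
    """Try up to num_trials sampled configurations, stopping once time_limit seconds have passed."""
    rng = random.Random(seed)
    start = time.monotonic()
    best_config, best_score = None, float("-inf")
    history = []
    for _ in range(num_trials):
        if time.monotonic() - start > time_limit:
            break  # time budget used up: do not start another trial
        config = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(**config)
        history.append((config, score))
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, history


# A plain list standing in for lr=ag.Categorical(5e-4, 1e-4, 3e-4)
space = {"lr": [5e-4, 1e-4, 3e-4]}
best_config, best_score, history = random_search(space, num_trials=3, time_limit=60)
print(len(history), best_config)
```

Because values are sampled with replacement, three trials over three values may not try every learning rate. Plain random search accepts this trade-off in exchange for simplicity.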

We are going to train our model for 20 epochs and train 3 different models by setting num_trials=3.

    from autogluon import ObjectDetection as task

    epochs = 20
    detector = task.fit(dataset_train,
                        num_trials=3,
                        epochs=epochs,
                        lr=ag.Categorical(5e-4, 1e-4, 3e-4),
                        ngpus_per_trial=1)

As a result, we get a chart of the mean average precision (mAP) over the number of epochs. mAP is a common metric for measuring the accuracy of an object detection model.

Our best model is plotted in blue.

[Figure: result chart, mAP vs. number of epochs]
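mAP is the mean over classes of each class's average precision (AP). A detection counts as a true positive only if its box overlaps a not-yet-matched ground-truth box of the same class with an intersection over union (IoU) at or above a threshold. The Pascal VOC protocol uses a threshold of 0.5.

The sketch below computes IoU, an all-point interpolated AP for one class, and the mean over classes in plain Python. It is a simplified illustration with hypothetical boxes. It is not the metric code AutoGluon uses, and it ignores details such as VOC's "difficult" flag.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """All-point interpolated AP for a single class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes of this class
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Go from most to least confident detection; each ground-truth box can be matched once
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_iou >= iou_threshold and not matched[img][best_j]
        if is_tp:
            matched[img][best_j] = True
        tp_flags.append(is_tp)  # False = false positive (low overlap or duplicate)

    # Precision and recall after each detection in ranked order
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing from right to left, then sum precision * recall step
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical example for one class: two ground-truth boxes, three detections
gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
dets = [
    ("img1", 0.9, (0, 0, 10, 10)),    # perfect match -> true positive
    ("img2", 0.8, (50, 50, 60, 60)),  # no overlap -> false positive
    ("img2", 0.7, (1, 1, 10, 10)),    # IoU 0.81 -> true positive
]
ap = average_precision(dets, gt)
print(round(ap, 4))  # → 0.8333
```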

Evaluating the Model

After training has finished, we evaluate the performance of our model on the test dataset.

    test_map = detector.evaluate(dataset_test)
    print(f"The mAP on the test dataset is {test_map[1][1]}")

The mAP on the test dataset is 0.8724113724113725. That is pretty good considering that we trained on only 240 images for 20 epochs.


Predict an Image

To use the trained model for prediction, you can simply run detector.predict(image_path). It returns a tuple containing the class IDs of the detected objects (ind), their confidence scores (prob), and the corresponding predicted bounding-box locations (loc).

    image = 'mixed_10.jpg'
    image_path = os.path.join(data_root, 'JPEGImages', image)

    ind, prob, loc = detector.predict(image_path)

[Figure: prediction result]
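A common next step is to keep only confident detections and map class IDs back to names. The sketch below makes several assumptions:

- The three outputs have already been converted to flat Python lists with one entry per detection. In GluonCV-style detectors they typically come back as arrays padded with -1 entries, but that is an assumption here.
- Class IDs follow the order of the classes tuple passed to task.Dataset().
- The values and the 0.5 threshold are hypothetical.

```python
CLASSES = ("banana", "apple", "orange")  # assumed to match the order passed to task.Dataset(classes=...)


def filter_detections(ind, prob, loc, threshold=0.5):
    """Return (class_name, score, box) for detections with a valid class ID and score >= threshold."""
    results = []
    for class_id, score, box in zip(ind, prob, loc):
        if class_id < 0 or score < threshold:
            continue  # skip padding entries (-1) and low-confidence detections
        results.append((CLASSES[int(class_id)], score, tuple(box)))
    return results


# Hypothetical predict() output, already flattened to lists
ind = [1, 0, 2, -1]
prob = [0.93, 0.71, 0.32, -1]
loc = [[10, 20, 110, 140], [200, 50, 400, 180], [5, 5, 50, 50], [-1, -1, -1, -1]]
for name, score, box in filter_detections(ind, prob, loc):
    print(name, score, box)
# → apple 0.93 (10, 20, 110, 140)
# → banana 0.71 (200, 50, 400, 180)
```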

Save Model

At the time of writing, saving an object detection model is not yet implemented in version 0.0.6, but it will be in the next release.

To save your model, you only have to run detector.save().

    savefile = 'model.pkl'
    detector.save(savefile)
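The file name model.pkl suggests a pickle-based format, although the article does not show how AutoGluon serializes detectors internally. As a generic illustration of such a save/load round trip, here is a sketch using Python's standard `pickle` module and a hypothetical `TinyModel` class.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a trained model: it only stores a few settings."""

    def __init__(self, lr, classes):
        self.lr = lr
        self.classes = classes


def save(model, path):
    """Serialize the model object to a file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load(path):
    """Deserialize a model object; only load pickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save(TinyModel(lr=3e-4, classes=("banana", "apple", "orange")), path)
restored = load(path)
print(restored.lr, restored.classes)  # → 0.0003 ('banana', 'apple', 'orange')
```

Unpickling can execute arbitrary code, so a model file from an untrusted source should never be loaded this way.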

Load Model

At the time of writing, loading an object detection model is not yet implemented in version 0.0.6, but it will be in the next release.

    from autogluon import Detector

    new_detector = Detector.load(savefile)

    image = 'mixed_17.jpg'
    image_path = os.path.join(data_root, 'JPEGImages', image)
    new_detector.predict(image_path)

Thanks for reading. You can find the Google Colab notebook containing a full example here.
