TinyMS ResNet50 教程

在本教程中,我们会演示使用TinyMS API进行训练/推理一个ResNet50模型过程。

环境要求

  • Ubuntu: 18.04

  • Python: 3.7.x

  • Flask: 1.1.2

  • MindSpore: CPU-1.1.1

  • TinyMS: 0.1.0

  • numpy: 1.17.5

  • Pillow: 8.1.0

  • pip: 21.0.1

  • requests: 2.18.4

介绍

TinyMS是一个高级API,目的是让新手用户能够更加轻松地上手深度学习。TinyMS可以有效地减少用户在构建、训练、验证和推理一个模型过程中的操作次数。TinyMS也提供了教程和文档帮助开发者更好的上手和开发。

本教程中,包含6个步骤:构建模型下载数据集训练模型定义servable.json启动服务器推理,其中服务器在子进程中启动。

[1]:
import os
import json

from PIL import Image
from tinyms import context
from tinyms.serving import start_server, predict, list_servables, shutdown, server_started
from tinyms.data import Cifar10Dataset, download_dataset, ImageFolderDataset
from tinyms.vision import cifar10_transform, ImageViewer, imagefolder_transform
from tinyms.model import Model, resnet50
from tinyms.callbacks import ModelCheckpoint, CheckpointConfig, LossMonitor
from tinyms.metrics import Accuracy
from tinyms.optimizers import Momentum
from tinyms.losses import SoftmaxCrossEntropyWithLogits
[WARNING] ME(12653:139827079800640,MainProcess):2021-03-19-15:26:20.878.475 [mindspore/ops/operations/array_ops.py:2302] WARN_DEPRECATED: The usage of Pack is deprecated. Please use Stack.
WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.

1. 构建模型

TinyMS封装了MindSpore ResNet50模型中的init和construct函数,代码行数能够大大减少,原有的大量代码段行数会被极限压缩:

[2]:
# 构建网络
net = resnet50(class_num=10)
model = Model(net)

2. 下载数据集

本教程演示了使用cifar10数据集进行训练,同时也提供两个预训练好的ckpt文件下载,一个是cifar10数据集训练得来,另一个是由ImageNet2012数据集中的蘑菇图片训练得来.

[3]:
# download the cifar10 dataset
cifar10_path = '/root/cifar10/cifar-10-batches-bin'
if not os.path.exists(cifar10_path):
    download_dataset('cifar10', '/root')
    print('************Download complete*************')
else:
    print('************Dataset already exists.**************')
************** Downloading the Cifar10 dataset **************
[███████████████████████████████████████████████████████████████████████████████████████████████████ ] 99.66%************Download complete*************

3. 训练模型

数据集中的训练集、验证集都会在此步骤中定义,同时也会定义训练参数。训练后生成的ckpt文件会保存到/etc/tinyms/serving/resnet50_cifar10文件夹以便后续使用,训练完成后会进行验证并输出 Accuracy指标。

提示:训练过程非常漫长,建议跳过训练步骤并直接下载、使用本教程提供的ckpt文件进行后续的推理
[ ]:
# 检查ckpt文件和路径
cifar10_ckpt_folder = '/etc/tinyms/serving/resnet50_cifar10'
cifar10_ckpt_path = '/etc/tinyms/serving/resnet50_cifar10/resnet50.ckpt'
if not os.path.exists(cifar10_ckpt_folder):
    !mkdir -p  /etc/tinyms/serving/resnet50_cifar10
else:
    print('resnet50_cifar10 ckpt folder already exists')

# 设置训练参数
epoch_size = 90 # default is 90
batch_size = 32

# 设置环境参数
dataset_sink_mode = False
device_target = "CPU"
context.set_context(mode=context.GRAPH_MODE, device_target=device_target)

# 设置数据集参数
train_dataset = Cifar10Dataset(cifar10_path, num_parallel_workers=4, shuffle=True)
train_dataset = cifar10_transform.apply_ds(train_dataset, repeat_size=1, batch_size=32, is_training=True)
eval_dataset = Cifar10Dataset(cifar10_path, num_parallel_workers=4, shuffle=True)
eval_dataset = cifar10_transform.apply_ds(eval_dataset, repeat_size=1, batch_size=32, is_training=False)
step_size = train_dataset.get_dataset_size()

save_checkpoint_epochs = 5
ckpoint_cb = ModelCheckpoint(prefix="resnet_cifar10", config=CheckpointConfig(
            save_checkpoint_steps=save_checkpoint_epochs * train_dataset.get_dataset_size(),
            keep_checkpoint_max=10))

# 定义loss函数
net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

# 定义optimizer
net_opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, 0.9)
model.compile(loss_fn=net_loss, optimizer=net_opt, metrics={"Accuracy": Accuracy()})


print('************************Start training*************************')
model.train(epoch_size, train_dataset, callbacks=[ckpoint_cb, LossMonitor()],dataset_sink_mode=dataset_sink_mode)
model.save_checkpoint(cifar10_ckpt_path)
print('************************Finished training*************************')

model.load_checkpoint(cifar10_ckpt_path)
print('************************Start evaluation*************************')
acc = model.eval(eval_dataset, dataset_sink_mode=dataset_sink_mode)
print("============== Accuracy:{} ==============".format(acc))
提示:如果跳过了训练步骤,下载预训练的ckpt文件并继续推理步骤

点击resnet_imagenet下载ResNet50_imagenet ckpt文件,或者点击resnet_cifar下载ResNet50_cifar ckpt文件,将ckpt文件保存到/etc/tinyms/serving/resnet50_<dataset_name>/resnet50.ckpt

或者运行以下代码下载resnet_imagenetresnet_cifar ckpt文件:

[4]:
# check lenet folder exists or not, and download resnet50_imagenet
imagenet2012_ckpt_folder = '/etc/tinyms/serving/resnet50_imagenet2012'
imagenet2012_ckpt_path = '/etc/tinyms/serving/resnet50_imagenet2012/resnet50.ckpt'
if not os.path.exists(imagenet2012_ckpt_folder):
    !mkdir -p  /etc/tinyms/serving/resnet50_imagenet2012
    !wget -P /etc/tinyms/serving/resnet50_imagenet2012 https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/imagenet2012/resnet50.ckpt
else:
    print('imagenet2012 ckpt folder already exists')
    if not os.path.exists(imagenet2012_ckpt_path):
        !wget -P /etc/tinyms/serving/resnet50_imagenet2012 https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/imagenet2012/resnet50.ckpt
    else:
        print('imagenet2012 ckpt file already exists')


# check lenet folder exists or not
cifar10_ckpt_folder = '/etc/tinyms/serving/resnet50_cifar10'
cifar10_ckpt_path = '/etc/tinyms/serving/resnet50_cifar10/resnet50.ckpt'
if not os.path.exists(cifar10_ckpt_folder):
    !mkdir -p  /etc/tinyms/serving/resnet50_cifar10
    !wget -P /etc/tinyms/serving/resnet50_cifar10 https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/cifar10/resnet50.ckpt
else:
    print('cifar10 ckpt folder already exists')
    if not os.path.exists(cifar10_ckpt_path):
        !wget -P /etc/tinyms/serving/resnet50_cifar10 https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/cifar10/resnet50.ckpt
    else:
        print('cifar10 ckpt file already exists')
imagenet2012 ckpt folder already exists
--2021-03-19 15:28:24--  https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/imagenet2012/resnet50.ckpt
Resolving ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)... 49.4.112.113, 49.4.112.90, 49.4.112.5, ...
Connecting to ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)|49.4.112.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 188521005 (180M) [binary/octet-stream]
Saving to: ‘/etc/tinyms/serving/resnet50_imagenet2012/resnet50.ckpt’

resnet50.ckpt       100%[===================>] 179.79M  32.4MB/s    in 6.0s

2021-03-19 15:28:31 (30.0 MB/s) - ‘/etc/tinyms/serving/resnet50_imagenet2012/resnet50.ckpt’ saved [188521005/188521005]

cifar10 ckpt folder already exists
--2021-03-19 15:28:31--  https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/ckpt_files/cifar10/resnet50.ckpt
Resolving ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)... 49.4.112.113, 49.4.112.90, 49.4.112.5, ...
Connecting to ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)|49.4.112.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 188462121 (180M) [binary/octet-stream]
Saving to: ‘/etc/tinyms/serving/resnet50_cifar10/resnet50.ckpt’

resnet50.ckpt       100%[===================>] 179.73M  4.07MB/s    in 40s

2021-03-19 15:29:11 (4.52 MB/s) - ‘/etc/tinyms/serving/resnet50_cifar10/resnet50.ckpt’ saved [188462121/188462121]

4. 定义servable.json

在下列两段代码中选择其中一段运行以定义servable json文件,该文件会在后续推理中使用 运行下列代码定义ResNet50_imagenet2012模型servable json文件:

[ ]:
servable_json = [{'name': 'resnet50_imagenet2012',
                  'description': 'This servable hosts a resnet50 model predicting mushrooms',
                  'model': {
                      "name": "resnet50",
                      "format": "ckpt",
                      "class_num": 9}}]
os.chdir("/etc/tinyms/serving")
json_data = json.dumps(servable_json, indent=4)

with open('servable.json', 'w') as json_file:
    json_file.write(json_data)

运行下列代码定义ResNet50_cifar10模型servable json文件:

[5]:
servable_json = [{'name': 'resnet50_cifar10',
                  'description': 'This servable hosts a resnet50 model predicting 10 classes of objects',
                  'model': {
                      "name": "resnet50",
                      "format": "ckpt",
                      "class_num": 10}}]
os.chdir("/etc/tinyms/serving")
json_data = json.dumps(servable_json, indent=4)

with open('servable.json', 'w') as json_file:
    json_file.write(json_data)

5. 启动服务器

5.1 介绍

TinyMS推理是C/S(Client/Server)架构。TinyMS使用Flask这个轻量化的网页服务器架构作为C/S通讯的基础架构。为了能够对模型进行推理,用户必须首先启动服务器。如果成功启动,服务器会在子进程中运行并且会监听从地址127.0.0.1,端口号5000发送来的POST请求并且使用MindSpore作为后端来处理这些请求。后端会构建模型,运行推理并且返回结果给客户端

5.2 启动服务器

运行下列代码以启动服务器:

[6]:
start_server()
Server starts at host 127.0.0.1, port 5000

6. 推理

6.1 上传图片

ResNet50_imagenet2012模型需要用户上传一张蘑菇图片作为输入,而ResNet50_cifar10模型需要用户上传一张属于如下10个类别的图片作为输入:

['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

点击蘑菇下载本教程中使用的蘑菇图片以运行ResNet50_imagenet2012,或者点击飞机以运行ResNet50_cifar10。上传图片,如果使用命令行终端,可以使用’scp’或者’wget’获取图片,如果使用Jupyter,点击菜单右上方的’Upload’按钮并且选择上传的图片。将图片保存在根目录下,重命名为’mushroom.jpeg’或者’airplane.jpg’。

或者运行下列代码下载蘑菇图片(推理ResNet_imagenet模型)和飞机(推理ResNet_cifar模型)图片:

[7]:
# 下载蘑菇图片
if not os.path.exists('/root/mushroom.jpeg'):
    !wget -P /root/ https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/tinyms-test-pics/mushrooms/mushroom.jpeg
else:
    print('mushroom.jpeg already exists')

# 下载飞机图片
if not os.path.exists('/root/airplane.jpg'):
    !wget -P /root/ https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/tinyms-test-pics/objects/airplane.jpg
else:
    print('airplane.jpg already exists')
--2021-03-19 15:29:18--  https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/tinyms-test-pics/mushrooms/mushroom.jpeg
Resolving ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)... 49.4.112.113, 49.4.112.90, 49.4.112.5, ...
Connecting to ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)|49.4.112.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 76020 (74K) [image/jpeg]
Saving to: ‘/root/mushroom.jpeg’

mushroom.jpeg       100%[===================>]  74.24K  --.-KB/s    in 0.1s

2021-03-19 15:29:18 (563 KB/s) - ‘/root/mushroom.jpeg’ saved [76020/76020]

--2021-03-19 15:29:18--  https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/tinyms-test-pics/objects/airplane.jpg
Resolving ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)... 49.4.112.113, 49.4.112.90, 49.4.112.5, ...
Connecting to ascend-tutorials.obs.cn-north-4.myhuaweicloud.com (ascend-tutorials.obs.cn-north-4.myhuaweicloud.com)|49.4.112.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 151188 (148K) [image/jpeg]
Saving to: ‘/root/airplane.jpg’

airplane.jpg        100%[===================>] 147.64K   568KB/s    in 0.3s

2021-03-19 15:29:19 (568 KB/s) - ‘/root/airplane.jpg’ saved [151188/151188]

6.2 List servables

使用list_servables函数检查当前后端的serving模型

[8]:
list_servables()
[8]:
[{'description': 'This servable hosts a resnet50 model predicting 10 classes of objects',
  'model': {'class_num': 10, 'format': 'ckpt', 'name': 'resnet50'},
  'name': 'resnet50_cifar10'}]

如果输出的description字段显示这是一个resnet50的模型,则可以继续到下一步发送推理请求,后续的代码会自动判断后端模型并运行推理代码

6.3 发送推理请求

运行predict函数发送推理请求,第4个参数选择TOP1_CLASSTOP5_CLASS以指定输出策略:

[9]:
#设置两张图片的路径(对应两个不同的数据集)和输出策略(可以在TOP1和TOP5中选择)
imagenet_image_path = "/root/mushroom.jpeg"
cifar_image_path = "/root/airplane.jpg"
strategy = "TOP1_CLASS"

# predict(image_path, servable_name, dataset, topk_strategy)
# predict方法的4个参数分别是图片路径、servable名称,数据集名称(默认MNIST,此处需手动指定)和输出策略
if server_started() is True:
    servable_name = list_servables()[0]['name']
    if servable_name == 'resnet50_imagenet2012':
        img_viewer = ImageViewer(Image.open(imagenet_image_path), imagenet_image_path)
        img_viewer.show()
        print(predict(imagenet_image_path, "resnet50_imagenet2012", "imagenet2012", strategy))
    else:
        img_viewer = ImageViewer(Image.open(cifar_image_path), cifar_image_path)
        img_viewer.show()
        print(predict(cifar_image_path, "resnet50_cifar10", 'cifar10', strategy))
else:
    print('Server not started')
../../_images/tutorials_ipynb_TinyMS_ResNet50_tutorial_zh_22_0.png
TOP1: airplane, score: 0.99997282028198242188

检查输出

如果用户运行ResNet50_imagenet2012且能看到类似如下输出:

TOP1: Amanita毒蝇伞,伞菌目,鹅膏菌科,鹅膏菌属,主要分布于我国黑龙江、吉林、四川、西藏、云南等地,有毒, score: 0.99119007587432861328

那么意味着已经进行了一次成功的推理

如果用户运行ResNet50_cifar10,输出应该类似于:

TOP1: airplane, score: 0.99997282028198242188

切换模型

如果想要体验使用ImageNet2012毒蘑菇数据集训练,可以运行下列代码,同时运行对应的servable_json代码段:

[ ]:
# download the imagenet2012 mushroom dataset
imagenet_path = '/root/mushrooms'
if not os.path.exists(imagenet_path):
    !wget -P /root/ https://ascend-tutorials.obs.cn-north-4.myhuaweicloud.com/resnet-50/mushrooms/mushrooms.zip
    !mkdir /root/mushrooms/
    !unzip /root/mushrooms.zip -d /root/mushrooms/
    print('************Download complete*************')
else:
    print('************Dataset already exists.**************')


# check ckpt folder exists or not
imagenet_ckpt_folder = '/etc/tinyms/serving/resnet50_imagenet2012'
imagenet_ckpt_path = '/etc/tinyms/serving/resnet50_imagenet2012/resnet50.ckpt'
if not os.path.exists(imagenet_ckpt_folder):
    !mkdir -p  /etc/tinyms/serving/resnet50_imagenet2012
else:
    print('resnet50_imagenet2012 ckpt folder already exists')


epoch_size = 90 # default is 90
batch_size = 32

# set environment parameters
dataset_sink_mode = False
device_target = "CPU"
context.set_context(mode=context.GRAPH_MODE, device_target=device_target)

# set dataset parameters
imagenet_train_path = '/root/mushrooms/train'
train_dataset = ImageFolderDataset(imagenet_train_path, num_parallel_workers=4, shuffle=True)
train_dataset = imagefolder_transform.apply_ds(train_dataset, repeat_size=1, batch_size=32, is_training=True)
imagenet_eval_path = '/root/mushrooms/eval'
eval_dataset = ImageFolderDataset(imagenet_eval_path, num_parallel_workers=4, shuffle=True)
eval_dataset = imagefolder_transform.apply_ds(eval_dataset, repeat_size=1, batch_size=32, is_training=False)
step_size = train_dataset.get_dataset_size()

save_checkpoint_epochs = 5
ckpoint_cb = ModelCheckpoint(prefix="resnet_imagenet2012", config=CheckpointConfig(
            save_checkpoint_steps=save_checkpoint_epochs * train_dataset.get_dataset_size(),
            keep_checkpoint_max=10))

# define the loss function
net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

# define the optimizer
net_opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, 0.9)
model.compile(loss_fn=net_loss, optimizer=net_opt, metrics={"Accuracy": Accuracy()})


print('************************Start training*************************')
model.train(epoch_size, train_dataset, callbacks=[ckpoint_cb, LossMonitor()],dataset_sink_mode=dataset_sink_mode)
model.save_checkpoint(imagenet_ckpt_path)
print('************************Finished training*************************')

model.load_checkpoint(imagenet_ckpt_path)
print('************************Start evaluation*************************')
acc = model.eval(eval_dataset, dataset_sink_mode=dataset_sink_mode)
print("============== Accuracy:{} ==============".format(acc))

关闭服务器

[10]:
shutdown()
[10]:
'Server shutting down...'