tinyms.callbacks

Callback related classes and functions in model training phase.

class tinyms.callbacks.LossTimeMonitor(lr_init=None)[source]

Monitor loss and time.

Parameters:

lr_init (numpy.ndarray) – Train learning rate. Default: None.

Returns:

None

Examples

>>> from tinyms import Tensor
>>> from tinyms.callbacks import LossTimeMonitor
>>>
>>> LossTimeMonitor(lr_init=Tensor([0.05] * 100).asnumpy())
class tinyms.callbacks.LossTimeMonitorV2[source]

Monitor loss and time, version 2.0. This version does not show the learning rate.

Returns:

None

Examples

>>> from tinyms.callbacks import LossTimeMonitorV2
>>>
>>> LossTimeMonitorV2()
class tinyms.callbacks.BertLossCallBack(dataset_size=1)[source]

Monitor the loss during training. If the loss is NAN or INF, training is terminated.

Parameters:

dataset_size (int) – Print the loss every dataset_size steps. Default: 1.

Returns:

None

Examples

>>> from tinyms.callbacks import BertLossCallBack
>>>
>>> BertLossCallBack(dataset_size=1)
step_end(run_context)[source]

Print the loss after each step.

class tinyms.callbacks.Callback[source]

Abstract base class used to build a Callback class. Callbacks are context managers which will be entered and exited when passed into the Model. You can use this mechanism to do some custom operations.

Each method of the Callback class corresponds to a stage in the training or eval process, and those methods have the same input run_context, which holds context information of the model in the training or eval process. When defining a Callback subclass or creating a custom Callback, note that you should override the methods with names prefixed with “on_train” or “on_eval”; otherwise a ValueError will be raised if the customized Callbacks are used in model.fit.

When creating a custom Callback, model context information can be obtained in Callback methods by calling RunContext.original_args(), which is a dictionary variable recording the current attributes. Users can add customized attributes to the information. The training process can also be stopped by calling the request_stop method. For details of custom Callback, please check Callback.

Examples

>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore import dataset as ds
>>> from mindspore.train import Model, Callback
>>> class Print_info(Callback):
...     def step_end(self, run_context):
...         cb_params = run_context.original_args()
...         print("step_num: ", cb_params.cur_step_num)
>>>
>>> print_cb = Print_info()
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> model.train(1, dataset, callbacks=print_cb)
step_num: 2
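
As noted above, callbacks intended for model.fit should override the hooks prefixed with on_train or on_eval rather than the legacy aliases. A minimal sketch, reusing the net, loss, optim and dataset defined in the example above; the PrintLoss name here is illustrative:

>>> class PrintLoss(Callback):
...     """Print the network outputs (the loss here) at the end of each training step."""
...     def on_train_step_end(self, run_context):
...         cb_params = run_context.original_args()
...         print("loss: ", cb_params.net_outputs)
>>>
>>> model.train(1, dataset, callbacks=[PrintLoss()])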
begin(run_context)[source]

Called once before the network starts executing. A backwards compatibility alias for on_train_begin and on_eval_begin.

Parameters:

run_context (RunContext) – Includes some information of the model.

end(run_context)[source]

Called once after network training. A backwards compatibility alias for on_train_end and on_eval_end.

Parameters:

run_context (RunContext) – Includes some information of the model.

epoch_begin(run_context)[source]

Called before each epoch begins. A backwards compatibility alias for on_train_epoch_begin and on_eval_epoch_begin.

Parameters:

run_context (RunContext) – Includes some information of the model.

epoch_end(run_context)[source]

Called after each epoch finishes. A backwards compatibility alias for on_train_epoch_end and on_eval_epoch_end.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_begin(run_context)[source]

Called before eval begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_end(run_context)[source]

Called after eval ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_epoch_begin(run_context)[source]

Called before each eval epoch begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_epoch_end(run_context)[source]

Called after each eval epoch ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_step_begin(run_context)[source]

Called before each eval step begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_eval_step_end(run_context)[source]

Called after each eval step ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_begin(run_context)[source]

Called once before the network training begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_end(run_context)[source]

Called after training ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_epoch_begin(run_context)[source]

Called before each training epoch begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_epoch_end(run_context)[source]

Called after each training epoch ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_step_begin(run_context)[source]

Called before each training step begins.

Parameters:

run_context (RunContext) – Includes some information of the model.

on_train_step_end(run_context)[source]

Called after each training step ends.

Parameters:

run_context (RunContext) – Includes some information of the model.

step_begin(run_context)[source]

Called before each step begins. A backwards compatibility alias for on_train_step_begin and on_eval_step_begin.

Parameters:

run_context (RunContext) – Includes some information of the model.

step_end(run_context)[source]

Called after each step finishes. A backwards compatibility alias for on_train_step_end and on_eval_step_end.

Parameters:

run_context (RunContext) – Includes some information of the model.

class tinyms.callbacks.LossMonitor(per_print_times=1)[source]

Monitor the loss in train, or monitor the loss and eval metrics in fit.

If the loss is NAN or INF, it will terminate training.

Note

If per_print_times is 0, the loss will not be printed.

Parameters:

per_print_times (int) – How many steps between each print of the loss. During sink mode, it will print the loss at the nearest step. Default: 1.

Raises:

ValueError – If per_print_times is not an integer or is less than zero.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.train import Model, LossMonitor
>>>
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> loss_monitor = LossMonitor()
>>> model.train(10, dataset, callbacks=loss_monitor)
on_train_epoch_end(run_context)[source]

When LossMonitor is used in model.fit, print the eval metrics at the end of an epoch if the current epoch should perform evaluation.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

step_end(run_context)[source]

Print the training loss at the end of each step.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.TimeMonitor(data_size=None)[source]

Monitor the time in the train or eval process.

Parameters:

data_size (int) – The number of steps between each print of the time information. If the program obtains batch_num during training, data_size will be set to batch_num; otherwise data_size will be used. Default: None.

Raises:

ValueError – If data_size is not a positive integer.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.train import Model, TimeMonitor
>>>
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> time_monitor = TimeMonitor()
>>> model.train(10, dataset, callbacks=time_monitor)
epoch_begin(run_context)[source]

Record the time at the beginning of each epoch.

Parameters:

run_context (RunContext) – Context of the running process. For more details, please refer to mindspore.train.RunContext.

epoch_end(run_context)[source]

Print the elapsed time of the process at the end of each epoch.

Parameters:

run_context (RunContext) – Context of the running process. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.ModelCheckpoint(prefix='CKP', directory=None, config=None)[source]

The checkpoint callback class.

It is called in combination with the training process and saves the model and network parameters after training.

Note

In the distributed training scenario, please specify different directories for each training process to save the checkpoint file. Otherwise, the training may fail. If this callback is used in the model function, the checkpoint file will by default also save the parameters of the optimizer.

Parameters:
  • prefix (str) – The prefix name of checkpoint files. Default: “CKP”.

  • directory (str) – The path of the folder in which the checkpoint file will be saved. By default, the file is saved in the current directory. Default: None.

  • config (CheckpointConfig) – Checkpoint strategy configuration. Default: None.

Raises:
  • ValueError – If prefix is not a str or contains the ‘/’ character.

  • ValueError – If directory is not a str.

  • TypeError – If config is not of CheckpointConfig type.
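
A minimal usage sketch (hedged; the compiled model and dataset are assumed to be built as in the other examples on this page, e.g. with LeNet5 and create_dataset):

>>> from mindspore.train import ModelCheckpoint, CheckpointConfig
>>>
>>> # Save a checkpoint every 100 steps and keep at most 3 files (illustrative values).
>>> config = CheckpointConfig(save_checkpoint_steps=100, keep_checkpoint_max=3)
>>> ckpt_cb = ModelCheckpoint(prefix='LeNet5', directory='./checkpoint', config=config)
>>> model.train(10, dataset, callbacks=ckpt_cb)
>>> print(ckpt_cb.latest_ckpt_file_name)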

end(run_context)[source]

Save the last checkpoint after training finishes.

Parameters:

run_context (RunContext) – Context of the training run.

property latest_ckpt_file_name

Return the latest checkpoint path and file name.

step_end(run_context)[source]

Save the checkpoint at the end of a step.

Parameters:

run_context (RunContext) – Context of the training run.

class tinyms.callbacks.SummaryCollector(summary_dir, collect_freq=10, collect_specified_data=None, keep_default_action=True, custom_lineage_data=None, collect_tensor_freq=None, max_file_size=None, export_options=None)[source]

SummaryCollector can help you to collect some common information.

It can help you to collect loss, learning rate, computational graph and so on. SummaryCollector also enables the summary operator to collect data to summary files.

Note

  1. When using SummaryCollector, you need to run the code in if __name__ == “__main__”.

  2. Multiple SummaryCollector instances in the callback list are not allowed.

  3. Not all information is collected at the training phase or at the eval phase.

  4. SummaryCollector always records the data collected by the summary operator.

  5. SummaryCollector only supports Linux systems.

  6. Summary is not supported when the source is compiled with the -s on option.

Parameters:
  • summary_dir (str) – The collected data will be persisted to this directory. If the directory does not exist, it will be created automatically.

  • collect_freq (int) – Set the frequency of data collection; it should be greater than zero, and the unit is step. If a frequency is set, data will be collected when (current steps % freq) equals 0, and data from the first step is always collected. Note that if the data sink mode is used, the unit becomes the epoch. It is not recommended to collect data too frequently, since this can affect performance. Default: 10.

  • collect_specified_data (Union[None, dict]) –

    Perform custom operations on the collected data. By default, if set to None, all data is collected as the default behavior. You can customize the collected data with a dictionary. For example, you can set {‘collect_metric’: False} to control not collecting metrics. The data that supports control is shown below. Default: None.

    • collect_metric (bool): Whether to collect training metrics, currently only the loss is collected. The first output will be treated as the loss and it will be averaged. Default: True.

    • collect_graph (bool): Whether to collect the computational graph. Currently, only training computational graph is collected. Default: True.

    • collect_train_lineage (bool): Whether to collect lineage data for the training phase, this field will be displayed on the lineage page of MindInsight. Default: True.

    • collect_eval_lineage (bool): Whether to collect lineage data for the evaluation phase, this field will be displayed on the lineage page of MindInsight. Default: True.

    • collect_input_data (bool): Whether to collect dataset for each training. Currently only image data is supported. If there are multiple columns of data in the dataset, the first column should be image data. Default: True.

    • collect_dataset_graph (bool): Whether to collect dataset graph for the training phase. Default: True.

    • histogram_regular (Union[str, None]): Collect the weight and bias for the parameter distribution page, displayed in MindInsight. This field accepts a regular expression string to control which parameters to collect. It is not recommended to collect too many parameters at once, as it can affect performance. Note that if you collect too many parameters and run out of memory, the training will fail. Default: None, which means only the first five parameters are collected.

    • collect_landscape (Union[dict,None]): Whether to collect the parameters needed to create the loss landscape. If set to None, collect_landscape parameters will not be collected. All parameter information is collected by default and stored in file {summary_dir}/ckpt_dir/train_metadata.json.

      • landscape_size (int): Specify the image resolution of the generated loss landscape. For example, if it is set to 128, the resolution of the landscape is 128 * 128. The calculation time increases with the increase of resolution. Default: 40. Optional values: between 3 and 256.

      • unit (str): Specify the unit of the interval in the training process. Default: “step”. Optional: epoch/step.

      • create_landscape (dict): Select how to create loss landscape. Training process loss landscape(train) and training result loss landscape(result). Default: {“train”: True, “result”: True}. Optional: True/False.

      • num_samples (int): The size of the dataset used to create the loss landscape. For example, for an image dataset, you can set num_samples to 128, which means that 128 images are used to create the loss landscape. Default: 128.

      • intervals (List[List[int]]): Specifies the intervals over which the loss landscape is created. For example, if the user wants to create the loss landscape of two training processes, covering epochs 1-5 and 6-10 respectively, they can set [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]. Note: each interval must contain at least three epochs.

  • keep_default_action (bool) – This field affects the collection behavior of the ‘collect_specified_data’ field. True: it means that after specified data is set, non-specified data is collected as the default behavior. False: it means that after specified data is set, only the specified data is collected, and the others are not collected. Default: True.

  • custom_lineage_data (Union[dict, None]) – Allows you to customize the data and present it on the MindInsight lineage page. In the custom data, the type of the key supports str, and the type of the value supports str, int and float. Default: None, which means there is no custom data.

  • collect_tensor_freq (Optional[int]) – The same semantics as collect_freq, but controls TensorSummary only. Because TensorSummary data is too large to be compared with other summary data, this parameter is used to reduce its collection. By default, the maximum number of steps for collecting TensorSummary data is 20, but it will not exceed the number of steps for collecting other summary data. For example, given collect_freq=10, when the total number of steps is 600, TensorSummary will be collected for 20 steps, while other summary data for 61 steps; but when the total number of steps is 20, both TensorSummary and other summary data will be collected for 3 steps. Also note that in parallel mode the total steps will be split evenly, which will affect the number of steps at which TensorSummary is collected. Default: None, which means to follow the behavior described above.

  • max_file_size (Optional[int]) – The maximum size in bytes of each file that can be written to the disk. For example, to write not larger than 4GB, specify max_file_size=4*1024**3. Default: None, which means no limit.

  • export_options (Union[None, dict]) –

    Perform custom operations on the export data. Note that the size of export files is not limited by the max_file_size. You can customize the export data with a dictionary. For example, you can set {‘tensor_format’: ‘npy’} to export tensor as npy file. The data that supports control is shown below. Default: None, it means that the data is not exported.

    • tensor_format (Union[str, None]): Customize the export tensor format. Supports [“npy”, None]. Default: None, it means that the tensor is not exported.

      • npy: export tensor as npy file.

Raises:

ValueError – Summary is not supported; please recompile the source without the -s on option.

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore.train import Model, SummaryCollector
>>> from mindspore.nn import Accuracy
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
...     mnist_dataset_dir = '/path/to/mnist_dataset_directory'
...     # The detail of create_dataset method shown in model_zoo.official.cv.lenet.src.dataset.py
...     ds_train = create_dataset(mnist_dataset_dir, 32)
...     # The detail of LeNet5 shown in model_zoo.official.cv.lenet.src.lenet.py
...     network = LeNet5(10)
...     net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
...     net_opt = nn.Momentum(network.trainable_params(), 0.01, 0.9)
...     model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()}, amp_level="O2")
...
...     # Simple usage:
...     summary_collector = SummaryCollector(summary_dir='./summary_dir')
...     model.train(1, ds_train, callbacks=[summary_collector], dataset_sink_mode=False)
...
...     # Do not collect metric and collect the first layer parameter, others are collected by default
...     specified={'collect_metric': False, 'histogram_regular': '^conv1.*'}
...     summary_collector = SummaryCollector(summary_dir='./summary_dir', collect_specified_data=specified)
...     model.train(1, ds_train, callbacks=[summary_collector], dataset_sink_mode=False)
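
The same collector can also export the tensors recorded by the summary operator as npy files through export_options, as described above; a minimal hedged sketch:

>>> summary_collector = SummaryCollector(summary_dir='./summary_dir',
...                                      export_options={'tensor_format': 'npy'})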
class tinyms.callbacks.CheckpointConfig(save_checkpoint_steps=1, save_checkpoint_seconds=0, keep_checkpoint_max=5, keep_checkpoint_per_n_minutes=0, integrated_save=True, async_save=False, saved_network=None, append_info=None, enc_key=None, enc_mode='AES-GCM', exception_save=False)[source]

The configuration of model checkpoint.

Note

During the training process, if the dataset is transmitted through the data channel, it is suggested to set ‘save_checkpoint_steps’ to an integer multiple of loop_size. Otherwise, the time to save the checkpoint may be biased. It is recommended to set only one save strategy and one keep strategy at the same time. If both save_checkpoint_steps and save_checkpoint_seconds are set, save_checkpoint_seconds will be invalid. If both keep_checkpoint_max and keep_checkpoint_per_n_minutes are set, keep_checkpoint_per_n_minutes will be invalid.

Parameters:
  • save_checkpoint_steps (int) – Steps to save checkpoint. Default: 1.

  • save_checkpoint_seconds (int) – Seconds to save checkpoint. Can’t be used with save_checkpoint_steps at the same time. Default: 0.

  • keep_checkpoint_max (int) – Maximum number of checkpoint files can be saved. Default: 5.

  • keep_checkpoint_per_n_minutes (int) – Save the checkpoint file every keep_checkpoint_per_n_minutes minutes. Can’t be used with keep_checkpoint_max at the same time. Default: 0.

  • integrated_save (bool) – Whether to merge and save the split Tensor in the automatic parallel scenario. Integrated save function is only supported in automatic parallel scene, not supported in manual parallel. Default: True.

  • async_save (bool) – Whether asynchronous execution saves the checkpoint to a file. Default: False.

  • saved_network (Cell) – Network to be saved in checkpoint file. If the saved_network has no relation with the network in training, the initial value of saved_network will be saved. Default: None.

  • append_info (list) – The information saved to the checkpoint file. Supports “epoch_num”, “step_num” and dict. The key of a dict must be str, and the value of a dict must be one of int, float, bool, Parameter or Tensor. Default: None. A short sketch is shown after the example below.

  • enc_key (Union[None, bytes]) – Byte type key used for encryption. If the value is None, the encryption is not required. Default: None.

  • enc_mode (str) – This parameter is valid only when enc_key is not set to None. Specifies the encryption mode, currently supports ‘AES-GCM’, ‘AES-CBC’ and ‘SM4-CBC’. Default: ‘AES-GCM’.

  • exception_save (bool) – Whether to save the current checkpoint when an exception occurs. Default: False.

Raises:

ValueError – If an input parameter is not of the correct type.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.common.initializer import Normal
>>> from mindspore.train import Model, CheckpointConfig, ModelCheckpoint
>>>
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, num_channel=1):
...         super(LeNet5, self).__init__()
...         self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
...         self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
...         self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
...         self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
...         self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
...         self.relu = nn.ReLU()
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.max_pool2d(self.relu(self.conv1(x)))
...         x = self.max_pool2d(self.relu(self.conv2(x)))
...         x = self.flatten(x)
...         x = self.relu(self.fc1(x))
...         x = self.relu(self.fc2(x))
...         x = self.fc3(x)
...         return x
>>>
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> config = CheckpointConfig(saved_network=net)
>>> ckpoint_cb = ModelCheckpoint(prefix='LeNet5', directory='./checkpoint', config=config)
>>> model.train(10, dataset, callbacks=ckpoint_cb)
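
A hedged sketch of the append_info and save/keep strategy parameters described above (the values are illustrative):

>>> config = CheckpointConfig(save_checkpoint_steps=100,
...                           keep_checkpoint_max=3,
...                           append_info=["epoch_num", "step_num", {"version": 1}])
>>> ckpoint_cb = ModelCheckpoint(prefix='LeNet5', directory='./checkpoint', config=config)
>>> print(config.append_dict)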
property append_dict

Get the value of the information dict saved to the checkpoint file.

Returns:

Dict, the information saved to the checkpoint file.

property async_save

Get the value of whether the checkpoint is saved to a file asynchronously.

Returns:

Bool, whether the checkpoint is saved to a file asynchronously.

property enc_key

Get the value of the byte-type key used for encryption.

Returns:

(None, bytes), byte-type key used for encryption.

property enc_mode

Get the value of the encryption mode.

Returns:

str, encryption mode.

get_checkpoint_policy()[source]

Get the policy of the checkpoint.

Returns:

Dict, the information of the checkpoint policy.

property integrated_save

Get the value of whether to merge and save the split Tensor in the automatic parallel scenario.

Returns:

Bool, whether to merge and save the split Tensor in the automatic parallel scenario.

property keep_checkpoint_max

Get the maximum number of checkpoint files that can be saved.

Returns:

Int, the maximum number of checkpoint files that can be saved.

property keep_checkpoint_per_n_minutes

Get the interval, in minutes, at which the checkpoint file is saved.

Returns:

Int, the number of minutes between checkpoint saves.

property save_checkpoint_seconds

Get the value of _save_checkpoint_seconds.

Returns:

Int, the number of seconds between checkpoint saves.

property save_checkpoint_steps

Get the number of steps between checkpoint saves.

Returns:

Int, the number of steps between checkpoint saves.

property saved_network

Get the network to be saved in the checkpoint file.

Returns:

Cell, the network to be saved in the checkpoint file.

class tinyms.callbacks.RunContext(original_args)[source]

Hold and manage information about the model.

RunContext is mainly used to collect context-related information about the model during training or eval and pass it into the Callback object as an input parameter to share information.

Callback objects can not only obtain the model context information by calling RunContext.original_args() and add extra attributes to the information, but can also stop the training process by calling the request_stop method. For details of custom Callback, please check Callback.

RunContext.original_args() holds the model context information as a dictionary variable, and different attributes of the dictionary are stored in the training or eval process. Details are as follows:

Attributes supported in train | Attributes supported in eval | meaning
----------------------------- | ---------------------------- | ---------------------------------------
train_network                 |                              | train network with optimizer and loss
epoch_num                     |                              | number of train epochs
train_dataset                 |                              | the train dataset
loss_fn                       |                              | the loss function
optimizer                     |                              | the optimizer
parallel_mode                 |                              | the parallel mode
device_number                 |                              | the device number
train_dataset_element         |                              | the train data element of the current step
last_save_ckpt_step           |                              | the last step num of save ckpt
latest_ckpt_file              |                              | the ckpt file
cur_epoch_num                 |                              | number of the current epoch
                              | eval_network                 | the evaluate network
                              | valid_dataset                | the valid dataset
                              | metrics                      | the evaluate metrics
mode                          | mode                         | “train” or “eval”
batch_num                     | batch_num                    | the train/eval batch number
list_callback                 | list_callback                | callback list
network                       | network                      | basic network
cur_step_num                  | cur_step_num                 | the train/eval step number
dataset_sink_mode             | dataset_sink_mode            | the train/eval sink mode
net_outputs                   | net_outputs                  | network output results

Parameters:

original_args (dict) – Holds the related information of the model.

get_stop_requested()[source]

Return whether a stop has been requested or not.

Returns:

bool, if true, model.train() stops iterations.

original_args()[source]

Get the _original_args object.

Returns:

Dict, an object that holds the original arguments of the model.

request_stop()[source]

Set a stop requirement during training or eval.

Callbacks can use this function to request a stop of iterations. model.train() checks whether this has been called or not.
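
A minimal sketch of stopping training from a custom callback via RunContext (the StopAtStep name and threshold are illustrative; the model and dataset are assumed to be built as in the Callback example above):

>>> from mindspore.train import Callback
>>>
>>> class StopAtStep(Callback):
...     """Call request_stop() once a given global step is reached."""
...     def __init__(self, stop_step=10):
...         super(StopAtStep, self).__init__()
...         self.stop_step = stop_step
...     def on_train_step_end(self, run_context):
...         cb_params = run_context.original_args()
...         if cb_params.cur_step_num >= self.stop_step:
...             run_context.request_stop()
>>>
>>> model.train(1, dataset, callbacks=[StopAtStep(stop_step=10)])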

class tinyms.callbacks.LearningRateScheduler(learning_rate_function)[source]

Change the learning_rate during training.

Parameters:

learning_rate_function (Function) – The function that defines how to change the learning rate during training.

Examples

>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore.train import Model, LearningRateScheduler
>>> from mindspore import dataset as ds
...
>>> def learning_rate_function(lr, cur_step_num):
...     if cur_step_num%1000 == 0:
...         lr = lr*0.1
...     return lr
...
>>> lr = 0.1
>>> momentum = 0.9
>>> net = nn.Dense(10, 5)
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=lr, momentum=momentum)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
...
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> model.train(1, dataset, callbacks=[LearningRateScheduler(learning_rate_function)],
...             dataset_sink_mode=False)
step_end(run_context)[source]

Change the learning_rate at the end of a step.

Parameters:

run_context (RunContext) – Includes some information of the model.

class tinyms.callbacks.SummaryLandscape(summary_dir)[source]

SummaryLandscape can help you to collect loss landscape information. It can create a landscape in the PCA direction or a random direction by calculating the loss.

Note

  1. When using SummaryLandscape, you need to run the code in if __name__ == “__main__”.

  2. SummaryLandscape only supports Linux systems.

Parameters:

summary_dir (str) – The path of the summary, used to save the model weights, metadata and other data required to create the landscape.

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore.nn import Loss, Accuracy
>>> from mindspore.train import Model, SummaryCollector, SummaryLandscape
>>>
>>> if __name__ == '__main__':
...     # If the device_target is Ascend, set the device_target to "Ascend"
...     ms.set_context(mode=ms.GRAPH_MODE, device_target="GPU")
...     mnist_dataset_dir = '/path/to/mnist_dataset_directory'
...     # The detail of create_dataset method shown in model_zoo.official.cv.lenet.src.dataset.py
...     ds_train = create_dataset(mnist_dataset_dir, 32)
...     # The detail of LeNet5 shown in model_zoo.official.cv.lenet.src.lenet.py
...     network = LeNet5(10)
...     net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
...     net_opt = nn.Momentum(network.trainable_params(), 0.01, 0.9)
...     model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
...     # Simple usage for collect landscape information:
...     interval_1 = [1, 2, 3, 4, 5]
...     summary_collector = SummaryCollector(summary_dir='./summary/lenet_interval_1',
...                                          collect_specified_data={'collect_landscape':{"landscape_size": 4,
...                                                                                        "unit": "step",
...                                                                          "create_landscape":{"train":True,
...                                                                                             "result":False},
...                                                                          "num_samples": 2048,
...                                                                          "intervals": [interval_1]}
...                                                                    })
...     model.train(1, ds_train, callbacks=[summary_collector], dataset_sink_mode=False)
...
...     # Simple usage for visualization landscape:
...     def callback_fn():
...         network = LeNet5(10)
...         net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
...         metrics = {"Loss": Loss()}
...         model = Model(network, net_loss, metrics=metrics)
...         mnist_dataset_dir = '/path/to/mnist_dataset_directory'
...         ds_eval = create_dataset(mnist_dataset_dir, 32)
...         return model, network, ds_eval, metrics
...
...     summary_landscape = SummaryLandscape('./summary/lenet_interval_1')
...     # parameters of collect_landscape can be modified or unchanged
...     summary_landscape.gen_landscapes_with_multi_process(callback_fn,
...                                                        collect_landscape={"landscape_size": 4,
...                                                                         "create_landscape":{"train":False,
...                                                                                            "result":False},
...                                                                          "num_samples": 2048,
...                                                                          "intervals": [interval_1]},
...                                                         device_ids=[1])
clean_ckpt()[source]

Clean the checkpoint.

gen_landscapes_with_multi_process(callback_fn, collect_landscape=None, device_ids=None, output=None)[source]

Use multiple processes to generate the landscape.

Parameters:
  • callback_fn (python function) –

    A python function object. The user needs to write a function; it has no input, and its return values must meet the following requirements.

    • mindspore.train.Model: User’s model object.

    • mindspore.nn.Cell: User’s network object.

    • mindspore.dataset: User’s dataset object for create loss landscape.

    • mindspore.train.Metrics: User’s metrics object.

  • collect_landscape (Union[dict, None]) –

    The meaning of the parameters when creating loss landscape is consistent with the fields with the same name in SummaryCollector. The purpose of setting here is to allow users to freely modify creating parameters. Default: None.

    • landscape_size (int): Specify the image resolution of the generated loss landscape. For example, if it is set to 128, the resolution of the landscape is 128 * 128. The calculation time increases with the increase of resolution. Default: 40. Optional values: between 3 and 256.

    • create_landscape (dict): Select how to create loss landscape. Training process loss landscape(train) and training result loss landscape(result). Default: {“train”: True, “result”: True}. Optional: True/False.

    • num_samples (int): The size of the dataset used to create the loss landscape. For example, for an image dataset, you can set num_samples to 2048, which means that 2048 images are used to create the loss landscape. Default: 2048.

    • intervals (List[List[int]]): Specifies the intervals over which the loss landscape is created. For example, if the user wants to create the loss landscape of two training processes, covering epochs 1-5 and 6-10 respectively, they can set [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]. Note: each interval must contain at least three epochs.

  • device_ids (List(int)) – Specifies which devices are used to create loss landscape. For example: [0, 1] refers to creating loss landscape with device 0 and device 1. Default: None.

  • output (str) – Specifies the path to save the loss landscape. Default: None. The default save path is the same as the summary file.

class tinyms.callbacks.History[source]

Records the network outputs and metrics information into a History object.

The network outputs information will be the loss value if the train network or eval network is not customized; if the customized network returns a Tensor or numpy.ndarray, the mean value of the network output will be recorded; if the customized network returns a tuple or list, the first element of the network outputs will be recorded.

Note

Normally used in mindspore.train.Model.train or mindspore.train.Model.fit.

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import nn
>>> from mindspore.train import Model, History
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> crit = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> history_cb = History()
>>> model = Model(network=net, optimizer=opt, loss_fn=crit, metrics={"recall"})
>>> model.train(2, train_dataset, callbacks=[history_cb])
>>> print(history_cb.epoch)
>>> print(history_cb.history)
{'epoch': [1, 2]}
{'net_output': [1.607877, 1.6033841]}
begin(run_context)[source]

Initialize the epoch property at the beginning of training.

Parameters:

run_context (RunContext) – Context of mindspore.train.Model.{train | eval}. For more details, please refer to mindspore.train.RunContext.

epoch_end(run_context)[source]

Records the first element of the network outputs and metrics information at the end of an epoch.

Parameters:

run_context (RunContext) – Context of mindspore.train.Model.{train | eval}. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.LambdaCallback(on_train_epoch_begin=None, on_train_epoch_end=None, on_train_step_begin=None, on_train_step_end=None, on_train_begin=None, on_train_end=None, on_eval_epoch_begin=None, on_eval_epoch_end=None, on_eval_step_begin=None, on_eval_step_end=None, on_eval_begin=None, on_eval_end=None)[source]

Callback for creating simple, custom callbacks.

This callback is constructed with anonymous functions that will be called at the appropriate time (during mindspore.train.Model.{train | eval | fit}). Note that each stage of callbacks expects one positional argument: run_context.

Warning

This is an experimental API that is subject to change or deletion.

Parameters:
  • on_train_epoch_begin (Function) – called at each train epoch begin.

  • on_train_epoch_end (Function) – called at each train epoch end.

  • on_train_step_begin (Function) – called at each train step begin.

  • on_train_step_end (Function) – called at each train step end.

  • on_train_begin (Function) – called at the beginning of model train.

  • on_train_end (Function) – called at the end of model train.

  • on_eval_epoch_begin (Function) – called at eval epoch begin.

  • on_eval_epoch_end (Function) – called at eval epoch end.

  • on_eval_step_begin (Function) – called at each eval step begin.

  • on_eval_step_end (Function) – called at each eval step end.

  • on_eval_begin (Function) – called at the beginning of model eval.

  • on_eval_end (Function) – called at the end of model eval.

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import nn
>>> from mindspore.train import Model, LambdaCallback
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> crit = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> lambda_callback = LambdaCallback(on_train_epoch_end=
... lambda run_context: print("loss: ", run_context.original_args().net_outputs))
>>> model = Model(network=net, optimizer=opt, loss_fn=crit, metrics={"recall"})
>>> model.train(2, train_dataset, callbacks=[lambda_callback])
loss: 1.6127687
loss: 1.6106578
class tinyms.callbacks.ReduceLROnPlateau(monitor='eval_loss', factor=0.1, patience=10, verbose=False, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)[source]

Reduce the learning rate when the monitored quantity has stopped improving.

Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This callback monitors the training process, and if no improvement is seen for a patience number of epochs, the learning rate is reduced.
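
A simplified sketch of the plateau rule described above, in ‘min’ mode and ignoring cooldown (illustrative only, not the library implementation):

>>> def plateau_step(monitor_value, best, wait, lr, factor=0.1, patience=10, min_delta=1e-4, min_lr=0.0):
...     """One epoch of the plateau rule: return the updated (best, wait, lr)."""
...     if monitor_value < best - min_delta:  # improvement seen: reset the wait counter
...         return monitor_value, 0, lr
...     wait += 1                             # no improvement this epoch
...     if wait >= patience:                  # patience exhausted: reduce the learning rate
...         lr = max(lr * factor, min_lr)
...         wait = 0
...     return best, wait, lr
>>>
>>> best, wait, lr = float("inf"), 0, 0.1
>>> for loss in [1.0, 0.9, 0.9, 0.9]:
...     best, wait, lr = plateau_step(loss, best, wait, lr, patience=2)
>>> round(lr, 4)
0.01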

Note

Learning rate grouping is not supported now.

Parameters:
  • monitor (str) – quantity to be monitored. If evaluation is performed at the end of train epochs, the valid monitors can be “loss”, “eval_loss” or the metric names passed when instantiating the Model; otherwise the only valid monitor is “loss”. When monitor is “loss”, if the train network has multiple outputs, the first element will be returned as the training loss.

  • factor (float) – factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.

  • patience (int) – a monitor value exceeding the historical best value by more than min_delta is seen as an improvement; patience is the number of epochs with no improvement that will be waited for. When the waiting counter self.wait is larger than or equal to patience, the lr will be reduced. Default: 10.

  • verbose (bool) – If False: quiet, if True: print related information. Default: False.

  • mode (str) – one of {‘auto’, ‘min’, ‘max’}. In “min” mode, the learning rate will be reduced when the quantity monitored has stopped decreasing; in “max” mode it will be reduced when the quantity monitored has stopped increasing; in “auto” mode, the direction is automatically inferred from the name of the monitored quantity. Default: “auto”.

  • min_delta (float) – threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.

  • cooldown (int) – number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.

  • min_lr (float) – lower bound on the learning rate. Default: 0.

Raises:
  • ValueError – mode is not in ‘auto’, ‘min’ or ‘max’.

  • ValueError – The monitor value is not a scalar.

  • ValueError – The learning rate is not a Parameter.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.train import Model, ReduceLROnPlateau
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics={"acc"})
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> cb = ReduceLROnPlateau(monitor="acc", patience=3, verbose=True)
>>> model.fit(10, dataset, callbacks=cb)
on_train_begin(run_context)[source]

Initialize variables at the beginning of training.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_epoch_end(run_context)[source]

Monitors the training process; if no improvement is seen for a patience number of epochs, the learning rate is reduced.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.EarlyStopping(monitor='eval_loss', min_delta=0, patience=0, verbose=False, mode='auto', baseline=None, restore_best_weights=False)[source]

Stop training when a monitored metric has stopped improving.

Assuming monitor is “accuracy”, mode would be “max” since the goal of training is to maximize the accuracy. The model.fit() training loop will check at the end of each epoch whether the accuracy is still increasing, considering min_delta and patience if applicable. Once it is found to be no longer increasing, run_context.request_stop() will be called and the training terminates.
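
A small sketch of the improvement test described above, showing how mode and min_delta decide whether a monitored value counts as an improvement (illustrative, not the library code):

>>> def improved(current, best, mode="max", min_delta=0.0):
...     """Return True if current is an improvement over best under the given mode and min_delta."""
...     if mode == "max":
...         return current > best + min_delta
...     return current < best - min_delta
>>>
>>> improved(0.92, 0.91, mode="max", min_delta=0.005)
True
>>> improved(0.915, 0.91, mode="max", min_delta=0.01)
False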

Parameters:
  • monitor (str) – quantity to be monitored. If evaluation is performed at the end of train epochs, the valid monitors can be “loss”, “eval_loss” or the metric names passed when instantiating the Model; otherwise the only valid monitor is “loss”. When monitor is “loss”, if the train network has multiple outputs, the first element will be returned as the training loss. Default: “eval_loss”.

  • patience (int) – a monitor value exceeding the historical best value by more than min_delta is seen as an improvement; patience is the number of epochs with no improvement that will be waited for. When the waiting counter self.wait is larger than or equal to patience, the training process will be stopped. Default: 0.

  • verbose (bool) – If False: quiet, if True: print related information. Default: False.

  • mode (str) – one of {‘auto’, ‘min’, ‘max’}. In “min” mode, training will be stopped when the quantity monitored has stopped decreasing; in “max” mode it will be stopped when the quantity monitored has stopped increasing; in “auto” mode, the direction is automatically inferred from the name of the monitored quantity. Default: “auto”.

  • min_delta (float) – threshold for measuring the new optimum, to only focus on significant changes. Default: 0.

  • baseline (float) – Baseline value for the monitor. When the monitor value shows improvement over the history best value and the baseline, the internal wait counter will be set to zero. Default: None.

  • restore_best_weights (bool) – Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used. Default: False.

Raises:
  • ValueError – mode is not in ‘auto’, ‘min’ or ‘max’.

  • ValueError – The monitor value is not a scalar.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.train import Model, EarlyStopping
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics={"acc"})
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> cb = EarlyStopping(monitor="acc", patience=3, verbose=True)
>>> model.fit(10, dataset, callbacks=cb)
on_train_begin(run_context)[source]

Initialize variables at the beginning of training.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_end(run_context)[source]

If verbose is True, print the epoch at which training stopped.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_epoch_end(run_context)[source]

Monitors the training process; if no improvement is seen for a patience number of epochs, the training process will be stopped.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.OnRequestExit(save_ckpt=True, save_mindir=True, file_name='Net', directory='./', sig=<Signals.SIGTERM: 15>)[source]

Respond to the user’s closing request, exit the training or eval process, and save the checkpoint and mindir.

Register the OnRequestExit callback before training. When the user wants to exit the training process and save the training data, they can send the registered exit signal sig to the training process. After the training process executes the current step, it saves the current training status, including the checkpoint and mindir, and then exits the training process.

Parameters:
  • save_ckpt (bool) – Whether to save the checkpoint before the training process exits. Default: True.

  • save_mindir (bool) – Whether to save the mindir before the training process exits. Default: True.

  • file_name (str) – The file name of the saved checkpoint and mindir; the checkpoint file adds the suffix ‘.ckpt’ and the mindir file adds the suffix ‘.mindir’. Default: ‘Net’.

  • directory (str) – The directory in which to save the checkpoint and mindir. Default: ‘./’.

  • sig (int) – The user-registered exit signal; it must be a signal that can be caught and ignored. When the process receives the signal, it exits the training or eval process. Default: signal.SIGTERM.

Raises:
  • ValueError – If the ‘save_ckpt’ is not a bool.

  • ValueError – If the ‘save_mindir’ is not a bool.

  • ValueError – If the ‘file_name’ is not a str.

  • ValueError – If the ‘directory’ is not a str.

  • ValueError – If the ‘sig’ is not an int or the ‘sig’ is signal.SIGKILL.

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import dataset as ds
>>> from mindspore import nn
>>>
>>> # Define the forward net
>>> class ForwardNet(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(ForwardNet, self).__init__()
...         self.param = ms.Parameter(1.0)
...         self.relu = ms.ops.ReLU()
...
...     def construct(self, x):
...         return self.relu(x + self.param)
>>> forward_net = ForwardNet()
>>> loss = nn.MAELoss()
>>> opt = nn.Momentum(forward_net.trainable_params(), 0.01, 0.9)
>>> model = ms.Model(forward_net, loss_fn=loss, optimizer=opt)
>>>
>>> # Create dataset
>>> def generator_multi_column():
...     i = 0
...     while i < 1000:
...         i += 1
...         yield np.ones((1, 32, 32)).astype(np.float32) * 0.01, np.array(1).astype(np.int32)
>>> dataset = ds.GeneratorDataset(source=generator_multi_column, column_names=["data", "label"])
>>> dataset = dataset.batch(32, drop_remainder=True)
>>>
>>> on_request_exit = ms.train.OnRequestExit(file_name='LeNet5')
>>> model.train(10, dataset, callbacks=on_request_exit)
>>> # The user sends the signal SIGTERM to the training process;
>>> # the process saves the checkpoint and mindir, and then exits the training process.
on_eval_begin(run_context)[source]

When eval begins, register the handler for the exit signal transferred by the user.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_eval_end(run_context)[source]

When eval ends, if the exit signal was received, the checkpoint and mindir are saved according to the user config.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

on_eval_step_end(run_context)[source]

When an eval step ends, if the exit signal was received, set the ‘run_context’ attribute ‘_stop_requested’ to True. The eval process then exits after this step.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_begin(run_context)[source]

When training begins, register the handler for the exit signal transferred by the user.

Parameters:

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_end(run_context)[source]

When training ends, if the exit signal was received, the checkpoint and mindir are saved according to the user config.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_epoch_end(run_context)[source]

When a training epoch ends, if the exit signal was received, set the ‘run_context’ attribute ‘_stop_requested’ to True. The training process then exits after this epoch.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_step_end(run_context)[source]

When a training step ends, if the exit signal was received, set the ‘run_context’ attribute ‘_stop_requested’ to True. The training process then exits after this step.

Parameters:

run_context (RunContext) – Includes some information of the model. For more details, please refer to mindspore.train.RunContext.

class tinyms.callbacks.BackupAndRestore(backup_dir, save_freq='epoch', delete_checkpoint=True)[source]

Callback to back up and restore the parameters during training.

Note

This callback can only be used in training.

Parameters:
  • backup_dir (str) – Path to store and load the checkpoint file.

  • save_freq (Union['epoch', int]) – When set to ‘epoch’, the callback saves the checkpoint at the end of each epoch. When set to an integer, the callback saves the checkpoint every save_freq epochs. Default: ‘epoch’.

  • delete_checkpoint (bool) – If delete_checkpoint=True, the checkpoint will be deleted after training is finished. Default: True.

Examples

Note

Before running the following example, you need to customize the network LeNet5 and dataset preparation function create_dataset. Refer to Building a Network and Dataset .

>>> from mindspore import nn
>>> from mindspore.train import Model, BackupAndRestore
>>>
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> data_path = './MNIST_Data'
>>> dataset = create_dataset(data_path)
>>> backup_ckpt = BackupAndRestore("backup")
>>> model.train(10, dataset, callbacks=backup_ckpt)
on_train_begin(run_context)[source]

Load the backup checkpoint file at the beginning of training.

Parameters:

run_context (RunContext) – Context of the running process. For more details, please refer to mindspore.train.RunContext.

on_train_end(run_context)[source]

Delete the checkpoint file at the end of training.

Parameters:

run_context (RunContext) – Context of the running process. For more details, please refer to mindspore.train.RunContext.

on_train_epoch_end(run_context)[source]

Back up the checkpoint file at the end of each training epoch.

Parameters:

run_context (RunContext) – Context of the running process. For more details, please refer to mindspore.train.RunContext.