tinyms.layers

Layer module contains pre-defined building blocks or computing units to construct neural networks.

The high-level components (Layers) used to construct the neural network.

class tinyms.layers.Layer(auto_prefix=True, flags=None)[source]

Base class for all neural networks.

A ‘Layer’ could be a single neural network layer, such as conv2d, relu, batch_norm, etc., or a composition of cells that constructs a network.

Note

In general, the autograd algorithm will automatically generate the implementation of the gradient function, but if the back-propagation (bprop) method is implemented, the gradient function will be replaced by the bprop. The bprop implementation will receive a Tensor dout containing the gradient of the loss w.r.t. the output, and a Tensor out containing the forward result. The bprop needs to compute the gradient of the loss w.r.t. the inputs; gradients of the loss w.r.t. Parameter variables are not supported currently. The bprop method must contain the self parameter.

Parameters:

auto_prefix (bool) – Recursively generate namespaces. Default: True.

Examples

>>> from tinyms import layers, primitives as P
>>>
>>> class MyNet(layers.Layer):
...    def __init__(self):
...        super(MyNet, self).__init__()
...        self.relu = P.ReLU()
...
...    def construct(self, x):
...        return self.relu(x)
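
For illustration, a minimal sketch of a layer with a custom bprop as described in the note above (MySquare is a hypothetical layer; bprop receives the input x, the forward result out and the gradient dout, and must return a tuple of gradients w.r.t. the inputs):

>>> class MySquare(layers.Layer):
...    def __init__(self):
...        super(MySquare, self).__init__()
...
...    def construct(self, x):
...        return x * x
...
...    def bprop(self, x, out, dout):
...        # gradient of the loss w.r.t. x is dout * 2x
...        return (2 * x * dout,)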
add_flags(**flags)

Add customized attributes for cell.

This method is also called when the cell class is instantiated and the class parameter ‘flags’ is set to True.

Parameters:

flags (dict) – Network configuration information, currently it is used for the binding of network and dataset. Users can also customize network attributes by this parameter. Default: None.
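
A small usage sketch reusing the MyNet layer defined above (the flag name predict is illustrative); the added attributes can be read back with get_flags:

>>> net = MyNet()
>>> net = net.add_flags(predict=True)
>>> print(net.get_flags())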

add_flags_recursive(**flags)

If a cell contains child cells, this method can recursively customize attributes of all cells.

Parameters:

flags (dict) – Network configuration information, currently it is used for the binding of network and dataset. Users can also customize network attributes by this parameter. Default: None.

apply(fn)

Applies fn recursively to every subcell (as returned by .cells()) as well as self. Typical use includes initializing the parameters of a model.

Parameters:

fn (function) – function to be applied to each subcell.

Returns:

Cell, self.

Examples

>>> import mindspore.nn as nn
>>> from mindspore.common.initializer import initializer, One
>>> net = nn.SequentialCell(nn.Dense(2, 2), nn.Dense(2, 2))
>>> def func(cell):
...     if isinstance(cell, nn.Dense):
...         cell.weight.set_data(initializer(One(), cell.weight.shape, cell.weight.dtype))
>>> net.apply(func)
SequentialCell<
  (0): Dense<input_channels=2, output_channels=2, has_bias=True>
  (1): Dense<input_channels=2, output_channels=2, has_bias=True>
  >
>>> print(net[0].weight.asnumpy())
[[1. 1.]
 [1. 1.]]
auto_cast_inputs(inputs)

Auto cast inputs in mixed precision scenarios.

Parameters:

inputs (tuple) – the inputs of construct.

Returns:

Tuple, the inputs after data type cast.

auto_parallel_compile_and_run()

Whether or not to execute compile and run in ‘AUTO_PARALLEL’ or ‘SEMI_AUTO_PARALLEL’ mode.

Note

This interface is deprecated.

property bprop_debug

Get whether cell custom bprop debug is enabled.

cast_inputs(inputs, dst_type)

Cast inputs to specified type.

Parameters:
  • inputs (tuple[Tensor]) – The cell inputs.

  • dst_type (mindspore.dtype) – The specified data type.

Returns:

tuple[Tensor], the result with destination data type.

cast_param(param)

Cast parameter according to the auto mixed precision level in PyNative mode.

This interface is currently used in the case of auto mixed precision and usually does not need to be called explicitly.

Parameters:

param (Parameter) – Parameters, the type of which should be cast.

Returns:

Parameter, the input parameter with type automatically cast.

cells()

Returns an iterator over immediate cells.

Returns:

Iteration, the immediate cells in the cell.
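
For example, a short sketch of iterating over the immediate child cells of a small network:

>>> import mindspore.nn as nn
>>> net = nn.SequentialCell(nn.Dense(2, 2), nn.ReLU())
>>> for cell in net.cells():
...     print(type(cell).__name__)
Dense
ReLU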

cells_and_names(cells=None, name_prefix='')

Returns an iterator over all cells in the network, including the cell’s name and itself.

Parameters:
  • cells (str) – Cells to iterate over. Default: None.

  • name_prefix (str) – Namespace. Default: ‘’.

Returns:

Iteration, all the child cells and corresponding names in the cell.

Examples

>>> from mindspore import nn
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.conv = nn.Conv2d(3, 64, 3)
...     def construct(self, x):
...         out = self.conv(x)
...         return out
>>> names = []
>>> n = Net()
>>> for m in n.cells_and_names():
...     if m[0]:
...         names.append(m[0])
check_names()

Check the names of cell parameters.

compile(*args, **kwargs)

Compile the Cell into a computation graph; the input must be consistent with the input defined in construct.

Parameters:
  • args (tuple) – Args of the Cell object.

  • kwargs (dict) – Kwargs of the Cell object.
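
A minimal sketch of compiling a cell ahead of execution in graph mode (the network and input shape are illustrative):

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> net = nn.Dense(3, 4)
>>> inputs = Tensor(np.ones([2, 3]).astype(np.float32))
>>> net.compile(inputs)  # builds the graph without running it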

compile_and_run(*args, **kwargs)

Compile and run the Cell; the input must be consistent with the input defined in construct.

Note

It is not recommended to call directly.

Parameters:
  • args (tuple) – Args of the Cell object.

  • kwargs (dict) – Kwargs of the Cell object.

Returns:

Object, the result of executing.

construct(*args, **kwargs)

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

Inputs containing both tuple and non-tuple types at the same time are not supported currently.

Parameters:
  • args (tuple) – Tuple of variable parameters.

  • kwargs (dict) – Dictionary of variable keyword parameters.

Returns:

Tensor, returns the computed result.

exec_checkpoint_graph()

Executes saving checkpoint graph operation.

extend_repr()

Expand the description of Cell.

To print customized extended information, re-implement this method in your own cells.

flatten_weights(fusion_size=0)

Reset data for weight parameters so that they use contiguous memory chunks grouped by data type.

Note

By default, parameters with the same data type will use a single contiguous memory chunk, but for some models with a huge number of parameters, splitting a large memory chunk into several smaller memory chunks has the potential for performance gains. In this case, ‘fusion_size’ can be used to limit the maximum memory chunk size.

Parameters:

fusion_size (int) – Maximum memory chunk size in bytes, 0 for unlimited. Default: 0.
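
For example, a sketch that caps each fused memory chunk at 64 MB (the size is illustrative):

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> net.flatten_weights(fusion_size=64 * 1024 * 1024)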

generate_scope()

Generate the scope for each cell object in the network.

get_flags()

Get the self-defined attributes of the cell, which can be added by the add_flags method.

get_func_graph_proto()

Return graph binary proto.

get_inputs()

Returns the dynamic_inputs of a cell object in one network.

Returns:

inputs (tuple), Inputs of the Cell object.

Warning

This is an experimental API that is subject to change or deletion.

get_mixed_precision_type(self: mindspore._c_expression.Cell_) → mindspore._c_expression.MixedPrecisionType

Get mixed precision type.

get_parameters(expand=True)

Returns an iterator over cell parameters.

Yields parameters of this cell. If expand is true, yield parameters of this cell and all subcells.

Parameters:

expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yield parameters that are direct members of this cell. Default: True.

Returns:

Iteration, all parameters at the cell.

Examples

>>> from mindspore import nn
>>> net = nn.Dense(3, 4)
>>> parameters = []
>>> for item in net.get_parameters():
...     parameters.append(item)
get_scope()

Returns the scope of a cell object in one network.

Returns:

String, scope of the cell.

infer_param_pipeline_stage()

Infer pipeline stages of all parameters in the cell.

Note

  • If a parameter does not belong to any cell which has been set pipeline_stage, the parameter should use add_pipeline_stage to add its pipeline_stage information.

  • If a parameter P has been used by two operators in different stages “stageA” and “stageB”, the parameter P should use P.add_pipeline_stage(stageA) and P.add_pipeline_stage(stageB) to add its stage information before using infer_param_pipeline_stage.

Returns:

The parameters belonging to the current stage in pipeline parallel.

Raises:

RuntimeError – If there is a parameter that does not belong to any stage.

init_parameters_data(auto_parallel_mode=False)

Initialize all parameters and replace the original saved parameters in cell.

Note

trainable_params() and other similar interfaces may return different parameter instances after init_parameters_data; do not save these results.

Parameters:

auto_parallel_mode (bool) – If running in auto_parallel_mode. Default: False.

Returns:

Dict[Parameter, Parameter], returns a dict of original parameter and replaced parameter.

insert_child_to_cell(child_name, child_cell)

Adds a child cell to the current cell with a given name.

Parameters:
  • child_name (str) – Name of the child cell.

  • child_cell (Cell) – The child cell to be inserted.

Raises:
  • KeyError – Child Cell’s name is incorrect or duplicates another child’s name.

  • TypeError – If type of child_name is not str.

  • TypeError – Child Cell’s type is incorrect.
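
A minimal usage sketch (the child name dense is illustrative); the registered child is then visible through name_cells:

>>> import mindspore.nn as nn
>>> net = nn.Cell()
>>> net.insert_child_to_cell('dense', nn.Dense(3, 4))
>>> 'dense' in net.name_cells()
True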

insert_param_to_cell(param_name, param, check_name_contain_dot=True)

Adds a parameter to the current cell.

Inserts a parameter with given name to the cell. The method is currently used in mindspore.nn.Cell.__setattr__.

Parameters:
  • param_name (str) – Name of the parameter.

  • param (Parameter) – Parameter to be inserted to the cell.

  • check_name_contain_dot (bool) – Determines whether the name input is compatible. Default: True.

Raises:
  • KeyError – If the name of the parameter is null or contains a dot.

  • TypeError – If the type of parameter is not Parameter.
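
A minimal usage sketch (the parameter name w is illustrative); the inserted parameter is then visible through parameters_dict:

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Parameter, Tensor
>>> net = nn.Cell()
>>> w = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name='w')
>>> net.insert_param_to_cell('w', w)
>>> 'w' in net.parameters_dict()
True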

load_parameter_slice(params)

Replace parameters with sliced tensors by parallel strategies.

Note

This interface is deprecated.

name_cells()

Returns an iterator over all immediate cells in the network.

Includes the name of the cell and the cell itself.

Returns:

Dict, all the child cells and corresponding names in the cell.

property param_prefix

Param prefix is the prefix of the current cell’s direct child parameters.

property parameter_layout_dict

parameter_layout_dict represents the tensor layout of a parameter, which is inferred by shard strategy and distributed operator information.

parameters_and_names(name_prefix='', expand=True)

Returns an iterator over cell parameters.

Includes the parameter’s name and itself.

Parameters:
  • name_prefix (str) – Namespace. Default: ‘’.

  • expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yield parameters that are direct members of this cell. Default: True.

Returns:

Iteration, all the names and corresponding parameters in the cell.

Examples

>>> from mindspore import nn
>>> n = nn.Dense(3, 4)
>>> names = []
>>> for m in n.parameters_and_names():
...     if m[0]:
...         names.append(m[0])
parameters_broadcast_dict(recurse=True)

Gets the parameters broadcast dictionary of this cell.

Parameters:

recurse (bool) – Whether contains the parameters of subcells. Default: True.

Returns:

OrderedDict, return parameters broadcast dictionary.

parameters_dict(recurse=True)

Gets the parameters dictionary of this cell.

Parameters:

recurse (bool) – Whether contains the parameters of subcells. Default: True.

Returns:

OrderedDict, return parameters dictionary.
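
For example, a short sketch listing the parameter names of a Dense layer:

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> print(list(net.parameters_dict().keys()))
['weight', 'bias']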

place(role, rank_id)

Set the label for all operators in this cell. This label tells the MindSpore compiler on which process this cell should be launched. Each process’s label consists of the input role and rank_id, so by assigning different labels to different cells, which will then be launched on different processes, users can run a distributed training or predicting job.

Note

  • This method is effective only after mindspore.communication.init() is called for dynamic cluster building.

Parameters:
  • role (str) – The role of the process on which this cell will be launched. Only ‘MS_WORKER’ is supported for now.

  • rank_id (int) – The rank id of the process on which this cell will be launched. The rank is unique in processes with the same role.

Examples

>>> from mindspore import context
>>> import mindspore.nn as nn
>>> context.set_context(mode=context.GRAPH_MODE)
>>> fc = nn.Dense(2, 3)
>>> fc.place('MS_WORKER', 0)
recompute(**kwargs)

Set the cell recomputed. All the primitives in the cell, except the outputs, will be set recomputed. If a primitive set as recomputed feeds into some backward nodes for computing gradients, then rather than storing the intermediate activation computed in the forward pass, we recompute it in the backward pass.

Note

  • If the computation involves something like randomization or global variable, the equivalence is not guaranteed currently.

  • If the recompute api of a primitive in this cell is also called, the recompute mode of this primitive is subject to the recompute api of the primitive.

  • The interface can be configured only once. Therefore, when the parent cell is configured, the child cell should not be configured.

  • The outputs of the cell are excluded from recomputation by default, which is based on our configuration experience to reduce memory footprint. If a cell has only one primitive and the primitive is wanted to be set recomputed, use the recompute api of the primitive.

  • If memory remains after applying the recomputation, configure ‘mp_comm_recompute=False’ to improve performance if necessary.

  • If memory is still not enough after applying the recomputation, configure ‘parallel_optimizer_comm_recompute=True’ to save more memory if necessary. Cells in the same fusion group should have the same parallel_optimizer_comm_recompute configuration.

Parameters:
  • mp_comm_recompute (bool) – Specifies whether the model parallel communication operators in the cell are recomputed in auto parallel or semi auto parallel mode. Default: True.

  • parallel_optimizer_comm_recompute (bool) – Specifies whether the communication operator allgathers introduced by optimizer shard are recomputed in auto parallel or semi auto parallel mode. Default: False.
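
A minimal usage sketch (the wrapped block is illustrative):

>>> import mindspore.nn as nn
>>> block = nn.SequentialCell(nn.Dense(10, 10), nn.ReLU())
>>> block.recompute()  # all primitives in the block except its outputs are set recomputed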

register_backward_hook(hook_fn)

Register the backward hook function.

Note

  • The register_backward_hook(hook_fn) does not work in graph mode or functions decorated with ‘jit’.

  • The ‘hook_fn’ must be defined as shown below. cell_id is the information of the registered Cell object, including its name and ID. grad_input is the gradient passed to the Cell. grad_output is the gradient computed and passed to the next Cell or primitive, which may be modified by returning a new output gradient.

  • The ‘hook_fn’ should have the following signature: hook_fn(cell_id, grad_input, grad_output) -> new output gradient or None.

  • The ‘hook_fn’ is executed in the Python environment. To prevent failures when switching to graph mode, it is not recommended to write it in the construct function of the Cell object. In PyNative mode, if the register_backward_hook function is called in the construct function of the Cell object, a hook function will be added each time the Cell object runs.

Parameters:

hook_fn (function) – Python function. Backward hook function.

Returns:

Handle, an instance of mindspore.common.hook_handle.HookHandle corresponding to the hook_fn. The handle can be used to remove the added hook_fn by calling handle.remove().

Raises:

TypeError – If the hook_fn is not a Python function.

Supported Platforms: Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> from mindspore.ops import GradOperation
>>> ms.set_context(mode=ms.PYNATIVE_MODE)
>>> def backward_hook_fn(cell_id, grad_input, grad_output):
...     print("backward input: ", grad_input)
...     print("backward output: ", grad_output)
...
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.relu = nn.ReLU()
...         self.handle = self.relu.register_backward_hook(backward_hook_fn)
...
...     def construct(self, x):
...         x = x + x
...         x = self.relu(x)
...         return x
>>> grad = GradOperation(get_all=True)
>>> net = Net()
>>> output = grad(net)(Tensor(np.ones([1]).astype(np.float32)))
backward input: (Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]),)
backward output: (Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]),)
>>> print(output)
(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]),)
register_forward_hook(hook_fn)

Set the Cell forward hook function.

Note

  • The register_forward_hook(hook_fn) does not work in graph mode or functions decorated with ‘jit’.

  • ‘hook_fn’ must be defined as shown below. cell_id is the information of the registered Cell object, including its name and ID. inputs is the forward input objects passed to the Cell. output is the forward output object of the Cell. The ‘hook_fn’ can modify the forward output object by returning a new forward output object.

  • It should have the following signature: hook_fn(cell_id, inputs, output) -> new output object or None.

  • To prevent failures when switching to graph mode, it is not recommended to write it in the construct function of the Cell object. In PyNative mode, if the register_forward_hook function is called in the construct function of the Cell object, a hook function will be added each time the Cell object runs.

Parameters:

hook_fn (function) – Python function. Forward hook function.

Returns:

Handle, an instance of mindspore.common.hook_handle.HookHandle corresponding to the hook_fn. The handle can be used to remove the added hook_fn by calling handle.remove().

Raises:

TypeError – If the hook_fn is not a Python function.

Supported Platforms: Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> from mindspore.ops import GradOperation
>>> ms.set_context(mode=ms.PYNATIVE_MODE)
>>> def forward_hook_fn(cell_id, inputs, output):
...     print("forward inputs: ", inputs)
...     print("forward output: ", output)
...
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.mul = nn.MatMul()
...         self.handle = self.mul.register_forward_hook(forward_hook_fn)
...
...     def construct(self, x, y):
...         x = x + x
...         x = self.mul(x, y)
...         return x
>>> grad = GradOperation(get_all=True)
>>> net = Net()
>>> output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))
forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1],
                dtype=Float32, value= [ 1.00000000e+00]))
forward output: 2.0
>>> print(output)
(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32,
value= [ 2.00000000e+00]))
register_forward_pre_hook(hook_fn)

Register forward pre hook function for Cell object.

Note

  • The register_forward_pre_hook(hook_fn) does not work in graph mode or functions decorated with ‘jit’.

  • ‘hook_fn’ must be defined as shown below. cell_id is the information of the registered Cell object, including its name and ID. inputs is the forward input objects passed to the Cell. The ‘hook_fn’ can modify the forward input objects by returning new forward input objects.

  • It should have the following signature: hook_fn(cell_id, inputs) -> new input objects or None.

  • To prevent failures when switching to graph mode, it is not recommended to write it in the construct function of the Cell object. In PyNative mode, if the register_forward_pre_hook function is called in the construct function of the Cell object, a hook function will be added each time the Cell object runs.

Parameters:

hook_fn (function) – Python function. Forward pre hook function.

Returns:

Handle, an instance of mindspore.common.hook_handle.HookHandle corresponding to the hook_fn. The handle can be used to remove the added hook_fn by calling handle.remove().

Raises:

TypeError – If the hook_fn is not a Python function.

Supported Platforms: Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> from mindspore.ops import GradOperation
>>> ms.set_context(mode=ms.PYNATIVE_MODE)
>>> def forward_pre_hook_fn(cell_id, inputs):
...     print("forward inputs: ", inputs)
...
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.mul = nn.MatMul()
...         self.handle = self.mul.register_forward_pre_hook(forward_pre_hook_fn)
...
...     def construct(self, x, y):
...         x = x + x
...         x = self.mul(x, y)
...         return x
>>> grad = GradOperation(get_all=True)
>>> net = Net()
>>> output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))
forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1],
                dtype=Float32, value= [ 1.00000000e+00]))
>>> print(output)
(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32,
value= [ 2.00000000e+00]))
remove_redundant_parameters()

Remove the redundant parameters.

This interface usually does not need to be called explicitly.

run_construct(cast_inputs, kwargs)

Run the construct function.

Note

This function will be removed in a future version. It is not recommended to call this function.

Parameters:
  • cast_inputs (tuple) – The input objects of Cell.

  • kwargs (dict) – Provide keyword arguments.

Returns:

output, the output object of Cell.

set_auto_parallel()

Set the cell to auto parallel mode.

Note

This interface is deprecated.

set_boost(boost_type)

To improve the network performance, configure the network to automatically enable acceleration algorithms from the boost algorithm library.

If boost_type is not in the algorithm library, please check the supported algorithms in the algorithm library documentation.

Note

Some acceleration algorithms may affect the accuracy of the network; please choose carefully.

Parameters:

boost_type (str) – The acceleration algorithm to enable.

Returns:

Cell, the cell itself.

Raises:

ValueError – If boost_type is not in the algorithm library.

set_broadcast_flag(mode=True)

Set parameter broadcast mode for this cell.

Parameters:

mode (bool) – Specifies whether the mode is parameter broadcast. Default: True.

set_comm_fusion(fusion_type, recurse=True)

Set comm_fusion for all the parameters in this cell. Please refer to the description of mindspore.Parameter.comm_fusion.

Note

The value of the attribute will be overwritten when the function is called multiple times.

Parameters:
  • fusion_type (int) – The value of comm_fusion.

  • recurse (bool) – Whether sets the trainable parameters of subcells. Default: True.
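
For example, a short sketch marking all parameters of a cell with the same fusion index (the value 2 is illustrative):

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> net = net.set_comm_fusion(2)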

set_data_parallel()

For all primitive ops in this cell (including ops of cells wrapped by this cell), if no parallel strategy is specified, then instead of auto-searching, a data parallel strategy will be generated for those primitive ops.

Note

Only effective when the parallel mode in auto_parallel_context is ParallelMode.AUTO_PARALLEL under graph mode.

Examples

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> net.set_data_parallel()
set_grad(requires_grad=True)

Sets the cell flag for gradient. In pynative mode, this parameter specifies whether the network requires gradients. If true, the backward network needed to compute the gradients will be generated when the forward network is executed.

Parameters:

requires_grad (bool) – Specifies whether the net needs gradients; if true, the cell will construct the backward network in pynative mode. Default: True.

Returns:

Cell, the cell itself.
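
A minimal sketch in PyNative mode (the network and input are illustrative): after set_grad, executing the forward pass also builds the backward graph, which GradOperation can then use:

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> from mindspore.ops import GradOperation
>>> ms.set_context(mode=ms.PYNATIVE_MODE)
>>> net = nn.Dense(2, 2)
>>> net = net.set_grad()
>>> x = Tensor(np.ones([1, 2]).astype(np.float32))
>>> out = net(x)
>>> grads = GradOperation()(net)(x)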

set_inputs(*inputs)

Save the set inputs for the computation graph. The number of inputs should be the same as that of the datasets. When using Model for dynamic shape, please make sure that all networks and loss functions passed to the Model are configured with set_inputs. The inputs can be Tensors of either dynamic or static shape.

Parameters:

inputs (tuple) – Inputs of the Cell object.

Warning

This is an experimental API that is subject to change or deletion.

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn, Tensor, context
>>>
>>> class reluNet(nn.Cell):
...     def __init__(self):
...         super(reluNet, self).__init__()
...         self.relu = nn.ReLU()
...     def construct(self, x):
...         return self.relu(x)
>>>
>>> net = reluNet()
>>> input_dyn = Tensor(shape=[3, None], dtype=ms.float32)
>>> net.set_inputs(input_dyn)
>>> input1 = Tensor(np.random.random([3, 10]), dtype=ms.float32)
>>> output = net(input1)
set_jit_config(jit_config)

Set jit config for cell.

Parameters:

jit_config (JitConfig) – Jit config for compile. For details, please refer to mindspore.JitConfig.

set_mixed_precision_type(self: mindspore._c_expression.Cell_, arg0: mindspore._c_expression.MixedPrecisionType) → None

Set mixed precision type.

set_parallel_input_with_inputs(*inputs)

Slice inputs tensors by parallel strategies.

Note

This interface is deprecated.

set_param_fl(push_to_server=False, pull_from_server=False, requires_aggr=True)

Set the way of parameter and server interaction.

Parameters:
  • push_to_server (bool) – Whether the parameter should be pushed to server. Default: False.

  • pull_from_server (bool) – Whether the parameter should be pulled from server. Default: False.

  • requires_aggr (bool) – Whether the parameter should be aggregated in the server. Default: True.

set_param_ps(recurse=True, init_in_server=False)

Set whether the trainable parameters are updated by parameter server and whether the trainable parameters are initialized on server.

Note

It only works when a running task is in the parameter server mode. It is only supported in graph mode.

Parameters:
  • recurse (bool) – Whether sets the trainable parameters of subcells. Default: True.

  • init_in_server (bool) – Whether trainable parameters updated by parameter server are initialized on server. Default: False.

set_train(mode=True)

Sets the cell to training mode.

The cell itself and all child cells will be set to training mode. Layers that have different constructions for training and predicting, such as BatchNorm, will distinguish between the branches by this attribute. If set to true, the training branch will be executed; otherwise, the other branch will be.

Note

When Model.train() is executed, the framework calls Cell.set_train(True). When Model.eval() is executed, the framework calls Cell.set_train(False).

Parameters:

mode (bool) – Specifies whether the model is training. Default: True.

Returns:

Cell, the cell itself.
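
For example, a short sketch switching a batch-normalization layer to its inference branch:

>>> import mindspore.nn as nn
>>> net = nn.BatchNorm2d(3)
>>> net = net.set_train(False)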

shard(in_strategy, out_strategy=None, parameter_plan=None, device='Ascend', level=0)

Defines the input and output layouts of this cell; the parallel strategies of the remaining ops will be generated by sharding propagation. In PyNative mode, use this method to specify a Cell for distributed execution in graph mode. in_strategy and out_strategy define the input and output layouts respectively. in_strategy/out_strategy should be a tuple, each element of which corresponds to the desired layout of the corresponding input/output, and None represents data parallel, which can refer to the description of mindspore.ops.Primitive.shard. The parallel strategies of the remaining operators are derived from the strategies specified for the inputs and outputs.

Note

Only effective in PYNATIVE_MODE, in ParallelMode.AUTO_PARALLEL with search_mode in auto_parallel_context set as sharding_propagation. If the input contains Parameter, its strategy should be set in in_strategy.

Parameters:
  • in_strategy (tuple) – Define the layout of inputs, each element of the tuple should be a tuple or None. Tuple defines the layout of the corresponding input and None represents a data parallel strategy.

  • out_strategy (Union[None, tuple]) – Define the layout of outputs similar with in_strategy. It is not in use right now. Default: None.

  • parameter_plan (Union[dict, None]) – Define the layout for the specified parameters. Each element in dict defines the layout of the parameter like “param_name: layout”. The key is a parameter name of type ‘str’. The value is a 1-D integer tuple, indicating the corresponding layout. If the parameter name is incorrect or the corresponding parameter has been set, the parameter setting will be ignored. Default: None.

  • device (string) – Select a certain device target. It is not in use right now. Support [“CPU”, “GPU”, “Ascend”]. Default: “Ascend”.

  • level (int) – Option for the parallel strategy infer algorithm, namely the objective function: maximize the computation-over-communication ratio, maximize speed performance, minimize memory usage, etc. It is not in use right now. Support [0, 1, 2]. Default: 0.

Returns:

Cell, the cell itself.

Examples

>>> import mindspore.nn as nn
>>>
>>> class Block(nn.Cell):
...   def __init__(self):
...     super(Block, self).__init__()
...     self.dense1 = nn.Dense(10, 10)
...     self.relu = nn.ReLU()
...     self.dense2 = nn.Dense(10, 10)
...   def construct(self, x):
...     x = self.relu(self.dense2(self.relu(self.dense1(x))))
...     return x
>>>
>>> class example(nn.Cell):
...   def __init__(self):
...     super(example, self).__init__()
...     self.block1 = Block()
...     self.block2 = Block()
...     self.block2.shard(in_strategy=((2, 1),), out_strategy=(None,),
...                       parameter_plan={'self.block2.dense1.weight': (4, 1)})
...   def construct(self, x):
...     x = self.block1(x)
...     x = self.block2(x)
...     return x
to_float(dst_type)

Add cast on all inputs of cell and child cells to run with certain float type.

If dst_type is mindspore.dtype.float16, all the inputs of Cell, including input, Parameter and Tensor, will be cast to float16. Please refer to the usage in source code of mindspore.amp.build_train_network().

Note

Multiple calls will overwrite the previous setting.

Parameters:

dst_type (mindspore.dtype) – Transfer cell to run with dst_type. dst_type can be mstype.float16 or mstype.float32.

Returns:

Cell, the cell itself.

Raises:

ValueError – If dst_type is not mstype.float32 or mstype.float16.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore.nn as nn
>>> from mindspore import dtype as mstype
>>>
>>> net = nn.Conv2d(120, 240, 4, has_bias=False, weight_init='normal')
>>> net.to_float(mstype.float16)
Conv2d<input_channels=120, output_channels=240, kernel_size=(4, 4), stride=(1, 1), pad_mode=same,
padding=0, dilation=(1, 1), group=1, has_bias=False, weight_init=normal, bias_init=zeros, format=NCHW>
trainable_params(recurse=True)

Returns all trainable parameters.

Returns a list of all trainable parameters.

Parameters:

recurse (bool) – Whether contains the trainable parameters of subcells. Default: True.

Returns:

List, the list of trainable parameters.
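
For example, a short sketch printing the trainable parameter names of a Dense layer:

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> for p in net.trainable_params():
...     print(p.name)
weight
bias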

untrainable_params(recurse=True)

Returns all untrainable parameters.

Returns a list of all untrainable parameters.

Parameters:

recurse (bool) – Whether contains the untrainable parameters of subcells. Default: True.

Returns:

List, the list of untrainable parameters.

update_cell_prefix()

Update the param_prefix of all child cells.

After being invoked, the name prefix of each child cell can be obtained through ‘_param_prefix’.

update_cell_type(cell_type)

The current cell type is updated when a quantization aware training network is encountered.

After being invoked, it can set the cell type to ‘cell_type’.

Parameters:

cell_type (str) – The type of cell to be updated, cell_type can be “quant” or “second-order”.

update_parameters_name(prefix='', recurse=True)

Adds the prefix string to the names of parameters.

Parameters:
  • prefix (str) – The prefix string. Default: ‘’.

  • recurse (bool) – Whether contains the parameters of subcells. Default: True.
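
For example, a short sketch adding a prefix to every parameter name (the prefix backbone. is illustrative):

>>> import mindspore.nn as nn
>>> net = nn.Dense(3, 4)
>>> net.update_parameters_name(prefix='backbone.')
>>> print([p.name for p in net.get_parameters()])
['backbone.weight', 'backbone.bias']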


class tinyms.layers.SequentialLayer(*args)[source]

Sequential layer container.

A list of Layers will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of cells can also be passed in.

Parameters:

args (Union[list, OrderedDict]) – List of subclass of Layer.

Raises:

TypeError – If the type of the argument is not list or OrderedDict.

Inputs:
  • input (Tensor) - Tensor with shape according to the first Cell in the sequence.

Outputs:

Tensor, the output Tensor with shape depending on the input and defined sequence of Layers.

Examples

>>> import tinyms as ts
>>> from tinyms.layers import SequentialLayer, Conv2d, ReLU
>>>
>>> seq_layer = SequentialLayer([Conv2d(3, 2, 3, pad_mode='valid', weight_init="ones"), ReLU()])
>>> x = ts.ones([1, 3, 4, 4])
>>> print(seq_layer(x))
[[[[27. 27.]
   [27. 27.]]
  [[27. 27.]
   [27. 27.]]]]
class tinyms.layers.LayerList(*args, **kwargs)[source]

Holds Layers in a list.

LayerList can be used like a regular Python list, supporting ‘__getitem__’, ‘__setitem__’, ‘__delitem__’, ‘__len__’, ‘__iter__’ and ‘__iadd__’, but the layers it contains are properly registered and will be visible to all Layer methods.

Parameters:

args (list, optional) – List of subclass of Layer.

Examples

>>> from tinyms.layers import LayerList, Conv2d, BatchNorm2d, ReLU
>>>
>>> conv = Conv2d(100, 20, 3)
>>> layers = LayerList([BatchNorm2d(20)])
>>> layers.insert(0, conv)
>>> layers.append(ReLU())
>>> layers
LayerList<
  (0): Conv2d<input_channels=100, ..., bias_init=None>
  (1): BatchNorm2d<num_features=20, ..., moving_variance=Parameter (name=variance)>
  (2): ReLU<>
  >
class tinyms.layers.TimeDistributed(layer, time_axis, reshape_with_axis=None)[source]

The time distributed layer.

Time distributed is a wrapper which allows applying a layer to every temporal slice of an input, where x should be at least 3D. There are two cases in the implementation: when reshape_with_axis is provided, the reshape method will be chosen, which is more efficient; otherwise, the method of dividing the inputs along the time axis will be used, which is more general. For example, reshape_with_axis could not be provided when dealing with Batch Normalization.

Parameters:
  • layer (Union[Cell, Primitive]) – The Cell or Primitive which will be wrapped.

  • time_axis (int) – The axis of time_step.

  • reshape_with_axis (int) – The axis which will be reshaped with time_axis. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, T, *)\), where \(*\) means any number of additional dimensions.

Outputs:

Tensor of shape \((N, T, *)\)

Raises:

TypeError – If layer is not a Cell or Primitive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, Tensor
>>> x = Tensor(np.random.random([32, 10, 3]), mindspore.float32)
>>> dense = nn.Dense(3, 6)
>>> net = nn.TimeDistributed(dense, time_axis=1, reshape_with_axis=0)
>>> output = net(x)
>>> print(output.shape)
(32, 10, 6)
class tinyms.layers.ForwardValueAndGrad(network, weights=None, get_all=False, get_by_list=False, sens_param=False)[source]

Encapsulate training network.

Including the network and a gradient function. The resulting Cell is trained with input ‘*inputs’. The backward graph will be created in the gradient function to calculate the gradient.

Parameters:
  • network (Cell) – The training network.

  • weights (ParameterTuple) – The parameters of the training network that need to calculate the gradient. Default: None.

  • get_all (bool) – If True, get all the gradients with respect to inputs. Default: False.

  • get_by_list (bool) – If True, get all the gradients with respect to Parameter variables. If get_all and get_by_list are both False, get the gradient with respect to first input. If get_all and get_by_list are both True, get the gradients with respect to inputs and Parameter variables at the same time in the form of ((gradients with respect to inputs), (gradients with respect to parameters)). Default: False.

  • sens_param (bool) – Whether to append sensitivity (gradient with respect to output) as input. If sens_param is False, a ‘ones_like(outputs)’ sensitivity will be attached automatically. If sens_param is True, a sensitivity (gradient with respect to output) needs to be transferred through the input parameter. Default: False.

Inputs:
  • *inputs (Tuple(Tensor…)) - Tuple of inputs with shape \((N, \ldots)\).

  • sens - A sensitivity (gradient with respect to output) as the input of backpropagation. If the network has a single output, the sens is a tensor. If the network has multiple outputs, the sens is a tuple(tensor).

Outputs:
  • forward value - The result of network forward running.

  • gradients (tuple(tensor)) - The gradients of network parameters and inputs.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn, common, ops, ParameterTuple, Parameter
>>>
>>> class Net(nn.Cell):
...    def __init__(self):
...        super(Net, self).__init__()
...        self.weight = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="weight")
...        self.matmul = ops.MatMul()
...
...    def construct(self, x):
...        out = self.matmul(x, self.weight)
...        return out
...
>>> net = Net()
>>> criterion = nn.SoftmaxCrossEntropyWithLogits()
>>> net_with_criterion = nn.WithLossCell(net, criterion)
>>> weight = ParameterTuple(net.trainable_params())
>>> train_network = nn.ForwardValueAndGrad(net_with_criterion, weights=weight, get_all=True, get_by_list=True)
>>> inputs = Tensor(np.ones([1, 2]).astype(np.float32))
>>> labels = Tensor(np.ones([1, 2]).astype(np.float32))
>>> result = train_network(inputs, labels)
>>> print(result)
 (Tensor(shape=[1], dtype=Float32, value= [ 1.38629436e+00]), ((Tensor(shape=[1, 2], dtype=Float32, value=
[[ -1.00000000e+00,  -1.00000000e+00]]), Tensor(shape=[1, 2], dtype=Float32, value=
[[ 0.00000000e+00,  0.00000000e+00]])), (Tensor(shape=[2, 2], dtype=Float32, value=
[[ -5.00000000e-01,  -5.00000000e-01],
 [ -5.00000000e-01,  -5.00000000e-01]]),)))
class tinyms.layers.TrainOneStepCell(network, optimizer, sens=1.0)[source]

Network training package class.

Wraps the network with the optimizer. The resulting Cell is trained with input ‘*inputs’. The backward graph will be created in the construct function to update the parameter. Different parallel modes are available for training.

Parameters:
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Union[Cell]) – Optimizer for updating the network parameters.

  • sens (numbers.Number) – The scaling number to be filled as the input of backpropagation. Default value is 1.0.

Inputs:
  • *inputs (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tensor, a tensor means the loss value, the shape of which is usually \(()\).

Raises:

TypeError – If sens is not a numbers.Number.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> #1) Using the WithLossCell provided by MindSpore
>>> loss_net = nn.WithLossCell(net, loss_fn)
>>> train_net = nn.TrainOneStepCell(loss_net, optim)
>>>
>>> #2) Using user-defined WithLossCell
>>> class MyWithLossCell(nn.Cell):
...    def __init__(self, backbone, loss_fn):
...        super(MyWithLossCell, self).__init__(auto_prefix=False)
...        self._backbone = backbone
...        self._loss_fn = loss_fn
...
...    def construct(self, x, y, label):
...        out = self._backbone(x, y)
...        return self._loss_fn(out, label)
...
...    @property
...    def backbone_network(self):
...        return self._backbone
...
>>> loss_net = MyWithLossCell(net, loss_fn)
>>> train_net = nn.TrainOneStepCell(loss_net, optim)
class tinyms.layers.WithLossCell(backbone, loss_fn)[source]

Cell with loss function.

Wraps the network with loss function. This Cell accepts data and label as inputs and the computed loss will be returned.

Parameters:
  • backbone (Cell) – The backbone network to wrap.

  • loss_fn (Cell) – The loss function used to compute loss.

Inputs:
  • data (Tensor) - Tensor of shape \((N, \ldots)\).

  • label (Tensor) - Tensor of shape \((N, \ldots)\).

Outputs:

Tensor, a tensor means the loss value, the shape of which is usually \(()\).

Raises:

TypeError – If dtype of data or label is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=False)
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>>
>>> batch_size = 2
>>> data = Tensor(np.ones([batch_size, 1, 32, 32]).astype(np.float32) * 0.01)
>>> label = Tensor(np.ones([batch_size, 10]).astype(np.float32))
>>>
>>> output_data = net_with_criterion(data, label)
property backbone_network

Get the backbone network.

Returns:

Cell, the backbone network.

class tinyms.layers.WithGradCell(network, loss_fn=None, sens=None)[source]

Cell that returns the gradients.

Wraps the network with a backward cell to compute gradients. A network with a loss function is necessary as an argument. If the loss function is None, the network must be a wrapper of a network and a loss function. This Cell accepts ‘*inputs’ as inputs and returns gradients for each trainable parameter.

Note

Run in PyNative mode.

Parameters:
  • network (Cell) – The target network to wrap. The network only supports single output.

  • loss_fn (Cell) – Primitive loss function used to compute gradients. Default: None.

  • sens (Union[None, Tensor, Scalar, Tuple ...]) – The sensitivity for backpropagation; the type and shape must be the same as the network output. If None, a tensor of ones with the same type and shape as the output value will be used. Default: None.

Inputs:
  • *inputs (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

list, a list of Tensors with identical shapes as trainable weights.

Raises:

TypeError – If sens is not one of None, Tensor, Scalar or Tuple.

Supported Platforms:

Ascend GPU CPU

Examples

>>> # For a defined network Net without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> grad_net = nn.WithGradCell(net, loss_fn)
>>>
>>> # For a network wrapped with loss function
>>> net = Net()
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>> grad_net = nn.WithGradCell(net_with_criterion)
class tinyms.layers.MicroBatchInterleaved(network, interleave_num=2)[source]

This function splits the input at the 0th dimension into interleave_num pieces and then performs the computation of the wrapped cell. Application scenario: when there is model parallelism in semi-automatic mode in the network, while the first slice of data is computing forward, the second slice of data can execute the communication operators at the same time, achieving performance acceleration through the concurrency of communication and computation.

Note

The output of the input network must be a single tensor.

Parameters:
  • network (Cell) – The target network to wrap.

  • interleave_num (int, optional) – split num of batch size. Default: 2.

Inputs:

tuple[Tensor], the same as the inputs of the wrapped network.

Outputs:

Tensor, the output of the wrapped network.

Supported Platforms:

Ascend GPU

Examples

>>> net = Net()
>>> net = MicroBatchInterleaved(net, 2)
class tinyms.layers.PipelineCell(network, micro_size)[source]

Wrap the network with Micro Batch.

Note

micro_size must be greater than or equal to the number of pipeline stages.

Parameters:
  • network (Cell) – The target network to wrap.

  • micro_size (int) – MicroBatch size.

Supported Platforms:

Ascend GPU

Examples

>>> net = Net()
>>> net = PipelineCell(net, 4)
class tinyms.layers.WithEvalCell(network, loss_fn, add_cast_fp32=False)[source]

Wraps the forward network with the loss function.

It returns loss, forward output and label to calculate the metrics.

Parameters:
  • network (Cell) – The forward network.

  • loss_fn (Cell) – The loss function.

  • add_cast_fp32 (bool) – Whether to adjust the data type to float32. Default: False.

Inputs:
  • data (Tensor) - Tensor of shape \((N, \ldots)\).

  • label (Tensor) - Tensor of shape \((N, \ldots)\).

Outputs:

Tuple(Tensor), containing a scalar loss Tensor, a network output Tensor of shape \((N, \ldots)\) and a label Tensor of shape \((N, \ldots)\).

Raises:

TypeError – If add_cast_fp32 is not a bool.

Supported Platforms:

Ascend GPU CPU

Examples

>>> # Forward network without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> eval_net = nn.WithEvalCell(net, loss_fn)
class tinyms.layers.GetNextSingleOp(dataset_types, dataset_shapes, queue_name)[source]

Cell to run for getting the next data from a dataset.

For detailed information, refer to mindspore.ops.GetNext.

Parameters:
  • dataset_types (list[mindspore.dtype]) – The types of dataset.

  • dataset_shapes (list[tuple[int]]) – The shapes of dataset.

  • queue_name (str) – Queue name to fetch the data.

Outputs:

tuple[Tensor], the data fetched from the Dataset.

Supported Platforms:

Ascend GPU

Examples

>>> import mindspore
>>> from mindspore import ops, nn
>>> from mindspore import dataset as ds
>>> from mindspore.common import dtype as mstype
>>>
>>> data_path =  "/path/to/MNIST_Data/train/"
>>> train_dataset = ds.MnistDataset(data_path, num_samples=10)
>>> dataset_helper = mindspore.DatasetHelper(train_dataset, dataset_sink_mode=True)
>>> dataset = dataset_helper.iter.dataset
>>> dataset_types, dataset_shapes = dataset_helper.types_shapes()
>>> queue_name = dataset.__transfer_dataset__.queue_name
>>> get_next_single_op_net = nn.GetNextSingleOp(dataset_types, dataset_shapes, queue_name)
>>> data, label = get_next_single_op_net()
>>> relu = ops.ReLU()
>>> result = relu(data.astype(mstype.float32))
>>> print(result.shape)
(28, 28, 1)
class tinyms.layers.TrainOneStepWithLossScaleCell(network, optimizer, scale_sense)[source]

Network training with loss scaling.

This is a training step with loss scaling. It takes a network, an optimizer and a scale update Cell (or a Tensor) as args. The loss scale value can be updated on either the host side or the device side. If you want to update it on the host side, use a value of Tensor type as scale_sense; otherwise, use a Cell instance that updates the loss scale as scale_sense.

Parameters:
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Cell) – Optimizer for updating the network parameters.

  • scale_sense (Union[Tensor, Cell]) – If this value is a Cell, it will be called by TrainOneStepWithLossScaleCell to update loss scale. If this value is a Tensor, the loss scale can be modified by set_sense_scale, the shape should be \(()\) or \((1,)\).

Inputs:
  • *inputs (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tuple of 3 Tensor, the loss, overflow flag and current loss scale value.

  • loss (Tensor) - A scalar, the loss value.

  • overflow (Tensor) - A scalar, whether overflow occur or not, the type is bool.

  • loss scale (Tensor) - The loss scale value, the shape is \(()\) or \((1,)\).

Raises:
  • TypeError – If scale_sense is neither Cell nor Tensor.

  • ValueError – If shape of scale_sense is neither \((1,)\) nor \(()\).

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, Parameter, nn, ops
>>> from mindspore import dtype as mstype
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
...
>>> size, in_features, out_features = 16, 16, 10
>>> #1) when the type of scale_sense is Cell:
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = nn.WithLossCell(net, loss)
>>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=manager)
>>> input = Tensor(np.ones([out_features, in_features]), mindspore.float32)
>>> labels = Tensor(np.ones([out_features,]), mindspore.float32)
>>> output = train_network(input, labels)
>>>
>>> #2) when the type of scale_sense is Tensor:
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = nn.WithLossCell(net, loss)
>>> inputs = Tensor(np.ones([size, in_features]).astype(np.float32))
>>> label = Tensor(np.zeros([size, out_features]).astype(np.float32))
>>> scaling_sens = Tensor([1024], dtype=mstype.float32)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=scaling_sens)
>>> output = train_network(inputs, label)
>>>
>>> # update scaling sens and train the network
>>> scaling_sens = Tensor([1], dtype=mstype.float32)
>>> train_network.set_sense_scale(scaling_sens)
>>> output = train_network(inputs, label)
get_overflow_status(status, compute_output)[source]

Get floating-point overflow status.

Get overflow results after executing the target process for overflow detection. A user-defined training network based on this class can also call this interface to process the overflow.

Parameters:
  • status (object) – To control the execution sequence with start_overflow_check, it should be set as the first output of start_overflow_check.

  • compute_output – Overflow detection should be performed in a certain computation process. Set compute_output as the output of the computation process.

Returns:

bool, whether the overflow occurs or not.

process_loss_scale(overflow)[source]

Calculate loss scale according to the overflow.

A user-defined training network based on this class can also call this interface to process the overflow.

Parameters:

overflow (bool) – Whether the overflow occurs or not.

Returns:

bool, the input overflow value.

set_sense_scale(sens)[source]

If the user has set scale_sense of Tensor type, this function can be called to reassign its value.

Parameters:

sens (Tensor) – The new sense whose shape and type are the same as the original scale_sense.

start_overflow_check(pre_cond, compute_input)[source]

Start floating-point overflow detection. Create and clear the overflow detection state.

Specify the arguments ‘pre_cond’ and ‘compute_input’ to make sure the overflow status is cleared at the right time. Taking this situation as an example: we need to execute state clearing after the loss calculation, and then detect overflow in the process of gradient calculation. In this case, pre_cond should be the output of the loss function, and compute_input should be the input of the gradients-computing function. A user-defined training network based on this class can also call this interface to process the overflow.

Parameters:
  • pre_cond (Tensor) – A precondition for starting overflow detection. It determines the executing order of the overflow state clearing and the prior processes. It makes sure that the function ‘start_overflow_check’ clears the status after finishing the process of the precondition.

  • compute_input (object) – The input of subsequent process. Overflow detection should be performed on a certain computation. Set compute_input as the input of the computation, to ensure overflow status is cleared before executing the computation.

Returns:

Tuple[object, object], the first output is used to control the execution sequence: to ensure that start_overflow_check is executed before get_overflow_status after compilation optimization is performed, this value should be used as the first input of get_overflow_status. The second output is the same as the input of compute_input, also used to control the execution sequence, and it makes sure that the overflow flag is cleaned up when the function returns.
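
A condensed sketch of how start_overflow_check, get_overflow_status and process_loss_scale cooperate inside the construct of a subclass of TrainOneStepWithLossScaleCell (attribute names such as self.grad and self.weights follow the surrounding examples; gradient scaling and the gradient reducer are omitted for brevity, and ops refers to mindspore.ops):

>>> def construct(self, *inputs):
...     loss = self.network(*inputs)
...     scaling_sens = self.scale_sense
...     # clear the overflow state once the loss has been computed
...     status, scaling_sens = self.start_overflow_check(loss, scaling_sens)
...     grads = self.grad(self.network, self.weights)(*inputs)
...     # detect overflow produced while computing the gradients
...     cond = self.get_overflow_status(status, grads)
...     overflow = self.process_loss_scale(cond)
...     # update the parameters only when no overflow occurred
...     if not overflow:
...         loss = ops.depend(loss, self.optimizer(grads))
...     return loss, cond, scaling_sens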

class tinyms.layers.DistributedGradReducer(parameters, mean=None, degree=None, fusion_type=1, group='hccl_world_group')[source]

A distributed optimizer.

Aggregate the gradients for all cards by using AllReduce in data parallel.

Parameters:
  • parameters (list) – the parameters to be updated.

  • mean (bool) – When mean is true, the mean coefficient (degree) would apply on gradients. When it is not specified, using the configuration gradients_mean in auto_parallel_context. Default: None.

  • degree (int) – The mean coefficient. Usually it equals the device number. Default: None.

  • fusion_type (int) – The type of all reduce fusion. Default: 1.

  • group (str) – The communication group to work on. Normally, the group should be created by create_group, otherwise, using the default group. Default: GlobalComm.WORLD_COMM_GROUP.

Raises:

ValueError – If degree is not an int or is less than 0.

Supported Platforms:

Ascend GPU

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the Ascend tutorial for more details.

For the GPU devices, users need to prepare the host file and mpi, please see the GPU tutorial .

This example should be run with multiple devices.

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore.communication import init
>>> from mindspore import ops
>>> from mindspore import Parameter, Tensor
>>> from mindspore import nn
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> init()
>>> ms.reset_auto_parallel_context()
>>> ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.DATA_PARALLEL)
>>>
>>> class TrainingWrapper(nn.Cell):
...     def __init__(self, network, optimizer, sens=1.0):
...         super(TrainingWrapper, self).__init__(auto_prefix=False)
...         self.network = network
...         self.network.add_flags(defer_inline=True)
...         self.weights = optimizer.parameters
...         self.optimizer = optimizer
...         self.grad = ops.GradOperation(get_by_list=True, sens_param=True)
...         self.sens = sens
...         self.reducer_flag = False
...         self.grad_reducer = None
...         self.parallel_mode = ms.get_auto_parallel_context("parallel_mode")
...         self.depend = ops.Depend()
...         if self.parallel_mode in [ms.ParallelMode.DATA_PARALLEL, ms.ParallelMode.HYBRID_PARALLEL]:
...             self.reducer_flag = True
...         if self.reducer_flag:
...             mean = ms.get_auto_parallel_context("gradients_mean")
...             degree = ms.get_auto_parallel_context("device_num")
...             self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
...
...     def construct(self, *args):
...         weights = self.weights
...         loss = self.network(*args)
...         sens = ops.Fill()(ops.DType()(loss), ops.Shape()(loss), self.sens)
...         grads = self.grad(self.network, weights)(*args, sens)
...         if self.reducer_flag:
...             # apply grad reducer on grads
...             grads = self.grad_reducer(grads)
...         return self.depend(loss, self.optimizer(grads))
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
>>>
>>> size, in_features, out_features = 16, 16, 10
>>> network = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> net_with_loss = nn.WithLossCell(network, loss)
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> train_cell = TrainingWrapper(net_with_loss, optimizer)
>>> inputs = Tensor(np.ones([size, in_features]).astype(np.float32))
>>> label = Tensor(np.zeros([size, out_features]).astype(np.float32))
>>> grads = train_cell(inputs, label)
>>> print(grads)
256.0
construct(grads)[source]

Under certain circumstances, the data precision of grads could be mixed with float16 and float32. Thus, the result of AllReduce is unreliable. To solve the problem, grads must be cast to float32 before AllReduce, and cast back after the operation.

Parameters:

grads (Union[Tensor, tuple[Tensor]]) – The gradient tensor or tuple before operation.

Returns:

new_grads (Union[Tensor, tuple[Tensor]]), the gradient tensor or tuple after operation.

class tinyms.layers.ParameterUpdate(param)[source]

Cell that updates parameter.

With this Cell, one can manually update param with the input Tensor.

Parameters:

param (Parameter) – The parameter to be updated manually.

Inputs:
  • x (Tensor) - A tensor whose shape and type are the same as param.

Outputs:

Tensor, the updated value.

Raises:

KeyError – If parameter with the specified name does not exist.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, Tensor
>>> network = nn.Dense(3, 4)
>>> param = network.parameters_dict()['weight']
>>> update = nn.ParameterUpdate(param)
>>> update.phase = "update_param"
>>> weight = Tensor(np.arange(12).reshape((4, 3)), mindspore.float32)
>>> output = update(weight)
>>> print(output)
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]
 [ 9. 10. 11.]]
class tinyms.layers.DynamicLossScaleUpdateCell(loss_scale_value, scale_factor, scale_window)[source]

Dynamic Loss scale update cell.

For loss scaling training, the initial loss scaling value is set to loss_scale_value. In each training step, the loss scaling value is reduced to loss_scale / scale_factor when an overflow occurs, and increased to loss_scale * scale_factor when no overflow has occurred for scale_window consecutive steps.

get_update_cell method of mindspore.amp.DynamicLossScaleManager will return this class. It will be called by mindspore.nn.TrainOneStepWithLossScaleCell during training to update loss scale.
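The update rule can be summarized by the following plain-Python sketch (illustrative only; the hypothetical update_scale helper mimics what the cell does with Tensors and an internal step counter, and the floor of 1 is an assumption of this sketch):

>>> def update_scale(scale, overflow, good_steps, scale_factor=2, scale_window=1000):
...     if overflow:
...         # shrink the scale and restart the counter (floor of 1 assumed here)
...         return max(scale / scale_factor, 1), 0
...     good_steps += 1
...     if good_steps >= scale_window:
...         # scale_window consecutive clean steps: grow the scale
...         return scale * scale_factor, 0
...     return scale, good_steps
>>> update_scale(2 ** 12, overflow=True, good_steps=10)
(2048.0, 0)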

Parameters:
  • loss_scale_value (float) – Initializes loss scale.

  • scale_factor (int) – Coefficient of increase and decrease.

  • scale_window (int) – Maximum continuous training steps that do not have overflow to increase the loss scale.

Inputs:
  • loss_scale (Tensor) - The loss scale value during training with shape \(()\).

  • overflow (bool) - Whether the overflow occurs or not.

Outputs:

bool, the input overflow.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, Parameter, nn
>>> import mindspore.ops as ops
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
...
>>> in_features, out_features = 16, 10
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = nn.WithLossCell(net, loss)
>>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=manager)
>>> input = Tensor(np.ones([out_features, in_features]), mindspore.float32)
>>> labels = Tensor(np.ones([out_features,]), mindspore.float32)
>>> output = train_network(input, labels)
get_loss_scale()[source]

Get Loss Scale value.

Returns:

float, the loss scale value.

class tinyms.layers.FixedLossScaleUpdateCell(loss_scale_value)[source]

Update cell with fixed loss scaling value.

get_update_cell method of mindspore.amp.FixedLossScaleManager will return this class. It will be called by mindspore.nn.TrainOneStepWithLossScaleCell during training.

Parameters:

loss_scale_value (float) – Initializes loss scale.

Inputs:
  • loss_scale (Tensor) - The loss scale value during training with shape \(()\), it is ignored in this class.

  • overflow (bool) - Whether the overflow occurs or not.

Outputs:

bool, the input overflow.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, Parameter, nn, ops
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
...
>>> in_features, out_features = 16, 10
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = nn.WithLossCell(net, loss)
>>> manager = nn.FixedLossScaleUpdateCell(loss_scale_value=2**12)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=manager)
>>> input = Tensor(np.ones([out_features, in_features]), mindspore.float32)
>>> labels = Tensor(np.ones([out_features,]), mindspore.float32)
>>> output = train_network(input, labels)
get_loss_scale()[source]

Get Loss Scale value.

Returns:

float, the loss scale value.

class tinyms.layers.VirtualDatasetCellTriple(backbone)[source]

Wrap the network with virtual dataset to convert data parallel layout to model parallel layout.

VirtualDatasetCellTriple is a virtual Primitive; it does not exist in the final executing graph. Inputs and outputs of VirtualDatasetCellTriple are distributed in the data parallel pattern, and tensor redistribution Primitives are inserted dynamically during the graph compile process.

Note

Only used in semi auto parallel and auto parallel mode. It takes three inputs, in contrast to the two inputs of _VirtualDatasetCell.

Parameters:

backbone (Cell) – The target network to wrap.

Examples

>>> net = Net()
>>> net = VirtualDatasetCellTriple(net)
class tinyms.layers.Softmin(axis=-1)[source]

Softmin activation function, a generalization of the two-category function mindspore.nn.Sigmoid to multi-classification; its purpose is to present multi-classification results in the form of probabilities.

It applies the exponential function to the negated elements of the input Tensor along the given axis, then normalizes the results so that they lie in the range [0, 1] and sum to 1.

Softmin is defined as:

\[\text{softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_{j=0}^{n-1}\exp(-x_j)},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Parameters:

axis (Union[int, tuple[int]]) – The axis to apply Softmin operation, if the dimension of input x is x.ndim, the range of axis is [-x.ndim, x.ndim). -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - Tensor for computing Softmin functions with data type of float16 or float32.

Outputs:

Tensor, which has the same type and shape as x with values in the range [0,1].

Raises:
  • TypeError – If axis is neither an int nor a tuple.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If axis is a tuple whose length is less than 1.

  • ValueError – If axis is a tuple whose elements are not all in the range [-x.ndim, x.ndim).

Supported Platforms:

Ascend GPU CPU

Examples

>>> # axis = -1(default), and the sum of return value is 1.0.
>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> softmin = nn.Softmin()
>>> output = softmin(x)
>>> print(output)
[0.2341  0.636  0.0862  0.01165  0.03168 ]
>>> assert(1.0 == output.sum())
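Softmin is equivalent to Softmax applied to the negated input. This relation can be checked directly with numpy (a sketch of the definition above, not the operator itself):

>>> import numpy as np
>>> a = np.array([-1., -2., 0., 2., 1.])
>>> out = np.exp(-a) / np.exp(-a).sum()   # softmin(a) == softmax(-a)
>>> print(out.round(4))
[0.2341 0.6364 0.0861 0.0117 0.0317]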
class tinyms.layers.Softmax(axis=-1)[source]

Softmax activation function, a generalization of the two-category function mindspore.nn.Sigmoid to multi-classification; its purpose is to present multi-classification results in the form of probabilities.

It applies the exponential function to the elements of the input Tensor along the given axis, then normalizes the results so that they lie in the range [0, 1] and sum to 1.

Softmax is defined as:

\[\text{softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.
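Since the exponentials can overflow for large inputs, practical implementations subtract the per-axis maximum before exponentiating, which leaves the result unchanged. A minimal numpy sketch of this standard trick (the stable_softmax helper is illustrative):

>>> import numpy as np
>>> def stable_softmax(v, axis=-1):
...     shifted = v - v.max(axis=axis, keepdims=True)  # result is unchanged
...     e = np.exp(shifted)
...     return e / e.sum(axis=axis, keepdims=True)
>>> print(stable_softmax(np.array([1000., 1001.])).round(4))
[0.2689 0.7311]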

Parameters:

axis (Union[int, tuple[int]]) – The axis to apply Softmax operation, if the dimension of input x is x.ndim, the range of axis is [-x.ndim, x.ndim), -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of Softmax with data type of float16 or float32.

Outputs:

Tensor, which has the same type and shape as x, with values in the range [0, 1].

Raises:
  • TypeError – If axis is neither an int nor a tuple.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If axis is a tuple whose length is less than 1.

  • ValueError – If axis is a tuple whose elements are not all in the range [-x.ndim, x.ndim).

Supported Platforms:

Ascend GPU CPU

Examples

>>> # axis = -1(default), and the sum of return value is 1.0.
>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> softmax = nn.Softmax()
>>> output = softmax(x)
>>> print(output)
[0.03168 0.01166 0.0861  0.636   0.2341 ]
>>> assert(1.0 == output.sum())
class tinyms.layers.Softmax2d[source]

Softmax function applied to 2D features data.

Applies Softmax to each location \((c, h, w)\) with an input Tensor of shape \((C, H, W)\) .

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\) or \((C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, which has the same type and shape as x, with values in the range [0, 1].

Raises:
  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If data_format is neither ‘NCHW’ nor ‘CHW’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[[[0.1, 0.2]], [[0.3, 0.4]], [[0.6, 0.5]]]]), mindspore.float32)
>>> softmax2d = nn.Softmax2d()
>>> output = softmax2d(x)
>>> print(output)
[[[[0.258 0.28 ]]
  [[0.316 0.342]]
  [[0.426 0.378]]]]
class tinyms.layers.LogSoftmax(axis=-1)[source]

Applies the LogSoftmax function to n-dimensional input tensor.

The input is transformed by the Softmax function and then by the log function, to lie in the range [-inf, 0).

Logsoftmax is defined as:

\[\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right),\]
Parameters:

axis (int) – The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of LogSoftmax, with float16 or float32 data type.

Outputs:

Tensor, which has the same type and shape as x, with output values in the range [-inf, 0).

Raises:
  • TypeError – If axis is not an int.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If axis is not in the range [-x.ndim, x.ndim).

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> log_softmax = nn.LogSoftmax()
>>> output = log_softmax(x)
>>> print(output)
[[-5.00672150e+00 -6.72150636e-03 -1.20067215e+01]
 [-7.00091219e+00 -1.40009127e+01 -9.12250078e-04]]
class tinyms.layers.ReLU[source]

Rectified Linear Unit activation function.

\[\text{ReLU}(x) = (x)^+ = \max(0, x),\]

It returns element-wise \(\max(0, x)\). In particular, neurons with negative output are suppressed, while active (positive) neurons are kept unchanged.

Inputs:
  • x (Tensor) - The input of ReLU is a Tensor of any dimension. The data type is number .

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is not a number.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, 2, -3, 2, -1]), mindspore.float16)
>>> relu = nn.ReLU()
>>> output = relu(x)
>>> print(output)
[0. 2. 0. 2. 0.]
class tinyms.layers.ReLU6[source]

Compute ReLU6 activation function.

ReLU6 is similar to ReLU, but with an upper limit of 6: inputs greater than 6 are clipped to 6. It computes element-wise as

\[Y = \min(\max(0, x), 6).\]

The input is a Tensor of any valid shape.

Inputs:
  • x (Tensor) - The input of ReLU6 with data type of float16 or float32.

Outputs:

Tensor, which has the same type as x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> relu6 = nn.ReLU6()
>>> output = relu6(x)
>>> print(output)
[0. 0. 0. 2. 1.]
class tinyms.layers.RReLU(lower=0.125, upper=0.3333333333333333)[source]

Randomized Leaky ReLU activation function.

The activation function is defined as:

\[\text{RReLU}(x_{ji}) = \begin{cases}x_{ji}, &\text{if } x_{ji} \geq 0; \cr {\alpha_{ji}} * x_{ji}, &\text{otherwise.}\end{cases}\]

where \(\alpha_{ji}\) ~ \(U(l, u)\), \(l \le u\).

Applies the RReLU function element-wise, as described in the paper: Empirical Evaluation of Rectified Activations in Convolutional Network .

Parameters:
  • lower (Union[int, float]) – The lower bound of the uniform distribution from which the slope at x < 0 is drawn. Default: 1/8.

  • upper (Union[int, float]) – The upper bound of the uniform distribution from which the slope at x < 0 is drawn. Default: 1/3.

Inputs:
  • x (Tensor) - The input of RReLU is a Tensor of any dimension.

Outputs:

Tensor, after RReLU, has the same type and shape as the x.

Raises:
  • TypeError – If lower is not a float or an int.

  • TypeError – If upper is not a float or an int.

  • TypeError – If x is not a Tensor.

  • TypeError – If x is not a Tensor of mindspore.float16 or mindspore.float32.

  • ValueError – If lower is greater than upper.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> x = Tensor(np.array([[-1.0, 4.0], [2.0, 0]]), mindspore.float32)
>>> r_relu = nn.RReLU()
>>> output = r_relu(x)
>>> print(output)
[[-0.31465699  4.        ]
 [ 2.          0.        ]]
class tinyms.layers.SeLU[source]

Activation function SeLU (Scaled exponential Linear Unit).

Refer to mindspore.ops.selu() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> selu = nn.SeLU()
>>> output = selu(input_x)
>>> print(output)
[[-1.1113307 4.202804 -1.7575096]
[ 2.101402 -1.7462534 9.456309 ]]
class tinyms.layers.SiLU[source]

Sigmoid Linear Unit activation function.

Applies the sigmoid linear unit function element-wise.

\[\text{SiLU}(x) = x * \sigma(x),\]

where \(x\) is the input and \(\sigma(x)\) is the Sigmoid function, defined as:

\[\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)},\]


Inputs:
  • x (Tensor) - Input with the data type float16 or float32.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, 2, -3, 2, -1]), mindspore.float16)
>>> silu = nn.SiLU()
>>> output = silu(x)
>>> print(output)
[-0.269  1.762  -0.1423  1.762  -0.269]
class tinyms.layers.Tanh[source]

Applies the Tanh function element-wise and returns a new tensor containing the hyperbolic tangent of each element of the input. The input is a Tensor with any valid shape.

Tanh function is defined as:

\[tanh(x_i) = \frac{\exp(x_i) - \exp(-x_i)}{\exp(x_i) + \exp(-x_i)} = \frac{\exp(2x_i) - 1}{\exp(2x_i) + 1},\]

where \(x_i\) is an element of the input Tensor.

Inputs:
  • x (Tensor) - Tensor of any dimension, input with data type of float16 or float32.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([1, 2, 3, 2, 1]), mindspore.float16)
>>> tanh = nn.Tanh()
>>> output = tanh(x)
>>> print(output)
[0.7617 0.964  0.995  0.964  0.7617]
class tinyms.layers.Tanhshrink[source]

Tanhshrink activation function.

The tanhshrink function is applied element-wise and returns a new tensor.

The Tanhshrink function is defined as:

\[tanhshrink(x_i) =x_i- \frac{\exp(x_i) - \exp(-x_i)}{\exp(x_i) + \exp(-x_i)} = x_i-\frac{\exp(2x_i) - 1}{\exp(2x_i) + 1},\]

where \(x_i\) is an element of the input Tensor.

Inputs:
  • x (Tensor) - Tensor of any dimension.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If x is not a Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> x = Tensor(np.array([1, 2, 3, 2, 1]), ms.float16)
>>> tanhshrink = nn.Tanhshrink()
>>> output = tanhshrink(x)
>>> print(output)
[0.2383 1.036  2.004  1.036  0.2383]
class tinyms.layers.Hardtanh(min_val=-1.0, max_val=1.0)[source]

Applies the Hardtanh function element-wise. The activation function is defined as:

\[\begin{split}\text{Hardtanh}(x) = \begin{cases} 1, & \text{ if } x > 1; \\ -1, & \text{ if } x < -1; \\ x, & \text{ otherwise. } \end{cases}\end{split}\]

Linear region range \([-1, 1]\) can be adjusted using min_val and max_val.

Note

On Ascend, the float16 data type may lead to accuracy problems.

Parameters:
  • min_val (Union[int, float]) – Minimum value of the linear region range. Default: -1.0.

  • max_val (Union[int, float]) – Maximum value of the linear region range. Default: 1.0.

Inputs:
  • x (Tensor) - Input Tensor with data type of float16 or float32. On CPU and Ascend support dimension 0-7D. On GPU support dimension 0-4D.

Outputs:

Tensor, with the same dtype and shape as x.

Raises:
  • TypeError – If x is not a Tensor.

  • TypeError – If dtype of x is neither float16 nor float32.

  • TypeError – If dtype of min_val is neither float nor int.

  • TypeError – If dtype of max_val is neither float nor int.

  • ValueError – If min_val is not less than max_val.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hardtanh = nn.Hardtanh(min_val=-1.0, max_val=1.0)
>>> output = hardtanh(x)
>>> print(output)
[-1. -1.  0.  1.  1.]
class tinyms.layers.GELU(approximate=True)[source]

Gaussian error linear unit activation function.

Applies GELU function to each element of the input. The input is a Tensor with any valid shape.

GELU is defined as:

\[GELU(x_i) = x_i*P(X < x_i),\]

where \(P\) is the cumulative distribution function of standard Gaussian distribution and \(x_i\) is the element of the input.


Parameters:

approximate (bool) –

Whether to enable approximation. Default: True.

If approximate is True, the Gaussian error linear activation is:

\(0.5 * x * (1 + tanh(\sqrt{2 / \pi} * (x + 0.044715 * x^3)))\)

otherwise, it is:

\(x * P(X <= x) = 0.5 * x * (1 + erf(x / \sqrt{2}))\), where \(X \sim N(0, 1)\).
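The two variants agree closely for moderate inputs. The following numpy check of both formulas (a sketch of the definitions, using math.erf for the exact form) illustrates this:

>>> import numpy as np
>>> from math import erf, sqrt, pi
>>> x = np.array([-1.0, 0.5, 2.0])
>>> exact = 0.5 * x * (1 + np.array([erf(v / sqrt(2)) for v in x]))
>>> approx = 0.5 * x * (1 + np.tanh(sqrt(2 / pi) * (x + 0.044715 * x ** 3)))
>>> bool(np.abs(exact - approx).max() < 1e-3)
True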

Inputs:
  • x (Tensor) - The input of GELU with data type of float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> gelu = nn.GELU()
>>> output = gelu(x)
>>> print(output)
[[-1.5880802e-01  3.9999299e+00 -3.1077917e-21]
 [ 1.9545976e+00 -2.2918017e-07  9.0000000e+00]]
>>> gelu = nn.GELU(approximate=False)
>>> # On CPU, "approximate=False" is not supported; "approximate=True" is used instead
>>> output = gelu(x)
>>> print(output)
[[-1.5865526e-01  3.9998732e+00 -0.0000000e+00]
 [ 1.9544997e+00 -1.4901161e-06  9.0000000e+00]]
class tinyms.layers.FastGelu[source]

Fast Gaussian error linear unit activation function.

Applies FastGelu function to each element of the input. The input is a Tensor with any valid shape.

FastGelu is defined as:

\[FastGelu(x_i) = \frac {x_i} {1 + \exp(-1.702 * \left| x_i \right|)} * \exp(0.851 * (x_i - \left| x_i \right|))\]

where \(x_i\) is the element of the input.

Inputs:
  • x (Tensor) - The input of FastGelu with data type of float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> fast_gelu = nn.FastGelu()
>>> output = fast_gelu(x)
>>> print(output)
[[-1.5418735e-01  3.9921875e+00 -9.7473649e-06]
 [ 1.9375000e+00 -1.0052517e-03  8.9824219e+00]]
class tinyms.layers.Sigmoid[source]

Sigmoid activation function.

Applies sigmoid-type activation element-wise.

Sigmoid function is defined as:

\[\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)},\]

where \(x_i\) is the element of the input.


Inputs:
  • input_x (Tensor) - The input of Sigmoid with data type of float16 or float32. Tensor of any dimension.

Outputs:

Tensor, with the same type and shape as the input_x.

Raises:

TypeError – If dtype of input_x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> sigmoid = nn.Sigmoid()
>>> output = sigmoid(x)
>>> print(output)
[0.2688  0.11914 0.5     0.881   0.7305 ]
class tinyms.layers.Softsign[source]

Softsign activation function.

Refer to mindspore.ops.softsign() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([0, -1, 2, 30, -30]), mindspore.float32)
>>> softsign = nn.Softsign()
>>> output = softsign(x)
>>> print(output)
[ 0.        -0.5         0.6666667  0.9677419 -0.9677419]
class tinyms.layers.PReLU(channel=1, w=0.25)[source]

PReLU activation function.

Applies the PReLU function element-wise.

PReLU is defined as:

\[PReLU(x_i)= \max(0, x_i) + w * \min(0, x_i),\]

where \(x_i\) is an element of a channel of the input.

Here \(w\) is a learnable parameter with a default initial value of 0.25. The parameter \(w\) has the dimensionality of the argument channel. If channel is left at its default of 1, a single parameter \(w\) is shared across all channels.
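For channel > 1, the learnable vector \(w\) holds one slope per channel and is broadcast over the remaining axes. A minimal numpy sketch of the formula above (illustrative values, not MindSpore output):

>>> import numpy as np
>>> x = np.array([[[[-1., 2.]], [[3., -4.]]]])    # shape (1, 2, 1, 2)
>>> w = np.array([0.1, 0.5]).reshape(1, 2, 1, 1)  # one slope per channel
>>> out = np.maximum(0, x) + w * np.minimum(0, x)
>>> print(out.reshape(-1))
[-0.1  2.   3.  -2. ]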

Parameters:
  • channel (int) – The number of elements of parameter w. It must be 1 or the number of channels of the input tensor x. Default: 1.

  • w (Union[float, list, Tensor]) – The initial value of the parameter. It could be a float, a float list, or a tensor with the same dtype as the input tensor x. Default: 0.25.

Inputs:
  • x (Tensor) - The input of PReLU with data type of float16 or float32. The shape is \((N, *)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same dtype and shape as the x.

Raises:
  • TypeError – If channel is not an int.

  • TypeError – If w is not one of a float, a float list, or a float Tensor.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If x is a 0-D or 1-D Tensor on Ascend.

  • ValueError – If channel is less than 1.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[[[0.1, 0.6], [0.9, 0.9]]]]), mindspore.float32)
>>> prelu = nn.PReLU()
>>> output = prelu(x)
>>> print(output)
[[[[0.1 0.6]
   [0.9 0.9]]]]
tinyms.layers.get_activation(name, prim_name=None)[source]

Gets the activation function.

Parameters:
  • name (str) – The name of the activation function.

  • prim_name (Union[str, None]) – The name of primitive. Default: None.

Returns:

Function, the activation function.

Supported Platforms:

Ascend GPU CPU

Examples

>>> sigmoid = nn.get_activation('sigmoid')
>>> print(sigmoid)
Sigmoid<>
class tinyms.layers.LeakyReLU(alpha=0.2)[source]

Leaky ReLU activation function.

The activation function is defined as:

\[\text{leaky_relu}(x) = \begin{cases}x, &\text{if } x \geq 0; \cr {\alpha} * x, &\text{otherwise.}\end{cases}\]

where \(\alpha\) represents the alpha parameter.

For more details, see Rectifier Nonlinearities Improve Neural Network Acoustic Models.

Parameters:

alpha (Union[int, float]) – Slope of the activation function at x < 0. Default: 0.2.

Inputs:
  • x (Tensor) - The input of LeakyReLU is a Tensor of any dimension.

Outputs:

Tensor, has the same type and shape as the x.

Raises:

TypeError – If alpha is not a float or an int.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> leaky_relu = nn.LeakyReLU()
>>> output = leaky_relu(x)
>>> print(output)
[[-0.2  4.  -1.6]
 [ 2.  -1.   9. ]]
class tinyms.layers.HSigmoid[source]

Hard sigmoid activation function. Calculates the output according to the input elements.

Hard sigmoid is defined as:

\[\text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),\]
Inputs:
  • input_x (Tensor) - The input of HSigmoid. Tensor of any dimension.

Outputs:

Tensor, with the same type and shape as the input_x.

Raises:

TypeError – If input_x is not a Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hsigmoid = nn.HSigmoid()
>>> result = hsigmoid(x)
>>> print(result)
[0.3333 0.1666 0.5    0.8335 0.6665]
class tinyms.layers.HSwish[source]

Applies hswish-type activation element-wise. The input is a Tensor with any valid shape.

Hard swish is defined as:

\[\text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},\]
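Equivalently, in numpy terms (a sketch of the formula, not the backend kernel):

>>> import numpy as np
>>> x = np.array([-1., -2., 0., 2., 1.])
>>> relu6 = np.minimum(np.maximum(x + 3, 0), 6)
>>> print((x * relu6 / 6).round(4))
[-0.3333 -0.3333  0.      1.6667  0.6667]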
Inputs:
  • x (Tensor) - The input of HSwish, data type must be float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hswish = nn.HSwish()
>>> result = hswish(x)
>>> print(result)
[-0.3333 -0.3333  0.      1.667   0.6665]
class tinyms.layers.ELU(alpha=1.0)[source]

Exponential Linear Unit activation function.

Applies the exponential linear unit function element-wise. The activation function is defined as:

\[E_{i} = \begin{cases} x_i, &\text{if } x_i \geq 0; \cr \alpha * (\exp(x_i) - 1), &\text{otherwise.} \end{cases}\]

where \(x_i\) represents the element of the input and \(\alpha\) represents the alpha parameter.


Parameters:

alpha (float) – The alpha value of ELU, the data type is float. Default: 1.0.

Inputs:
  • x (Tensor) - The input of ELU is a Tensor of any dimension with data type of float16 or float32.

Outputs:

Tensor, with the same type and shape as the x.

Raises:
  • TypeError – If alpha is not a float.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If alpha is not equal to 1.0.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float32)
>>> elu = nn.ELU()
>>> result = elu(x)
>>> print(result)
[-0.63212055  -0.86466473  0.  2.  1.]
class tinyms.layers.LogSigmoid[source]

Applies logsigmoid activation element-wise. The input is a Tensor with any valid shape.

Logsigmoid is defined as:

\[\text{logsigmoid}(x_{i}) = log(\frac{1}{1 + \exp(-x_i)}),\]

where \(x_{i}\) is the element of the input.

Inputs:
  • x (Tensor) - The input of LogSigmoid with data type of float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same type and shape as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.LogSigmoid()
>>> x = Tensor(np.array([1.0, 2.0, 3.0]), mindspore.float32)
>>> output = net(x)
>>> print(output)
[-0.31326166 -0.12692806 -0.04858734]
class tinyms.layers.LRN(depth_radius=5, bias=1.0, alpha=1.0, beta=0.5, norm_region='ACROSS_CHANNELS')[source]

Local Response Normalization.

Refer to mindspore.ops.lrn() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([[[[0.1], [0.2]],
...                       [[0.3], [0.4]]]]), mindspore.float32)
>>> output = nn.LRN()(input_x)
>>> print(output)
[[[[0.09534626]
   [0.1825742 ]]
  [[0.2860388 ]
   [0.3651484 ]]]]
class tinyms.layers.SoftShrink(lambd=0.5)[source]

Applies the SoftShrink function element-wise.

\[\begin{split}\text{SoftShrink}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\ x + \lambda, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
Parameters:

lambd (float) – The \(\lambda\) value for the SoftShrink formulation; it must be no less than zero. Default: 0.5.

Inputs:
  • input_x (Tensor) - The input of SoftShrink with data type of float16 or float32. Any number of additional dimensions.

Outputs:

Tensor, has the same shape and data type as input_x.

Raises:
  • TypeError – If lambd is not a float.

  • TypeError – If input_x is not a Tensor.

  • TypeError – If dtype of input_x is neither float16 nor float32.

  • ValueError – If lambd is less than 0.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([[ 0.5297,  0.7871,  1.1754], [ 0.7836,  0.6218, -1.1542]]), mindspore.float16)
>>> softshrink = nn.SoftShrink()
>>> output = softshrink(input_x)
>>> print(output)
[[ 0.02979  0.287    0.676  ]
 [ 0.2837   0.1216  -0.6543 ]]
class tinyms.layers.HShrink(lambd=0.5)[source]

Hard Shrink activation function. Calculates the output according to the input elements.

The formula is defined as follows:

\[\begin{split}\text{HardShrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\ x, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases}\end{split}\]
Parameters:

lambd (float) – The threshold \(\lambda\) defined by the Hard Shrink formula. Default: 0.5.

Inputs:
  • input_x (Tensor) - The input of Hard Shrink with data type of float16 or float32.

Outputs:

Tensor, the same shape and data type as the input.

Raises:
  • TypeError – If lambd is not a float.

  • TypeError – If dtype of input_x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> input_x = Tensor(np.array([[ 0.5,  1,  2.0], [0.0533,0.0776,-2.1233]]), mindspore.float32)
>>> hshrink = nn.HShrink()
>>> output = hshrink(input_x)
>>> print(output)
[[ 0.      1.      2.    ]
[ 0.      0.     -2.1233]]
class tinyms.layers.CELU(alpha=1.0)[source]

Continuously differentiable exponential linear units activation function.

Applies the continuously differentiable exponential linear units function element-wise.

\[\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))\]


Parameters:

alpha (float) – The \(\alpha\) value for the Celu formulation. Default: 1.0

Inputs:
  • x (Tensor) - The input of CELU. The required dtype is float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, with the same type and shape as the x.

Raises:
  • TypeError – If alpha is not a float.

  • ValueError – If alpha has the value of 0.

  • TypeError – If x is not a Tensor.

  • TypeError – If the dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([-2.0, -1.0, 1.0, 2.0]), mindspore.float32)
>>> celu = nn.CELU()
>>> output = celu(x)
>>> print(output)
[-0.86466473 -0.63212055  1.          2.        ]
class tinyms.layers.Threshold(threshold, value)[source]

Thresholds each element of the input Tensor.

The formula is defined as follows:

\[\begin{split}y = \begin{cases} x, &\text{ if } x > \text{threshold} \\ \text{value}, &\text{ otherwise } \end{cases}\end{split}\]
Parameters:
  • threshold (Union[int, float]) – The value to threshold at.

  • value (Union[int, float]) – The value to replace with when the element is less than or equal to threshold.

Inputs:
  • input_x (Tensor) - The input of Threshold with data type of float16 or float32.

Outputs:

Tensor, the same shape and data type as the input.

Raises:
  • TypeError – If threshold is not a float or an int.

  • TypeError – If value is not a float or an int.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import mindspore.nn as nn
>>> m = nn.Threshold(0.1, 20)
>>> inputs = mindspore.Tensor([0.1, 0.2, 0.3], mindspore.float32)
>>> outputs = m(inputs)
>>> print(outputs)
[ 20.0     0.2      0.3]
class tinyms.layers.Mish[source]

Computes MISH (A Self-Regularized Non-Monotonic Neural Activation Function) of input tensors element-wise.

Refer to mindspore.ops.mish() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> mish = nn.Mish()
>>> output = mish(x)
>>> print(output)
[[-0.3034014  3.9974129 -0.0026832]
 [ 1.9439590  -0.0033576 9.0000000]]
class tinyms.layers.GLU(axis=-1)[source]

The gated linear unit function.

\[{GLU}(a, b)= a \otimes \sigma(b)\]

where \(a\) is the first half of the input along the split axis and \(b\) is the second half.

Here \(\sigma\) is the sigmoid function, and \(\otimes\) is the Hadamard product.

Parameters:

axis (int) – the axis to split the input. Default: -1, the last axis in x.

Inputs:
  • x (Tensor) - \((\ast_1, N, \ast_2)\) where * means, any number of additional dimensions.

Outputs:

Tensor, the same dtype as the x, with the shape \((\ast_1, M, \ast_2)\) where \(M=N/2\).

Supported Platforms:

Ascend GPU CPU

Examples

>>> m = nn.GLU()
>>> input = Tensor([[0.1,0.2,0.3,0.4],[0.5,0.6,0.7,0.8]])
>>> output = m(input)
>>> print(output)
[[0.05744425 0.11973753]
 [0.33409387 0.41398472]]
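The example output can be reproduced in plain numpy by splitting the last axis in half and gating with the sigmoid (a sketch of the definition, not the operator itself):

>>> import numpy as np
>>> x = np.array([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
>>> a, b = np.split(x, 2, axis=-1)        # first and second half
>>> print((a / (1 + np.exp(-b))).round(4))
[[0.0574 0.1197]
 [0.3341 0.414 ]]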
class tinyms.layers.BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, data_format='NCHW')[source]

This layer applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D or 2D inputs) to reduce internal covariate shift. Batch Normalization is widely used in convolutional networks. For details, refer to Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters, which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm differs between graph mode and PyNative mode; therefore, it is not recommended to change the mode after the network has been initialized.

Parameters:
  • num_features (int) – number of features or channels C of the input x .

  • eps (float) – \(\epsilon\) added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, \(\gamma\) and \(\beta\) can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\gamma\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\beta\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean value and variance value of current batch data. If false, use the mean value and variance value of specified value. If None, the training process will use the mean and variance of current batch data and track the running mean and variance, the evaluation process will use the running mean and variance. Default: None.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C)\) or \((N, C, L)\) , where N is the batch size, C is the number of features or channels, and L is the sequence length.

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C)\) or \((N, C, L)\) .

Raises:
  • TypeError – If num_features is not an int.

  • TypeError – If eps is not a float.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.BatchNorm1d(num_features=4)
>>> x = Tensor(np.array([[0.7, 0.5, 0.5, 0.6],
...                      [0.5, 0.4, 0.6, 0.9]]).astype(np.float32))
>>> output = net(x)
>>> print(output)
[[ 0.6999965   0.4999975  0.4999975  0.59999704 ]
 [ 0.4999975   0.399998   0.59999704 0.89999545 ]]
class tinyms.layers.BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, data_format='NCHW')[source]

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) to avoid internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm differs between graph mode and PyNative mode; therefore, the mode cannot be changed after the network has been initialized. Note that the formula for updating the \(moving\_mean\) and \(moving\_var\) is

\[\begin{split}\text{moving\_mean} = \text{moving\_mean} \times \text{momentum} + \mu_\beta \times (1 - \text{momentum})\\ \text{moving\_var} = \text{moving\_var} \times \text{momentum} + \sigma^2_\beta \times (1 - \text{momentum})\end{split}\]

where \(moving\_mean\) is the updated mean, \(moving\_var\) is the updated variance, and \(\mu_\beta, \sigma^2_\beta\) are the observed mean and variance of each batch of data.
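In plain Python, one update step of the running mean under this formula looks like the following sketch:

>>> momentum = 0.9
>>> moving_mean, batch_mean = 0.0, 1.0
>>> moving_mean = moving_mean * momentum + batch_mean * (1 - momentum)
>>> round(moving_mean, 4)
0.1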

Parameters:
  • num_features (int) – The number of channels of the input tensor. Expected input size is \((N, C, H, W)\), C represents the number of channels.

  • eps (float) – \(\epsilon\) added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, \(\gamma\) and \(\beta\) can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\gamma\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\beta\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) –

    • If true, use the mean value and variance value of current batch data and track running mean and running variance.

    • If false, use the mean value and variance value of specified value, and not track statistical value.

    • If None, the use_batch_statistics is automatically set to true or false according to the training and evaluation mode. During training, the parameter is set to true, and during evaluation, the parameter is set to false. Default: None.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, H, W)\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C, H, W)\).

Raises:
  • TypeError – If num_features is not an int.

  • TypeError – If eps is not a float.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

  • ValueError – If data_format is neither ‘NHWC’ nor ‘NCHW’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.BatchNorm2d(num_features=3)
>>> x = Tensor(np.ones([1, 3, 2, 2]).astype(np.float32))
>>> output = net(x)
>>> print(output)
[[[[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]]]
class tinyms.layers.BatchNorm3d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None)[source]

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) to avoid internal covariate shift.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm is different in graph mode and pynative mode, therefore that mode can not be changed after net was initialized. Note that the formula for updating the running_mean and running_var is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Parameters:
  • num_features (int) – C from an expected input of size \((N, C, D, H, W)\) .

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean value and variance value of current batch data. If false, use the mean value and variance value of specified value. If None, the training process will use the mean and variance of current batch data and track the running mean and variance, the evaluation process will use the running mean and variance. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, D_{out},H_{out}, W_{out})\).

Raises:
  • TypeError – If num_features is not an int.

  • TypeError – If eps is not a float.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.BatchNorm3d(num_features=3)
>>> x = Tensor(np.ones([16, 3, 10, 32, 32]).astype(np.float32))
>>> output = net(x)
>>> print(output.shape)
(16, 3, 10, 32, 32)
class tinyms.layers.LayerNorm(normalized_shape, begin_norm_axis=-1, begin_params_axis=-1, gamma_init='ones', beta_init='zeros', epsilon=1e-07)[source]

Applies Layer Normalization over a mini-batch of inputs.

Layer Normalization is widely used in recurrent neural networks. It applies normalization on a mini-batch of inputs for each single training case, as described in the paper Layer Normalization. Unlike Batch Normalization, Layer Normalization performs exactly the same computation at training and testing time. It is applied across all channels and pixels within each single sample, rather than across the batch. It can be described using the following formula:

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
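With the default begin_norm_axis=-1, each sample is normalized over the last axis. A numpy sketch with \(\gamma = 1\) and \(\beta = 0\):

>>> import numpy as np
>>> x = np.array([[1., 2., 3.], [4., 8., 12.]])
>>> mean = x.mean(axis=-1, keepdims=True)
>>> var = x.var(axis=-1, keepdims=True)
>>> print(((x - mean) / np.sqrt(var + 1e-7)).round(4))
[[-1.2247  0.      1.2247]
 [-1.2247  0.      1.2247]]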
Parameters:
  • normalized_shape (Union(tuple[int], list[int])) – The normalization is performed over axis begin_norm_axis … R - 1.

  • begin_norm_axis (int) – The first normalization dimension: normalization will be performed along dimensions begin_norm_axis: rank(inputs), the value should be in [-1, rank(input)). Default: -1.

  • begin_params_axis (int) – The first parameter(beta, gamma)dimension: scale and centering parameters will have dimensions begin_params_axis: rank(inputs) and will be broadcast with the normalized inputs accordingly, the value should be in [-1, rank(input)). Default: -1.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\gamma\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\beta\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • epsilon (float) – \(\epsilon\) added to the denominator for numerical stability. Default: 1e-7.

Inputs:
  • x (Tensor) - The shape of x is \((x_1, x_2, ..., x_R)\), and input_shape[begin_norm_axis:] is equal to normalized_shape.

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the x.

Raises:
  • TypeError – If normalized_shape is neither a list nor tuple.

  • TypeError – If begin_norm_axis or begin_params_axis is not an int.

  • TypeError – If epsilon is not a float.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.ones([20, 5, 10, 10]), mindspore.float32)
>>> shape1 = x.shape[1:]
>>> m = nn.LayerNorm(shape1,  begin_norm_axis=1, begin_params_axis=1)
>>> output = m(x).shape
>>> print(output)
(20, 5, 10, 10)
class tinyms.layers.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True, gamma_init='ones', beta_init='zeros')[source]

Group Normalization over a mini-batch of inputs.

Group Normalization is widely used in convolutional networks. It applies normalization on a mini-batch of inputs for each single training case, as described in the paper Group Normalization. Group Normalization divides the channels into groups and computes the mean and variance for normalization within each group, and its performance is stable over a wide range of batch sizes. It can be described using the following formula:

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
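Conceptually, the channels are reshaped into num_groups groups and each group is normalized independently. A numpy sketch for an \((N, C, H, W)\) input with \(\gamma = 1\) and \(\beta = 0\) (the group_norm helper is illustrative):

>>> import numpy as np
>>> def group_norm(x, num_groups, eps=1e-5):
...     n, c, h, w = x.shape
...     g = x.reshape(n, num_groups, c // num_groups, h, w)
...     mean = g.mean(axis=(2, 3, 4), keepdims=True)   # statistics per group
...     var = g.var(axis=(2, 3, 4), keepdims=True)
...     return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
>>> out = group_norm(np.random.randn(1, 4, 2, 2), num_groups=2)
>>> print(out.shape)
(1, 4, 2, 2)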
Parameters:
  • num_groups (int) – The number of groups to be divided along the channel dimension.

  • num_channels (int) – The number of input channels.

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • affine (bool) – A bool value, this layer will have learnable affine parameters when set to true. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’. If gamma_init is a Tensor, the shape must be \((num\_channels)\).

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’. If beta_init is a Tensor, the shape must be \((num\_channels)\).

Inputs:
  • x (Tensor) - The input feature with shape \((N, C, H, W)\) .

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the x.

Raises:
  • TypeError – If num_groups or num_channels is not an int.

  • TypeError – If eps is not a float.

  • TypeError – If affine is not a bool.

  • ValueError – If num_groups or num_channels is less than 1.

  • ValueError – If num_channels is not divided by num_groups.

Supported Platforms:

Ascend GPU CPU

Examples

>>> group_norm_op = nn.GroupNorm(2, 2)
>>> x = Tensor(np.ones([1, 2, 4, 4], np.float32))
>>> output = group_norm_op(x)
>>> print(output)
[[[[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]
  [[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]]]
class tinyms.layers.SyncBatchNorm(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, process_groups=None)[source]

Sync Batch Normalization layer over a N-dimension input.

Sync Batch Normalization is cross device synchronized Batch Normalization. The implementation of Batch Normalization only normalizes the data within each device. Sync Batch Normalization will normalize the input within the group. It has been described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

Currently, SyncBatchNorm only supports 2D and 4D inputs.

Parameters:
  • num_features (int) – C from an expected input of size \((N, C, H, W)\).

  • eps (float) – \(\epsilon\), a value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, \(\gamma\) and \(\beta\) can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\gamma\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the \(\beta\) weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean value and variance value of current batch data. If false, use the mean value and variance value of specified value. If None, training process will use the mean and variance of current batch data and track the running mean and variance, eval process will use the running mean and variance. Default: None.

  • process_groups (list) – A list that divides devices into different sync groups, containing N sublists. Each sublist contains int numbers identifying rank ids that need to be synchronized in the same group. All int values must be in [0, rank_size) and different from each other. Default: None, indicating synchronization across all devices.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).

Raises:
  • TypeError – If num_features is not an int.

  • TypeError – If eps is not a float.

  • TypeError – If process_groups is not a list.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

  • ValueError – If rank_id in process_groups is not in range [0, rank_size).

Supported Platforms:

Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the Ascend tutorial for more details.

For the GPU devices, users need to prepare the host file and mpi, please see the GPU tutorial .

This example should be run with multiple devices.

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore.communication import init
>>> from mindspore import Tensor
>>> from mindspore import nn
>>> from mindspore import dtype as mstype
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> init()
>>> ms.reset_auto_parallel_context()
>>> ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.DATA_PARALLEL)
>>> sync_bn_op = nn.SyncBatchNorm(num_features=3, process_groups=[[0, 1], [2, 3]])
>>> x = Tensor(np.ones([1, 3, 2, 2]), mstype.float32)
>>> output = sync_bn_op(x)
>>> print(output)
[[[[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]]]
class tinyms.layers.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, gamma_init='ones', beta_init='zeros')[source]

This layer applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with additional channel dimension). Refer to the paper Instance Normalization: The Missing Ingredient for Fast Stylization. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

If affine is True, \(\gamma\) and \(\beta\) are learnable parameter vectors of size num_features. The standard deviation is calculated via the biased estimator.

This layer uses instance statistics computed from input data in both training and evaluation modes.

InstanceNorm1d and BatchNorm1d are very similar, but have some differences. InstanceNorm1d is applied on each channel of channeled data like RGB images, but BatchNorm1d is usually applied on each batch of batched data.

Note

Note that the formula for updating the running_mean and running_var is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Parameters:
  • num_features (int) – C from an expected input of size \((N, C, L)\).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.1.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, L)\). Data type: float16 or float32.

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C, L)\). Same type and shape as the x.

Raises:
  • TypeError – If the type of num_features is not int.

  • TypeError – If the type of eps is not float.

  • TypeError – If the type of momentum is not float.

  • TypeError – If the type of affine is not bool.

  • TypeError – If gamma_init and beta_init are not of the same type, or if the initialized element type is not float32.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

  • ValueError – If the shape of gamma_init / beta_init is not \((C)\).

  • KeyError – If gamma_init or beta_init is a str and no homonymous class inheriting from Initializer exists.

Supported Platforms:

GPU

Examples

>>> import mindspore
>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.InstanceNorm1d(3)
>>> x = Tensor(np.ones([2, 3, 5]), mindspore.float32)
>>> output = net(x)
>>> print(output.shape)
(2, 3, 5)
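For intuition, here is a minimal NumPy sketch of the per-instance statistics and the running-statistics update described in the note above (an illustration of the formulas, not the layer's implementation; the variable names are assumptions):

>>> import numpy as np
>>> x = np.random.randn(2, 3, 5).astype(np.float32)   # (N, C, L)
>>> mean = x.mean(axis=2, keepdims=True)              # one mean per (N, C) pair
>>> var = x.var(axis=2, keepdims=True)                # biased estimator, as documented
>>> y = (x - mean) / np.sqrt(var + 1e-5)              # gamma=1 (ones), beta=0 (zeros)
>>> momentum = 0.1
>>> running_mean = np.zeros(3, np.float32)
>>> # hat_x_new = (1 - momentum) * x_t + momentum * hat_x, per the note above
>>> running_mean = (1 - momentum) * mean.mean(axis=(0, 2)) + momentum * running_mean
>>> print(y.shape)
(2, 3, 5)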
class tinyms.layers.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, gamma_init='ones', beta_init='zeros')[source]

This layer applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension). Refer to the paper Instance Normalization: The Missing Ingredient for Fast Stylization. It rescales and recenters the feature using a mini-batch of data and the learned parameters, as described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

\(\gamma\) and \(\beta\) are learnable parameter vectors of size num_features if affine is True. The standard deviation is calculated via the biased estimator.

This layer uses instance statistics computed from input data in both training and evaluation modes.

InstanceNorm2d and BatchNorm2d are very similar, but have some differences: InstanceNorm2d normalizes each channel of each sample independently (for example, the channels of an RGB image), whereas BatchNorm2d computes its statistics across the whole batch.

Note

The formula for updating the running_mean and running_var is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Parameters:
  • num_features (int) – C from an expected input of size \((N, C, H, W)\).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.1.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, H, W)\). Data type: float16 or float32.

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C, H, W)\), with the same type and shape as x.

Raises:
  • TypeError – If the type of num_features is not int.

  • TypeError – If the type of eps is not float.

  • TypeError – If the type of momentum is not float.

  • TypeError – If the type of affine is not bool.

  • TypeError – If the types of gamma_init and beta_init are not the same, or if the initialized element type is not float32.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

  • ValueError – If the shape of gamma_init / beta_init is not \((C)\).

  • KeyError – If gamma_init or beta_init is a str and no homonymous class inheriting from Initializer exists.

Supported Platforms:

GPU

Examples

>>> import mindspore
>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.InstanceNorm2d(3)
>>> x = Tensor(np.ones([2, 3, 2, 2]), mindspore.float32)
>>> output = net(x)
>>> print(output.shape)
(2, 3, 2, 2)
class tinyms.layers.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, gamma_init='ones', beta_init='zeros')[source]

This layer applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with an additional channel dimension). Refer to the paper Instance Normalization: The Missing Ingredient for Fast Stylization. It rescales and recenters the feature using a mini-batch of data and the learned parameters, as described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

\(\gamma\) and \(\beta\) are learnable parameter vectors of size num_features if affine is True. The standard deviation is calculated via the biased estimator.

This layer uses instance statistics computed from input data in both training and evaluation modes.

InstanceNorm3d and BatchNorm3d are very similar, but have some differences: InstanceNorm3d normalizes each channel of each sample independently (for example, the channels of an RGB image), whereas BatchNorm3d computes its statistics across the whole batch.

Note

The formula for updating the running_mean and running_var is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Parameters:
  • num_features (int) – C from an expected input of size \((N, C, D, H, W)\).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.1.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, etc. When initialized with Tensor, the shape should be \((C)\). Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, D, H, W)\). Data type: float16 or float32.

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C, D, H, W)\), with the same type and shape as x.

Raises:
  • TypeError – If the type of num_features is not int.

  • TypeError – If the type of eps is not float.

  • TypeError – If the type of momentum is not float.

  • TypeError – If the type of affine is not bool.

  • TypeError – If the types of gamma_init and beta_init are not the same, or if the initialized element type is not float32.

  • ValueError – If num_features is less than 1.

  • ValueError – If momentum is not in range [0, 1].

  • ValueError – If the shape of gamma_init / beta_init is not \((C)\).

  • KeyError – If gamma_init or beta_init is a str and no homonymous class inheriting from Initializer exists.

Supported Platforms:

GPU

Examples

>>> import mindspore
>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.InstanceNorm3d(3)
>>> x = Tensor(np.ones([2, 3, 5, 2, 2]), mindspore.float32)
>>> output = net(x)
>>> print(output.shape)
(2, 3, 5, 2, 2)
class tinyms.layers.SequentialCell(*args)[source]

Sequential Cell container. For more details about Cell, please refer to Cell.

A list of Cells will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of cells can also be passed in.

Note

SequentialCell and torch.nn.ModuleList are different: ModuleList is a list for storing modules, whereas the layers in a SequentialCell are connected in a cascading way.

Parameters:

args (list, OrderedDict) – List or OrderedDict of subclass of Cell.

Inputs:
  • x (Tensor) - Tensor with shape according to the first Cell in the sequence.

Outputs:

Tensor, the output Tensor with shape depending on the input x and defined sequence of Cells.

Raises:

TypeError – If the type of the args is not list or OrderedDict.

Supported Platforms:

Ascend GPU CPU

Examples

>>> from mindspore import Tensor
>>> import mindspore
>>> import mindspore.nn as nn
>>> import numpy as np
>>>
>>> conv = nn.Conv2d(3, 2, 3, pad_mode='valid', weight_init="ones")
>>> relu = nn.ReLU()
>>> seq = nn.SequentialCell([conv, relu])
>>> x = Tensor(np.ones([1, 3, 4, 4]), dtype=mindspore.float32)
>>> output = seq(x)
>>> print(output)
[[[[27. 27.]
   [27. 27.]]
  [[27. 27.]
   [27. 27.]]]]
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["conv"] = conv
>>> d["relu"] = relu
>>> seq = nn.SequentialCell(d)
>>> x = Tensor(np.ones([1, 3, 4, 4]), dtype=mindspore.float32)
>>> output = seq(x)
>>> print(output)
[[[[27. 27.]
   [27. 27.]]
  [[27. 27.]
   [27. 27.]]]]
append(cell)[source]

Appends a given Cell to the end of the list.

Parameters:

cell (Cell) – The Cell to be appended.

Examples

>>> from mindspore import Tensor
>>> import mindspore
>>> import mindspore.nn as nn
>>> import numpy as np
>>>
>>> conv = nn.Conv2d(3, 2, 3, pad_mode='valid', weight_init="ones")
>>> bn = nn.BatchNorm2d(2)
>>> relu = nn.ReLU()
>>> seq = nn.SequentialCell([conv, bn])
>>> seq.append(relu)
>>> x = Tensor(np.ones([1, 3, 4, 4]), dtype=mindspore.float32)
>>> output = seq(x)
>>> print(output)
[[[[26.999863 26.999863]
   [26.999863 26.999863]]
  [[26.999863 26.999863]
   [26.999863 26.999863]]]]
class tinyms.layers.CellList(*args, **kwargs)[source]

Holds Cells in a list. For more details about Cell, please refer to Cell.

CellList can be used like a regular Python list; the Cells it contains have already been initialized. Unlike SequentialCell, the cells in a CellList are not connected.

Parameters:

args (list, optional) – List of subclass of Cell.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore.nn as nn
>>> import mindspore as ms
>>> import numpy as np
>>>
>>> conv = nn.Conv2d(100, 20, 3)
>>> bn = nn.BatchNorm2d(20)
>>> relu = nn.ReLU()
>>> cell_ls = nn.CellList([bn])
>>> cell_ls.insert(0, conv)
>>> cell_ls.append(relu)
>>> cell_ls.extend([relu, relu])
>>> cell_ls_3 = cell_ls[3]
>>> input1 = ms.Tensor(np.ones([2, 3]), ms.float32)
>>> output = cell_ls_3(input1)
>>> print(output)
[[1. 1. 1.]
 [1. 1. 1.]]
append(cell)[source]

Appends a given Cell to the end of the list.

Parameters:

cell (Cell) – The subcell to be appended.

extend(cells)[source]

Appends Cells from a Python iterable to the end of the list.

Parameters:

cells (list) – The Cells to be extended.

Raises:

TypeError – If the argument cells is not a list of Cells.

insert(index, cell)[source]

Inserts a given Cell before a given index in the list.

Parameters:
  • index (int) – The insert index in the CellList.

  • cell (Cell) – The Cell to be inserted.

class tinyms.layers.Conv2d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', data_format='NCHW')[source]

Calculates the 2D convolution on the input tensor. The input is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is a number of channels, \(H_{in}, W_{in}\) are the height and width of the feature layer respectively. For the tensor of each batch, its shape is \((C_{in}, H_{in}, W_{in})\), the formula is defined as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})\]

where \(ccor\) is the cross-correlation, \(C_{in}\) is the channel number of the input, \(out_{j}\) corresponds to the \(j\)-th channel of the output and \(j\) is in the range of \([0, C_{out}-1]\). \(\text{weight}(C_{\text{out}_j}, k)\) is a convolution kernel slice with shape \((\text{kernel_size[0]}, \text{kernel_size[1]})\), where \(\text{kernel_size[0]}\) and \(\text{kernel_size[1]}\) are the height and width of the convolution kernel respectively. \(\text{bias}\) is the bias parameter and \(\text{X}\) is the input tensor. In this case, data_format of the input tensor is ‘NCHW’ and the shape of full convolution kernel is \((C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]})\), where group is the number of groups to split the input x in the channel dimension. If data_format of the input tensor is ‘NHWC’, the shape of full convolution kernel will be \((C_{out}, \text{kernel_size[0]}, \text{kernel_size[1]}, C_{in} / \text{group})\).

For more details, please refer to the paper Gradient Based Learning Applied to Document Recognition.

Note

On Ascend platform, only group convolution in depthwise convolution scenarios is supported. That is, when group>1, condition in_channels = out_channels = group must be satisfied.
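For example, a depthwise configuration satisfying this constraint might look as follows (a minimal sketch; the channel counts are illustrative):

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> # depthwise: group == in_channels == out_channels
>>> depthwise = nn.Conv2d(8, 8, 3, group=8)
>>> x = Tensor(np.ones([1, 8, 16, 16]), mindspore.float32)
>>> print(depthwise(x).shape)
(1, 8, 16, 16)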

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv2d layer.

  • out_channels (int) – The channel number of the output tensor of the Conv2d layer.

  • kernel_size (Union[int, tuple[int]]) – Specifies the height and width of the 2D convolution kernel. The data type is an integer or a tuple of two integers. An integer represents the height and width of the convolution kernel. A tuple of two integers represents the height and width of the convolution kernel respectively.

  • stride (Union[int, tuple[int]]) – The movement stride of the 2D convolution kernel. The data type is an integer or a tuple of two integers. An integer represents the movement step size in both height and width directions. A tuple of two integers represents the movement step size in the height and width directions respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The height and width of the output are the same as the input divided by stride, rounded up. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to the height and width directions of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (Union[int, tuple[int]]) – The number of padding on the height and width directions of the input. The data type is an integer or a tuple of four integers. If padding is an integer, then the top, bottom, left, and right padding are all equal to padding. If padding is a tuple of 4 integers, then the top, bottom, left, and right padding is equal to padding[0], padding[1], padding[2], and padding[3] respectively. The value should be greater than or equal to 0. Default: 0.

  • dilation (Union[int, tuple[int]]) – Dilation size of 2D convolution kernel. The data type is an integer or a tuple of two integers. If \(k > 1\), the kernel is sampled every k elements. The value of k on the height and width directions is in range of [1, H] and [1, W] respectively. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. If the group is equal to in_channels and out_channels, this 2D convolution layer also can be called 2D depthwise convolution layer. Default: 1.

  • has_bias (bool) – Whether the Conv2d layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\) or \((N, H_{in}, W_{in}, C_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\) or \((N, H_{out}, W_{out}, C_{out})\).

pad_mode is ‘same’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \left \lceil{\frac{H_{in}}{\text{stride[0]}}} \right \rceil \\ W_{out} = \left \lceil{\frac{W_{in}}{\text{stride[1]}}} \right \rceil \\ \end{array}\end{split}\]

pad_mode is ‘valid’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \left \lceil{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) } {\text{stride[0]}}} \right \rceil \\ W_{out} = \left \lceil{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) } {\text{stride[1]}}} \right \rceil \\ \end{array}\end{split}\]

pad_mode is ‘pad’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{kernel_size[0]} - 1) \times \text{dilation[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\ W_{out} = \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{kernel_size[1]} - 1) \times \text{dilation[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\ \end{array}\end{split}\]
Raises:
  • TypeError – If in_channels, out_channels or group is not an int.

  • TypeError – If kernel_size, stride, padding or dilation is neither an int nor a tuple.

  • ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

  • ValueError – If padding is a tuple whose length is not equal to 4.

  • ValueError – If pad_mode is not equal to ‘pad’ and padding is not equal to (0, 0, 0, 0).

  • ValueError – If data_format is neither ‘NCHW’ nor ‘NHWC’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv2d(120, 240, 4, has_bias=False, weight_init='normal')
>>> x = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> output = net(x).shape
>>> print(output)
(1, 240, 1024, 640)
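As a quick arithmetic check of the output-shape formulas above (a sketch, not part of the API; the helper name is an assumption), the ‘same’ case with the default stride of 1 reproduces this example:

>>> import math
>>> def conv2d_out(size, kernel, stride=1, dilation=1, pads=(0, 0), mode='same'):
...     if mode == 'same':
...         return math.ceil(size / stride)
...     if mode == 'valid':
...         return math.ceil((size - dilation * (kernel - 1)) / stride)
...     # mode == 'pad'
...     return (size + pads[0] + pads[1] - (kernel - 1) * dilation - 1) // stride + 1
>>> print(conv2d_out(1024, 4), conv2d_out(640, 4))
1024 640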
class tinyms.layers.Conv2dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, output_padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

Calculates a 2D transposed convolution, which can be regarded as Conv2d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution).

The input is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is the number of channels, and \(H_{in}, W_{in}\) are the height and width of the feature layer respectively.

When Conv2d and Conv2dTranspose are initialized with the same parameters, and pad_mode is set to ‘pad’, \(dilation * (kernel\_size - 1) - padding\) zeros will be padded to the height and width directions of the input; in this case they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape, so the inverse is not unique. For background on deconvolutional networks, refer to Deconvolutional Networks.

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv2dTranspose layer.

  • out_channels (int) – The channel number of the output tensor of the Conv2dTranspose layer.

  • kernel_size (Union[int, tuple[int]]) – Specifies the height and width of the 2D convolution kernel. The data type is an integer or a tuple of two integers. An integer represents the height and width of the convolution kernel. A tuple of two integers represents the height and width of the convolution kernel respectively.

  • stride (Union[int, tuple[int]]) – The movement stride of the 2D convolution kernel. The data type is an integer or a tuple of two integers. An integer represents the movement step size in both height and width directions. A tuple of two integers represents the movement step size in the height and width directions respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The height and width of the output equal the input multiplied by stride. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to the height and width directions of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (Union[int, tuple[int]]) – The number of padding on the height and width directions of the input. The data type is an integer or a tuple of four integers. If padding is an integer, then the top, bottom, left, and right padding are all equal to padding. If padding is a tuple of 4 integers, then the top, bottom, left, and right padding is equal to padding[0], padding[1], padding[2], and padding[3] respectively. The value should be greater than or equal to 0. Default: 0.

  • output_padding (Union[int, tuple[int]]) – The number of padding on the height and width directions of the output. The data type is an integer or a tuple of two integers. If output_padding is an integer, then the bottom and right padding are both equal to output_padding. If output_padding is a tuple of 2 integers, then the bottom and right padding are equal to output_padding[0] and output_padding[1] respectively. If output_padding is not equal to 0, pad_mode must be ‘pad’. The value should be in the range [0, max(stride, dilation)). Default: 0.

  • dilation (Union[int, tuple[int]]) – Dilation size of 2D convolution kernel. The data type is an integer or a tuple of two integers. If \(k > 1\), the kernel is sampled every k elements. The value of k on the height and width directions is in range of [1, H] and [1, W] respectively. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. Default: 1.

  • has_bias (bool) – Whether the Conv2dTranspose layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

pad_mode is ‘same’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \text H_{in}\times \text {stride[0]} \\ W_{out} = \text W_{in}\times \text {stride[1]} \\ \end{array}\end{split}\]

pad_mode is ‘valid’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \text H_{in}\times \text {stride[0]} + \max\{(\text{dilation[0]} - 1) \times (\text{kernel_size[0]} - 1) - \text {stride[0]}, 0 \} \\ W_{out} = \text W_{in}\times \text {stride[1]} + \max\{(\text{dilation[1]} - 1) \times (\text{kernel_size[1]} - 1) - \text {stride[1]}, 0 \} \\ \end{array}\end{split}\]

pad_mode is ‘pad’:

\[\begin{split}\begin{array}{ll} \\ H_{out} = \text H_{in}\times \text {stride[0]} - (padding[0] + padding[1]) + \text{kernel_size[0]} + (\text{dilation[0]} - 1) \times (\text{kernel_size[0]} - 1) - \text {stride[0]} + \text {output_padding[0]} \\ W_{out} = \text W_{in}\times \text {stride[1]} - (padding[2] + padding[3]) + \text{kernel_size[1]} + (\text{dilation[1]} - 1) \times (\text{kernel_size[1]} - 1) - \text {stride[1]} + \text {output_padding[1]} \\ \end{array}\end{split}\]
Raises:
  • TypeError – If in_channels, out_channels or group is not an int.

  • TypeError – If kernel_size, stride, padding or dilation is neither an int nor a tuple.

  • ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

  • ValueError – If padding is a tuple whose length is not equal to 4.

  • ValueError – If pad_mode is not equal to ‘pad’ and padding is not equal to (0, 0, 0, 0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv2dTranspose(3, 64, 4, has_bias=False, weight_init='normal', pad_mode='pad')
>>> x = Tensor(np.ones([1, 3, 16, 50]), mindspore.float32)
>>> output = net(x).shape
>>> print(output)
(1, 64, 19, 53)
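A minimal sketch of the inverse shape relationship described above under pad_mode='pad' (shapes only; the values themselves are not recovered, and the channel counts are illustrative):

>>> conv = nn.Conv2d(3, 8, 3, pad_mode='pad', padding=1)
>>> deconv = nn.Conv2dTranspose(8, 3, 3, pad_mode='pad', padding=1)
>>> x = Tensor(np.ones([1, 3, 16, 16]), mindspore.float32)
>>> print(deconv(conv(x)).shape)   # spatial shape of x is restored
(1, 3, 16, 16)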
class tinyms.layers.Conv1d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

Calculates the 1D convolution on the input tensor. The input is typically of shape \((N, C_{in}, L_{in})\), where \(N\) is batch size, \(C_{in}\) is a number of channels and \(L_{in}\) is a length of sequence. For the tensor of each batch, its shape is \((C_{in}, L_{in})\), and the formula is defined as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})\]

where \(ccor\) is the cross-correlation, \(C_{in}\) is the channel number of the input, \(out_{j}\) corresponds to the \(j\)-th channel of the output and \(j\) is in the range of \([0, C_{out}-1]\). \(\text{weight}(C_{\text{out}_j}, k)\) is a convolution kernel slice with shape \(\text{kernel_size}\), where \(\text{kernel_size}\) is the width of the convolution kernel. \(\text{bias}\) is the bias parameter, and \(\text{X}\) is the input tensor. The shape of full convolution kernel is \((C_{out}, C_{in} / \text{group}, \text{kernel_size})\), where group is the number of groups to split the input x in the channel dimension.

For more details, please refer to the paper Gradient Based Learning Applied to Document Recognition.

Note

On Ascend platform, only group convolution in depthwise convolution scenarios is supported. That is, when group>1, condition in_channels = out_channels = group must be satisfied.

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv1d layer.

  • out_channels (int) – The channel number of the output tensor of the Conv1d layer.

  • kernel_size (int) – Specifies the width of the 1D convolution kernel.

  • stride (int) – The movement stride of the 1D convolution kernel. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The length of the output is the same as the input length divided by stride, rounded up. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to both sides of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (int) – The number of padding on both sides of input. The value should be greater than or equal to 0. Default: 0.

  • dilation (int) – Dilation size of 1D convolution kernel. If \(k > 1\), the kernel is sampled every k elements. The value of k is in range of [1, L]. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. Default: 1.

  • has_bias (bool) – Whether the Conv1d layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, L_{out})\).

pad_mode is ‘same’:

\[L_{out} = \left \lceil{\frac{L_{in}}{\text{stride}}} \right \rceil\]

pad_mode is ‘valid’:

\[L_{out} = \left \lceil{\frac{L_{in} - \text{dilation} \times (\text{kernel_size} - 1) } {\text{stride}}} \right \rceil\]

pad_mode is ‘pad’:

\[L_{out} = \left \lfloor{\frac{L_{in} + 2 \times padding - (\text{kernel_size} - 1) \times \text{dilation} - 1 }{\text{stride}} + 1} \right \rfloor\]
Raises:
  • TypeError – If in_channels, out_channels, kernel_size, stride, padding or dilation is not an int.

  • ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv1d(120, 240, 4, has_bias=False, weight_init='normal')
>>> x = Tensor(np.ones([1, 120, 640]), mindspore.float32)
>>> output = net(x).shape
>>> print(output)
(1, 240, 640)
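To see the three output-length formulas side by side, here is a short sketch with kernel_size=4 and the default stride of 1 (the padding value used for ‘pad’ mode is an arbitrary illustration):

>>> x = Tensor(np.ones([1, 3, 50]), mindspore.float32)
>>> for mode, pad in (('same', 0), ('valid', 0), ('pad', 2)):
...     net = nn.Conv1d(3, 8, 4, pad_mode=mode, padding=pad)
...     print(mode, net(x).shape)
same (1, 8, 50)
valid (1, 8, 47)
pad (1, 8, 51)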
class tinyms.layers.Conv1dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

Calculates a 1D transposed convolution, which can be regarded as Conv1d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution).

The input is typically of shape \((N, C_{in}, L_{in})\), where \(N\) is batch size, \(C\) is a number of channels and \(L_{in}\) is a length of sequence.

When Conv1d and Conv1dTranspose are initialized with the same parameters, and pad_mode is set to ‘pad’, \(dilation * (kernel\_size - 1) - padding\) zeros will be padded to both sides of the input; in this case they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape, so the inverse is not unique. For background on deconvolutional networks, refer to Deconvolutional Networks.

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv1dTranspose layer.

  • out_channels (int) – The channel number of the output tensor of the Conv1dTranspose layer.

  • kernel_size (int) – Specifies the width of the 1D convolution kernel.

  • stride (int) – The movement stride of the 1D convolution kernel. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The length of the output equals the input length multiplied by stride. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to both sides of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (int) – The number of padding on both sides of input. The value should be greater than or equal to 0. Default: 0.

  • dilation (int) – Dilation size of 1D convolution kernel. If \(k > 1\), the kernel is sampled every k elements. The value of k is in range of [1, L]. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. When group > 1, the Ascend platform is not supported yet. Default: 1.

  • has_bias (bool) – Whether the Conv1dTranspose layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, L_{out})\).

pad_mode is ‘same’:

\[L_{out} = L_{in} \times \text{stride}\]

pad_mode is ‘valid’:

\[L_{out} = L_{in} \times \text{stride} + \max\{(\text{dilation} - 1) \times (\text{kernel_size} - 1) - \text{stride}, 0 \}\]

pad_mode is ‘pad’:

\[L_{out} = L_{in} \times \text{stride} - 2 \times \text{padding} + \text{kernel_size} + (\text{dilation} - 1) \times (\text{kernel_size} - 1) - \text{stride}\]
Raises:
  • TypeError – If in_channels, out_channels, kernel_size, stride, padding or dilation is not an int.

  • ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv1dTranspose(3, 64, 4, has_bias=False, weight_init='normal', pad_mode='pad')
>>> x = Tensor(np.ones([1, 3, 50]), mindspore.float32)
>>> output = net(x).shape
>>> print(output)
(1, 64, 53)
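These output-length expressions mirror the Conv2dTranspose formulas. As a quick arithmetic check (a sketch, not part of the API), the ‘pad’ case reproduces this example, where kernel_size=4 and stride, padding and dilation take their defaults:

>>> L_in, stride, padding, kernel_size, dilation = 50, 1, 0, 4, 1
>>> L_out = (L_in * stride - 2 * padding + kernel_size
...          + (dilation - 1) * (kernel_size - 1) - stride)
>>> print(L_out)
53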
class tinyms.layers.Conv3d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', data_format='NCDHW')[source]

Calculates the 3D convolution on the input tensor. The input is typically of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is a number of channels, \(D_{in}, H_{in}, W_{in}\) are the depth, height and width of the feature layer respectively. For the tensor of each batch, its shape is \((C_{in}, D_{in}, H_{in}, W_{in})\), the formula is defined as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})\]

where \(ccor\) is the cross-correlation, \(C_{in}\) is the channel number of the input, \(out_{j}\) corresponds to the \(j\)-th channel of the output and \(j\) is in the range of \([0, C_{out}-1]\). \(\text{weight}(C_{\text{out}_j}, k)\) is a convolution kernel slice with shape \((\text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\), where \(\text{kernel_size[0]}\), \(\text{kernel_size[1]}\) and \(\text{kernel_size[2]}\) are the depth, height and width of the convolution kernel respectively. \(\text{bias}\) is the bias parameter and \(\text{X}\) is the input tensor. The shape of full convolution kernel is \((C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\), where group is the number of groups to split the input x in the channel dimension.

For more details, please refer to the paper Gradient Based Learning Applied to Document Recognition.

Note

On Ascend platform, only group convolution in depthwise convolution scenarios is supported. That is, when group>1, condition in_channels = out_channels = group must be satisfied.

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv3d layer.

  • out_channels (int) – The channel number of the output tensor of the Conv3d layer.

  • kernel_size (Union[int, tuple[int]]) – Specifies the depth, height and width of the 3D convolution kernel. The data type is an integer or a tuple of three integers. An integer represents the depth, height and width of the convolution kernel. A tuple of three integers represents the depth, height and width of the convolution kernel respectively.

  • stride (Union[int, tuple[int]]) – The movement stride of the 3D convolution kernel. The data type is an integer or a tuple of three integers. An integer represents the movement step size in depth, height and width directions. A tuple of three integers represents the movement step size in the depth, height and width directions respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The depth, height and width of the output are the same as the input divided by stride, rounded up. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to the depth, height and width directions of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int])) – The number of padding on the depth, height and width directions of the input. The data type is an integer or a tuple of six integers. If padding is an integer, then the head, tail, top, bottom, left, and right padding are all equal to padding. If padding is a tuple of six integers, then the head, tail, top, bottom, left, and right padding is equal to padding[0], padding[1], padding[2], padding[3], padding[4] and padding[5] respectively. The value should be greater than or equal to 0. Default: 0.

  • dilation (Union[int, tuple[int]]) – Dilation size of 3D convolution kernel. The data type is an integer or a tuple of three integers. If \(k > 1\), the kernel is sampled every k elements. The value of k on the depth, height and width directions is in range of [1, D], [1, H] and [1, W] respectively. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. Default: 1. Only 1 is currently supported.

  • has_bias (bool) – Whether the Conv3d layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • data_format (str) – The optional value for data format. Currently, only ‘NCDHW’ is supported. Default: ‘NCDHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\). Currently, the input data type only supports float16 and float32.

Outputs:

Tensor of shape \((N, C_{out}, D_{out}, H_{out}, W_{out})\).

pad_mode is ‘same’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = \left \lceil{\frac{D_{in}}{\text{stride[0]}}} \right \rceil \\ H_{out} = \left \lceil{\frac{H_{in}}{\text{stride[1]}}} \right \rceil \\ W_{out} = \left \lceil{\frac{W_{in}}{\text{stride[2]}}} \right \rceil \\ \end{array}\end{split}\]

pad_mode is ‘valid’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = \left \lceil{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) } {\text{stride[0]}}} \right \rceil \\ H_{out} = \left \lceil{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) } {\text{stride[1]}}} \right \rceil \\ W_{out} = \left \lceil{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) } {\text{stride[2]}}} \right \rceil \\ \end{array}\end{split}\]

pad_mode is ‘pad’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - (\text{kernel_size[0]} - 1) \times \text{dilation[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\ H_{out} = \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - (\text{kernel_size[1]} - 1) \times \text{dilation[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\ W_{out} = \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - (\text{kernel_size[2]} - 1) \times \text{dilation[2]} - 1 }{\text{stride[2]}} + 1} \right \rfloor \\ \end{array}\end{split}\]
Raises:
  • TypeError – If in_channels, out_channels or group is not an int.

  • TypeError – If kernel_size, stride, padding or dilation is neither an int nor a tuple.

  • ValueError – If out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

  • ValueError – If padding is a tuple whose length is not equal to 6.

  • ValueError – If pad_mode is not equal to ‘pad’ and padding is not equal to (0, 0, 0, 0, 0, 0).

  • ValueError – If data_format is not ‘NCDHW’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.ones([16, 3, 10, 32, 32]), mindspore.float32)
>>> conv3d = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=(4, 3, 3))
>>> output = conv3d(x)
>>> print(output.shape)
(16, 32, 10, 32, 32)
class tinyms.layers.Conv3dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, output_padding=0, has_bias=False, weight_init='normal', bias_init='zeros', data_format='NCDHW')[source]

Calculates a 3D transposed convolution, which can be regarded as Conv3d for the gradient of the input. It is also called deconvolution (although it is not an actual deconvolution).

The input is typically of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is the number of channels, and \(D_{in}, H_{in}, W_{in}\) are the depth, height and width of the feature layer respectively.

When Conv3d and Conv3dTranspose are initialized with the same parameters, and pad_mode is set to ‘pad’, \(dilation * (kernel\_size - 1) - padding\) zeros will be padded to the depth, height and width directions of the input; in this case they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv3d maps multiple input shapes to the same output shape, so the inverse is not unique. For background on deconvolutional networks, refer to Deconvolutional Networks.

Parameters:
  • in_channels (int) – The channel number of the input tensor of the Conv3dTranspose layer.

  • out_channels (int) – The channel number of the output tensor of the Conv3dTranspose layer.

  • kernel_size (Union[int, tuple[int]]) – Specifies the depth, height and width of the 3D convolution kernel. The data type is an integer or a tuple of three integers. An integer represents the depth, height and width of the convolution kernel. A tuple of three integers represents the depth, height and width of the convolution kernel respectively.

  • stride (Union[int, tuple[int]]) – The movement stride of the 3D convolution kernel. The data type is an integer or a tuple of three integers. An integer represents the movement step size in depth, height and width directions. A tuple of three integers represents the movement step size in the depth, height and width directions respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: The depth, height and width of the output equal the input multiplied by stride. If this mode is set, the value of padding must be 0.

    • valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the calculation will be discarded. If this mode is set, the value of padding must be 0.

    • pad: Pads the input. The number of zeros given by padding is added to the depth, height and width directions of the input. If this mode is set, the value of padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int])) – The number of padding on the depth, height and width directions of the input. The data type is an integer or a tuple of six integers. If padding is an integer, then the head, tail, top, bottom, left, and right padding are all equal to padding. If padding is a tuple of six integers, then the head, tail, top, bottom, left, and right padding is equal to padding[0], padding[1], padding[2], padding[3], padding[4] and padding[5] respectively. The value should be greater than or equal to 0. Default: 0.

  • dilation (Union[int, tuple[int]]) – Dilation size of 3D convolution kernel. The data type is an integer or a tuple of three integers. If \(k > 1\), the kernel is sampled every k elements. The value of k on the depth, height and width directions is in range of [1, D], [1, H] and [1, W] respectively. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by group. Default: 1. Only 1 is currently supported.

  • output_padding (Union(int, tuple[int])) – The number of padding on the depth, height and width directions of the output. The data type is an integer or a tuple of six integers. If output_padding is an integer, then the head, tail, top, bottom, left, and right padding are all equal to output_padding. If output_padding is a tuple of six integers, then the head, tail, top, bottom, left, and right padding is equal to output_padding[0], output_padding[1], output_padding[2], output_padding[3], output_padding[4] and output_padding[5] respectively. The value should be greater than or equal to 0. Default: 0.

  • has_bias (bool) – Whether the Conv3dTranspose layer has a bias parameter. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of weight parameter. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initialization method of bias parameter. Available initialization methods are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • data_format (str) – The optional value for data format. Currently, only ‘NCDHW’ is supported. Default: ‘NCDHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, D_{in}, H_{in}, W_{in})\). Currently, the input data type only supports float16 and float32.

Outputs:

Tensor, the shape is \((N, C_{out}, D_{out}, H_{out}, W_{out})\).

pad_mode is ‘same’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = D_{in} \times \text{stride[0]} \\ H_{out} = H_{in} \times \text{stride[1]} \\ W_{out} = W_{in} \times \text{stride[2]} \\ \end{array}\end{split}\]

pad_mode is ‘valid’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = D_{in} \times \text{stride[0]} + \max\{(\text{dilation[0]} - 1) \times (\text{kernel_size[0]} - 1) - \text{stride[0]}, 0 \} \\ H_{out} = H_{in} \times \text{stride[1]} + \max\{(\text{dilation[1]} - 1) \times (\text{kernel_size[1]} - 1) - \text{stride[1]}, 0 \} \\ W_{out} = W_{in} \times \text{stride[2]} + \max\{(\text{dilation[2]} - 1) \times (\text{kernel_size[2]} - 1) - \text{stride[2]}, 0 \} \\ \end{array}\end{split}\]

pad_mode is ‘pad’:

\[\begin{split}\begin{array}{ll} \\ D_{out} = D_{in} \times \text{stride[0]} - (padding[0] + padding[1]) + \text{kernel_size[0]} + (\text{dilation[0]} - 1) \times (\text{kernel_size[0]} - 1) - \text{stride[0]} + \text{output_padding[0]} + \text{output_padding[1]} \\ H_{out} = H_{in} \times \text{stride[1]} - (padding[2] + padding[3]) + \text{kernel_size[1]} + (\text{dilation[1]} - 1) \times (\text{kernel_size[1]} - 1) - \text{stride[1]} + \text{output_padding[2]} + \text{output_padding[3]} \\ W_{out} = W_{in} \times \text{stride[2]} - (padding[4] + padding[5]) + \text{kernel_size[2]} + (\text{dilation[2]} - 1) \times (\text{kernel_size[2]} - 1) - \text{stride[2]} + \text{output_padding[4]} + \text{output_padding[5]} \\ \end{array}\end{split}\]
Raises:
  • TypeError – If in_channels, out_channels or group is not an int.

  • TypeError – If kernel_size, stride, padding, dilation or output_padding is neither an int nor a tuple of three.

  • TypeError – If input data type is not float16 or float32.

  • ValueError – If in_channels, out_channels, kernel_size, stride or dilation is less than 1.

  • ValueError – If padding is less than 0.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

  • ValueError – If padding is a tuple whose length is not equal to 6.

  • ValueError – If pad_mode is not equal to ‘pad’ and padding is not equal to (0, 0, 0, 0, 0, 0).

  • ValueError – If data_format is not ‘NCDHW’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.ones([32, 16, 10, 32, 32]), mindspore.float32)
>>> conv3d_transpose = nn.Conv3dTranspose(in_channels=16, out_channels=3, kernel_size=(4, 6, 2),
...                                       pad_mode='pad')
>>> output = conv3d_transpose(x)
>>> print(output.shape)
(32, 3, 13, 37, 33)
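As a quick arithmetic check of the ‘pad’ formulas above against this example (a sketch, not part of the API): with stride, padding, dilation and output_padding at their defaults, each dimension reduces to size + kernel_size - 1:

>>> sizes, kernels = (10, 32, 32), (4, 6, 2)
>>> print([s + k - 1 for s, k in zip(sizes, kernels)])
[13, 37, 33]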
class tinyms.layers.BiDense(in1_channels, in2_channels, out_channels, weight_init=None, bias_init=None, has_bias=True)[source]

The bilinear dense connected layer.

Applies dense connected layer for two inputs. This layer implements the operation as:

\[y = x_1^T A x_2 + b,\]

where \(x_{1}\) is the first input tensor, \(x_{2}\) is the second input tensor, \(A\) is a weight matrix created by the layer with the same data type as \(x_{*}\), and \(b\) is a bias vector created by the layer with the same data type as \(x_{*}\) (only if has_bias is True).

Parameters:
  • in1_channels (int) – The number of channels in the input1 space.

  • in2_channels (int) – The number of channels in the input2 space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The values of str refer to the function initializer. Default: None.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The values of str refer to the function initializer. Default: None.

  • has_bias (bool) – Specifies whether the layer uses \(\text{bias}\) vector. Default: True.

Shape:
  • input1 - \((*, H_{in1})\) where \(H_{in1}=\text{in1_channels}\) and \(*\) means any number of additional dimensions including none. All but the last dimension of the inputs should be the same.

  • input2 - \((*, H_{in2})\) where \(H_{in2}=\text{in2_channels}\) and \(*\) means any number of additional dimensions including none. All but the last dimension of the inputs should be the same.

  • output - \((*, H_{out})\) where \(H_{out}=\text{out_channels}\) and \(*\) means any number of additional dimensions including none. All but the last dimension are the same shape as the inputs.

Dtype:
  • input1 (Tensor) - The dtype must be float16 or float32, and the same as input2.

  • input2 (Tensor) - The dtype must be float16 or float32, and the same as input1.

  • output (Tensor) - With the same dtype as the inputs.

Weights:
  • weight (Parameter) - The learnable weights with shape \((\text{out_channels}, \text{in1_channels}, \text{in2_channels})\). When weight_init is None, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1_channels}}\).

  • bias (Parameter) - The learnable bias of shape \((\text{out_channels})\). If has_bias is True and bias_init is None, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1_channels}}\).

Raises:
  • TypeError – If in1_channels, in2_channels or out_channels is not an int.

  • TypeError – If has_bias is not a bool.

  • ValueError – If length of shape of weight_init is not equal to 3 or shape[0] of weight_init is not equal to out_channels or shape[1] of weight_init is not equal to in1_channels or shape[2] of weight_init is not equal to in2_channels.

  • ValueError – If length of shape of bias_init is not equal to 1 or shape[0] of bias_init is not equal to out_channels.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x1 = Tensor(np.random.randn(128, 20), mindspore.float32)
>>> x2 = Tensor(np.random.randn(128, 30), mindspore.float32)
>>> net = nn.BiDense(20, 30, 40)
>>> output = net(x1, x2)
>>> print(output.shape)
(128, 40)
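To make the bilinear form concrete, here is a hedged NumPy sketch of \(y = x_1^T A x_2 + b\) evaluated per sample (an illustration, not the layer's implementation; the weight layout follows the Weights section above):

>>> import numpy as np
>>> x1 = np.random.randn(128, 20).astype(np.float32)
>>> x2 = np.random.randn(128, 30).astype(np.float32)
>>> A = np.random.randn(40, 20, 30).astype(np.float32)  # (out_channels, in1_channels, in2_channels)
>>> b = np.zeros(40, np.float32)
>>> y = np.einsum('bi,oij,bj->bo', x1, A, x2) + b       # x1^T A x2 + b, per sample
>>> print(y.shape)
(128, 40)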
class tinyms.layers.LSTMCell(**kwargs)[source]

An LSTM (Long Short-Term Memory) cell.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

The encapsulated LSTMCell can be simplified to the following formula:

\[h^{'},c^{'} = LSTMCell(x, (h_0, c_0))\]
Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((batch\_size, input\_size)\).

  • hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 and shape \((batch\_size, hidden\_size)\). The data type of hx must be the same as x.

Outputs:
  • hx’ (Tensor) - A tuple of two Tensors (h’, c’), both of shape \((batch\_size, hidden\_size)\).

Raises:
  • TypeError – If input_size or hidden_size is not an int.

  • TypeError – If has_bias is not a bool.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.LSTMCell(10, 16)
>>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
>>> h = Tensor(np.ones([3, 16]).astype(np.float32))
>>> c = Tensor(np.ones([3, 16]).astype(np.float32))
>>> output = []
>>> for i in range(5):
...     hx = net(x[i], (h, c))
...     output.append(hx)
>>> print(output[0][0].shape)
(3, 16)
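For reference, a minimal NumPy sketch of one step of the gate equations above (the weight shapes and stacking are illustrative assumptions, not the cell's internal layout, and biases are omitted):

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> def sigmoid(a):
...     return 1 / (1 + np.exp(-a))
>>> x = rng.standard_normal((3, 10)).astype(np.float32)      # (batch_size, input_size)
>>> h = np.zeros((3, 16), np.float32)                        # hidden state h_{t-1}
>>> c = np.zeros((3, 16), np.float32)                        # cell state c_{t-1}
>>> Wx = rng.standard_normal((4, 10, 16)).astype(np.float32) # input weights for i, f, c~, o
>>> Wh = rng.standard_normal((4, 16, 16)).astype(np.float32) # hidden weights for i, f, c~, o
>>> i = sigmoid(x @ Wx[0] + h @ Wh[0])                       # input gate
>>> f = sigmoid(x @ Wx[1] + h @ Wh[1])                       # forget gate
>>> c_tilde = np.tanh(x @ Wx[2] + h @ Wh[2])                 # candidate cell state
>>> o = sigmoid(x @ Wx[3] + h @ Wh[3])                       # output gate
>>> c = f * c + i * c_tilde
>>> h = o * np.tanh(c)
>>> print(h.shape, c.shape)
(3, 16) (3, 16)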
class tinyms.layers.GRUCell(input_size: int, hidden_size: int, has_bias: bool = True)[source]

A GRU (Gated Recurrent Unit) cell.

\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ir}, b_{ir}\) are the weight and bias used to transform from input \(x\) to \(r\). Details can be found in paper Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.

Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • has_bias (bool) – Whether the cell has bias b_in and b_hn. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((batch\_size, input\_size)\).

  • hx (Tensor) - Tensor of data type mindspore.float32 and shape \((batch\_size, hidden\_size)\). Data type of hx must be the same as x.

Outputs:
  • hx’ (Tensor) - Tensor of shape \((batch\_size, hidden\_size)\).

Raises:
  • TypeError – If input_size or hidden_size is not an int.

  • TypeError – If has_bias is not a bool.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.GRUCell(10, 16)
>>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
>>> hx = Tensor(np.ones([3, 16]).astype(np.float32))
>>> output = []
>>> for i in range(5):
...     hx = net(x[i], hx)
...     output.append(hx)
>>> print(output[0].shape)
(3, 16)
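Likewise, the GRU equations admit a compact NumPy sketch of one step (illustrative parameters, not the cell’s real attributes):

>>> import numpy as np
>>> def gru_cell_step(x, h, W_x, W_h, b_x, b_h):
...     # W_x: (3*hidden, input), W_h: (3*hidden, hidden); gate order r, z, n
...     sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
...     gx = np.split(W_x @ x + b_x, 3)
...     gh = np.split(W_h @ h + b_h, 3)
...     r = sigmoid(gx[0] + gh[0])              # reset gate
...     z = sigmoid(gx[1] + gh[1])              # update gate
...     n = np.tanh(gx[2] + r * gh[2])          # new memory state
...     return (1 - z) * n + z * h              # h' = (1 - z)*n + z*h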
class tinyms.layers.RNNCell(input_size: int, hidden_size: int, has_bias: bool = True, nonlinearity: str = 'tanh')[source]

An Elman RNN cell with tanh or ReLU non-linearity.

\[h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})\]

Here \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time 0. If nonlinearity is relu, then relu is used instead of tanh.

Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • nonlinearity (str) – The non-linearity to use. Can be either tanh or relu. Default: tanh.

Inputs:
  • x (Tensor) - Tensor of shape \((batch\_size, input\_size)\) .

  • hx (Tensor) - Tensor of data type mindspore.float32 and shape \((batch\_size, hidden\_size)\) . Data type of hx must be the same as x.

Outputs:
  • hx’ (Tensor) - Tensor of shape \((batch\_size, hidden\_size)\) .

Raises:
  • TypeError – If input_size or hidden_size is not an int or not greater than 0.

  • TypeError – If has_bias is not a bool.

  • ValueError – If nonlinearity is not in [‘tanh’, ‘relu’].

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.RNNCell(10, 16)
>>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
>>> hx = Tensor(np.ones([3, 16]).astype(np.float32))
>>> output = []
>>> for i in range(5):
...     hx = net(x[i], hx)
...     output.append(hx)
>>> print(output[0].shape)
(3, 16)
class tinyms.layers.LSTM(*args, **kwargs)[source]

Stacked LSTM (Long Short-Term Memory) layers.

Apply LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. The input gate \(i_t\) protects the cell from perturbation by irrelevant inputs. The forget gate \(f_t\) protects the cell by forgetting some of the past information stored in \(h_{t-1}\). The output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \(\tilde{c}_t\) is calculated from the current input, and the input gate is applied to it. Finally, the current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed from the calculated gates and cell states. The complete formulation is as follows.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

The LSTM layer encapsulates the recurrence over the time steps of the sequence: given the input sequence and an initial state, it returns the hidden states of all time steps spliced into one matrix, together with the hidden state of the last time step. The hidden state of the last time step can serve as the encoded feature of the input sequence and be passed to the next layer.

\[h_{0:n},(h_{n}, c_{n}) = LSTM(x_{0:n},(h_{0},c_{0}))\]
Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked LSTM . Default: 1.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0).

  • bidirectional (bool) – Specifies whether it is a bidirectional LSTM, num_directions=2 if bidirectional=True otherwise 1. Default: False.

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape \((seq\_len, batch\_size, input\_size)\) or \((batch\_size, seq\_len, input\_size)\).

  • hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or mindspore.float16 and shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\). The data type of hx must be the same as x.

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape \((batch\_size)\). Default: None. This input indicates the real sequence length before padding, preventing padded elements from being used to compute the hidden state and affecting the final output. It is recommended to use this input when x contains padding elements.

Outputs:

Tuple, a tuple contains (output, (h_n, c_n)).

  • output (Tensor) - Tensor of shape \((seq\_len, batch\_size, num\_directions * hidden\_size)\) .

  • hx_n (tuple) - A tuple of two Tensor (h_n, c_n) both of shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\) .

Raises:
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 16)
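For the bidirectional case, the first dimension of h0 and c0 grows to \(num\_directions * num\_layers\) and the last dimension of output to \(num\_directions * hidden\_size\); a shape-only sketch of the documented contract:

>>> net2 = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=True)
>>> h0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net2(x, (h0, c0))
>>> print(output.shape)
(3, 5, 32)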
class tinyms.layers.GRU(*args, **kwargs)[source]

Stacked GRU (Gated Recurrent Unit) layers.

Apply GRU layer to the input.

There are two gates in a GRU model: one is the update gate and the other is the reset gate. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\) and a hidden state \(h_{t-1}\), the update and reset gates at time \(t\) are computed using a gating mechanism. The update gate \(z_t\) protects the cell from perturbation by irrelevant inputs and the past hidden state. The reset gate \(r_t\) determines how much information should be reset from the old hidden state. The new memory state \(n_t\) is calculated from the current input, with the reset gate applied. Finally, the current hidden state \(h_{t}\) is computed from the calculated update gate and new memory state. The complete formulation is as follows:

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ir}, b_{ir}\) are the weight and bias used to transform from input \(x\) to \(r\). Details can be found in paper Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.

Note

When using GRU on Ascend, the hidden size only supports multiples of 16.

Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked GRU. Default: 1.

  • has_bias (bool) – Whether the cell has bias b_in and b_hn. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float) – If not 0.0, appends a Dropout layer on the outputs of each GRU layer except the last layer. Default: 0.0. The range of dropout is [0.0, 1.0).

  • bidirectional (bool) – Specifies whether it is a bidirectional GRU, num_directions=2 if bidirectional=True otherwise 1. Default: False.

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (seq_len, batch_size, input_size) or (batch_size, seq_len, input_size).

  • hx (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). The data type of hx must be the same as x.

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape \((\text{batch_size})\). Default: None. This input indicates the real sequence length before padding, preventing padded elements from being used to compute the hidden state and affecting the final output. It is recommended to use this input when x contains padding elements.

Outputs:

Tuple, a tuple contains (output, h_n).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size) or (batch_size, seq_len, num_directions * hidden_size).

  • hx_n (Tensor) - Tensor of shape (num_directions * num_layers, batch_size, hidden_size).

Raises:
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.GRU(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, hn = net(x, h0)
>>> print(output.shape)
(3, 5, 16)
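When the batch contains padding, seq_length can be passed as a third input so that padded steps do not affect the final hidden state; a usage sketch continuing the example above (the real lengths 5, 3 and 4 are assumed for illustration):

>>> seq_length = Tensor(np.array([5, 3, 4]).astype(np.int32))
>>> output, hn = net(x, h0, seq_length)
>>> print(hn.shape)
(2, 3, 16)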
class tinyms.layers.RNN(*args, **kwargs)[source]

Stacked Elman RNN layers.

Apply RNN layer with \(\tanh\) or \(\text{ReLU}\) non-linearity to the input.

For each element in the input sequence, each layer computes the following function:

\[h_t = activation(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})\]

Here \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\text{ReLU}\) is used instead of \(\tanh\).

Parameters:
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked RNN. Default: 1.

  • nonlinearity (str) – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float) – If not 0.0, appends a Dropout layer on the outputs of each RNN layer except the last layer. Default: 0.0. The range of dropout is [0.0, 1.0).

  • bidirectional (bool) – Specifies whether it is a bidirectional RNN, num_directions=2 if bidirectional=True otherwise 1. Default: False.

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape \((seq\_len, batch\_size, input\_size)\) or \((batch\_size, seq\_len, input\_size)\) .

  • hx (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\) . The data type of hx must be the same as x.

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape \((batch\_size)\). Default: None. This input indicates the real sequence length before padding, preventing padded elements from being used to compute the hidden state and affecting the final output. It is recommended to use this input when x contains padding elements.

Outputs:

Tuple, a tuple contains (output, hx_n).

  • output (Tensor) - Tensor of shape \((seq\_len, batch\_size, num\_directions * hidden\_size)\) or \((batch\_size, seq\_len, num\_directions * hidden\_size)\) .

  • hx_n (Tensor) - Tensor of shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\) .

Raises:
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

  • ValueError – If nonlinearity is not in [‘tanh’, ‘relu’].

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.RNN(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, hn = net(x, h0)
>>> print(output.shape)
(3, 5, 16)
class tinyms.layers.Dropout(keep_prob=0.5, p=None, dtype=mindspore.float32)[source]

Dropout layer for the input.

Dropout is a regularization method. The operator randomly sets some neuron outputs to 0 according to the dropout probability. During inference, this layer returns the same Tensor as the input x.

This technique was proposed in the paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting and proven effective at reducing over-fitting and preventing co-adaptation of neurons. See more details in Improving neural networks by preventing co-adaptation of feature detectors.

Note

  • Each channel will be zeroed out independently on every construct call.

  • Parameter keep_prob will be removed in a future version, please use parameter p instead. Parameter p means the probability of the element of the input tensor to be zeroed.

  • Parameter dtype will be removed in a future version. It is not recommended to define this parameter.

Parameters:
  • keep_prob (float) – Deprecated. The keep rate, greater than 0 and less than or equal to 1. E.g. keep_prob=0.9 drops out 10% of input neurons. Default: 0.5.

  • p (Union[float, int, None]) – The dropout rate, greater than or equal to 0 and less than 1. E.g. p=0.9 drops out 90% of input neurons. Default: None.

  • dtype (mindspore.dtype) – Data type of input. Default: mindspore.float32.

Inputs:
  • x (Tensor) - The input of Dropout with data type of float16 or float32.

Outputs:

Tensor, output tensor with the same shape as the x.

Raises:
  • TypeError – If keep_prob is not a float.

  • TypeError – If the dtype of p is not float or int.

  • TypeError – If dtype of x is neither float16 nor float32.

  • ValueError – If keep_prob is not in range (0, 1].

  • ValueError – If p is not in range [0, 1).

  • ValueError – If length of shape of x is less than 1.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.ones([2, 2, 3]), mindspore.float32)
>>> net = nn.Dropout(p=0.2)
>>> net.set_train()
>>> output = net(x)
>>> print(output.shape)
(2, 2, 3)
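Outside training mode the layer is an identity map, so the same call after switching off training returns the input unchanged:

>>> net.set_train(False)
>>> output = net(x)
>>> print((output.asnumpy() == x.asnumpy()).all())
True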
class tinyms.layers.Flatten(start_dim=1, end_dim=-1)[source]

Flatten the input Tensor along dimensions from start_dim to end_dim.

Parameters:
  • start_dim (int, optional) – The first dimension to flatten. Default: 1.

  • end_dim (int, optional) – The last dimension to flatten. Default: -1.

Inputs:
  • x (Tensor) - The input Tensor to be flattened.

Outputs:

Tensor. If no dimensions are flattened, returns the original x, otherwise return the flattened Tensor. If x is a 0-dimensional Tensor, a 1-dimensional Tensor will be returned.

Raises:
  • TypeError – If x is not a Tensor.

  • TypeError – If start_dim or end_dim is not int.

  • ValueError – If start_dim is greater than end_dim after canonicalization.

  • ValueError – If start_dim or end_dim is not in range of [-x.dim, x.dim-1].

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[[1.2, 1.2], [2.1, 2.1]], [[2.2, 2.2], [3.2, 3.2]]]), mindspore.float32)
>>> net = nn.Flatten()
>>> output = net(x)
>>> print(output)
[[1.2 1.2 2.1 2.1]
 [2.2 2.2 3.2 3.2]]
>>> print(f"before flatten the x shape is {x.shape}")
before flatten the x shape is (2, 2, 2)
>>> print(f"after flatten the output shape is {output.shape}")
after flatten the output shape is (2, 4)
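start_dim and end_dim restrict flattening to a span of dimensions; for example, collapsing only dimensions 1 and 2 of a 4-D input:

>>> x = Tensor(np.ones([2, 3, 4, 5]), mindspore.float32)
>>> net = nn.Flatten(start_dim=1, end_dim=2)
>>> output = net(x)
>>> print(output.shape)
(2, 12, 5)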
class tinyms.layers.Dense(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None)[source]

The dense connected layer.

Applies dense connected layer for the input. This layer implements the operation as:

\[\text{outputs} = \text{activation}(\text{X} * \text{kernel} + \text{bias}),\]

where \(X\) is the input tensor, \(\text{activation}\) is the activation function passed as the activation argument (if passed in), \(\text{kernel}\) is a weight matrix with the same data type as \(X\) created by the layer, and \(\text{bias}\) is a bias vector with the same data type as \(X\) created by the layer (only if has_bias is True).

Parameters:
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector \(\text{bias}\). Default: True.

  • activation (Union[str, Cell, Primitive, None]) – activate function applied to the output of the fully connected layer. Both activation name, e.g. ‘relu’, and mindspore activation function, e.g. mindspore.ops.ReLU(), are supported. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((*, in\_channels)\). The in_channels in Args should be equal to \(in\_channels\) in Inputs.

Outputs:

Tensor of shape \((*, out\_channels)\).

Raises:
  • TypeError – If in_channels or out_channels is not an int.

  • TypeError – If has_bias is not a bool.

  • TypeError – If activation is not one of str, Cell, Primitive, None.

  • ValueError – If length of shape of weight_init is not equal to 2 or shape[0] of weight_init is not equal to out_channels or shape[1] of weight_init is not equal to in_channels.

  • ValueError – If length of shape of bias_init is not equal to 1 or shape[0] of bias_init is not equal to out_channels.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[180, 234, 154], [244, 48, 247]]), mindspore.float32)
>>> net = nn.Dense(3, 4)
>>> output = net(x)
>>> print(output.shape)
(2, 4)
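A usage sketch with an explicit initializer and activation (both accepted forms per the Parameters above); with all-ones weights, zero bias and ReLU, every output element is the sum of the three inputs:

>>> net = nn.Dense(3, 4, weight_init='ones', bias_init='zeros', activation='relu')
>>> x = Tensor(np.ones([2, 3]), mindspore.float32)
>>> output = net(x)
>>> print(output)
[[3. 3. 3. 3.]
 [3. 3. 3. 3.]]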
class tinyms.layers.ClipByNorm(axis=None)[source]

Clips tensor values to a maximum \(L_2\)-norm.

The output of this layer remains the same if the \(L_2\)-norm of the input tensor is not greater than the argument clip_norm. Otherwise the tensor will be normalized as:

\[\text{output}(X) = \frac{\text{clip_norm} * X}{L_2(X)},\]

where \(L_2(X)\) is the \(L_2\)-norm of \(X\).

Parameters:

axis (Union[None, int, tuple(int)]) – The dimension(s) along which to compute the \(L_2\)-norm. Default: None, which computes the norm over all dimensions.

Inputs:
  • x (Tensor) - Tensor of shape N-D. The type must be float32 or float16.

  • clip_norm (Tensor) - A scalar Tensor of shape \(()\) or \((1)\). Or a tensor shape can be broadcast to input x shape.

Outputs:

Tensor, clipped tensor with the same shape as the x, whose type is float32.

Raises:
  • TypeError – If axis is not one of None, int, tuple.

  • TypeError – If dtype of x is neither float32 nor float16.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.ClipByNorm()
>>> x = Tensor(np.random.randint(0, 10, [4, 16]), mindspore.float32)
>>> clip_norm = Tensor(np.array([100]).astype(np.float32))
>>> output = net(x, clip_norm)
>>> print(output.shape)
(4, 16)
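A worked check of the formula above: the \(L_2\)-norm of [3, 4] is 5, so with clip_norm = 1 the output is the input divided by 5:

>>> x = Tensor(np.array([[3.0, 4.0]]), mindspore.float32)
>>> clip_norm = Tensor(np.array([1.0]), mindspore.float32)
>>> print(nn.ClipByNorm()(x, clip_norm))
[[0.6 0.8]]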
class tinyms.layers.Norm(axis=(), keep_dims=False)[source]

‘nn.Norm’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.norm’ instead.

class tinyms.layers.OneHot(axis=-1, depth=1, on_value=1.0, off_value=0.0, dtype=mindspore.float32)[source]

‘nn.OneHot’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.one_hot’ instead.

class tinyms.layers.Pad(paddings, mode='CONSTANT')[source]

Pads the input tensor according to the paddings and mode.

Parameters:
  • paddings (tuple) –

    The shape of parameter paddings is \((N, 2)\) . N is the rank of the input data. All elements of paddings are int type. For the D th dimension of the x, paddings[D, 0] indicates how many elements to pad before the input in the D th dimension, and paddings[D, 1] indicates how many elements to pad after it. The padded size of each dimension D of the output is: \(paddings[D, 0] + input\_x.dim\_size(D) + paddings[D, 1]\), e.g.:

    mode = "CONSTANT".
    paddings = [[1,1], [2,2]].
    x = [[1,2,3], [4,5,6], [7,8,9]].
    # The above can be seen: 1st dimension of `x` is 3, 2nd dimension of `x` is 3.
    # Substitute into the formula to get:
    # 1st dimension of output is paddings[0][0] + 3 + paddings[0][1] = 1 + 3 + 1 = 5.
    # 2nd dimension of output is paddings[1][0] + 3 + paddings[1][1] = 2 + 3 + 2 = 7.
    # So the shape of output is (5, 7).
    

  • mode (str) – Specifies padding mode. The optional values are “CONSTANT”, “REFLECT”, “SYMMETRIC”. Default: “CONSTANT”.

Inputs:
  • x (Tensor) - The input tensor.

Outputs:

Tensor, the tensor after padding.

  • If mode is “CONSTANT”, it fills the edge with 0, regardless of the values of the x. If the x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[0,0,0,0,0,0,0], [0,0,1,2,3,0,0], [0,0,4,5,6,0,0], [0,0,7,8,9,0,0], [0,0,0,0,0,0,0]].

  • If mode is “REFLECT”, it uses a way of symmetrical copying through the axis of symmetry to fill in. If the x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[6,5,4,5,6,5,4], [3,2,1,2,3,2,1], [6,5,4,5,6,5,4], [9,8,7,8,9,8,7], [6,5,4,5,6,5,4]].

  • If mode is “SYMMETRIC”, the filling method is similar to the “REFLECT”. It is also copied according to the symmetry axis, except that it includes the symmetry axis. If the x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[2,1,1,2,3,3,2], [2,1,1,2,3,3,2], [5,4,4,5,6,6,5], [8,7,7,8,9,9,8], [8,7,7,8,9,9,8]].

Raises:
  • TypeError – If paddings is not a tuple.

  • ValueError – If length of paddings is more than 4 or its shape is not \((N, 2)\) .

  • ValueError – If mode is not one of ‘CONSTANT’, ‘REFLECT’, ‘SYMMETRIC’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> from mindspore import Tensor
>>> import mindspore.nn as nn
>>> import numpy as np
>>> # If `mode` is "CONSTANT"
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="CONSTANT")
...     def construct(self, x):
...         return self.pad(x)
>>> x = Tensor(np.array([[1, 2, 3], [4, 5, 6]]), mindspore.float32)
>>> pad = Net()
>>> output = pad(x)
>>> print(output)
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 2. 3. 0. 0.]
 [0. 0. 4. 5. 6. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
>>> # Another way to call
>>> pad = ops.Pad(paddings=((1, 1), (2, 2)))
>>> # From the above code, we can see following:
>>> # "paddings=((1, 1), (2, 2))",
>>> # paddings[0][0] = 1 indicates that one row of values is filled above the input data in the 1st dimension.
>>> # Shown as follows:
>>> # [[0. 0. 0.]
>>> #  [1. 2. 3.]
>>> #  [4. 5. 6.]]
>>> # paddings[0][1] = 1 indicates that one row of values is filled below the input data in the 1st dimension.
>>> # Shown as follows:
>>> # [[0. 0. 0.]
>>> #  [1. 2. 3.]
>>> #  [4. 5. 6.]
>>> #  [0. 0. 0.]]
>>> # paddings[1][0] = 2 indicates that 2 columns of values are filled before the input data in the 2nd dimension.
>>> # Shown as follows:
>>> # [[0. 0. 0. 0. 0.]
>>> #  [0. 0. 1. 2. 3.]
>>> #  [0. 0. 4. 5. 6.]
>>> #  [0. 0. 0. 0. 0.]]
>>> # paddings[1][1] = 2 indicates that 2 columns of values are filled after the input data in the 2nd dimension.
>>> # Shown as follows:
>>> # [[0. 0. 0. 0. 0. 0. 0.]
>>> #  [0. 0. 1. 2. 3. 0. 0.]
>>> #  [0. 0. 4. 5. 6. 0. 0.]
>>> #  [0. 0. 0. 0. 0. 0. 0.]]
>>> output = pad(x)
>>> print(output)
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 2. 3. 0. 0.]
 [0. 0. 4. 5. 6. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
>>> # if mode is "REFLECT"
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="REFLECT")
...     def construct(self, x):
...         return self.pad(x)
>>> x = Tensor(np.array([[1, 2, 3], [4, 5, 6]]), mindspore.float32)
>>> pad = Net()
>>> output = pad(x)
>>> print(output)
[[6. 5. 4. 5. 6. 5. 4.]
 [3. 2. 1. 2. 3. 2. 1.]
 [6. 5. 4. 5. 6. 5. 4.]
 [3. 2. 1. 2. 3. 2. 1.]]
>>> # if mode is "SYMMETRIC"
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="SYMMETRIC")
...     def construct(self, x):
...         return self.pad(x)
>>> x = Tensor(np.array([[1, 2, 3], [4, 5, 6]]), mindspore.float32)
>>> pad = Net()
>>> output = pad(x)
>>> print(output)
[[2. 1. 1. 2. 3. 3. 2.]
 [2. 1. 1. 2. 3. 3. 2.]
 [5. 4. 4. 5. 6. 6. 5.]
 [5. 4. 4. 5. 6. 6. 5.]]
class tinyms.layers.Unfold(ksizes, strides, rates, padding='valid')[source]

Extracts patches from images. The input tensor must be a 4-D tensor and the data format is NCHW.

Parameters:
  • ksizes (Union[tuple[int], list[int]]) – The size of sliding window, must be a tuple or a list of integers, and the format is [1, ksize_row, ksize_col, 1].

  • strides (Union[tuple[int], list[int]]) – Distance between the centers of the two consecutive patches, must be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].

  • rates (Union[tuple[int], list[int]]) – In each extracted patch, the gap between the corresponding dimension pixel positions, must be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].

  • padding (str) –

    The type of padding algorithm, is a string whose value is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Means that the patch can take the part beyond the original image, and this part is filled with 0.

    • valid: Means that the taken patch area must be completely covered in the original image.

Inputs:
  • x (Tensor) - A 4-D tensor whose shape is [in_batch, in_depth, in_row, in_col] and data type is number.

Outputs:

Tensor, a 4-D tensor with the same data type as x and shape [out_batch, out_depth, out_row, out_col], where out_batch is the same as in_batch, and:

  • \(out\_depth = ksize\_row * ksize\_col * in\_depth\)

  • \(out\_row = (in\_row - (ksize\_row + (ksize\_row - 1) * (rate\_row - 1))) // stride\_row + 1\)

  • \(out\_col = (in\_col - (ksize\_col + (ksize\_col - 1) * (rate\_col - 1))) // stride\_col + 1\)

Raises:
  • TypeError – If ksizes, strides or rates is neither a tuple nor list.

  • ValueError – If shape of ksizes, strides or rates is not (1, x_row, x_col, 1).

  • ValueError – If the second and third element of ksizes, strides or rates is less than 1.

Supported Platforms:

Ascend GPU

Examples

>>> net = Unfold(ksizes=[1, 2, 2, 1], strides=[1, 2, 2, 1], rates=[1, 2, 2, 1])
>>> # As stated in the above code:
>>> # ksize_row = 2, ksize_col = 2, rate_row = 2, rate_col = 2, stride_row = 2, stride_col = 2.
>>> image = Tensor(np.ones([2, 3, 6, 6]), dtype=mstype.float16)
>>> # in_batch = 2, in_depth = 3, in_row = 6, in_col = 6.
>>> # Substituting the formula to get:
>>> # out_batch = in_batch = 2
>>> # out_depth = 2 * 2 * 3 = 12
>>> # out_row = (6 - (2 + (2 - 1) * (2 - 1))) // 2 + 1 = 2
>>> # out_col = (6 - (2 + (2 - 1) * (2 - 1))) // 2 + 1 = 2
>>> output = net(image)
>>> print(output.shape)
(2, 12, 2, 2)
class tinyms.layers.Tril[source]

‘nn.Tril’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.tril’ instead.

class tinyms.layers.Triu[source]

‘nn.Triu’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.triu’ instead.

class tinyms.layers.ResizeBilinear(half_pixel_centers=False)[source]

‘nn.ResizeBilinear’ is deprecated from version 2.0 and will be removed in a future version, use mindspore.ops.ResizeBilinearV2 or mindspore.ops.interpolate() instead.

Supported Platforms:

Deprecated

Examples

>>> x = Tensor([[[[1, 2, 3, 4], [5, 6, 7, 8]]]], mindspore.float32)
>>> resize_bilinear = nn.ResizeBilinear()
>>> result = resize_bilinear(x, size=(5,5))
>>> print(x)
[[[[1. 2. 3. 4.]
   [5. 6. 7. 8.]]]]
>>> print(result)
[[[[1.        1.8       2.6       3.4       4.       ]
   [2.6       3.4       4.2000003 5.        5.6000004]
   [4.2       5.0000005 5.8       6.6       7.2      ]
   [5.        5.8       6.6       7.4       8.       ]
   [5.        5.8       6.6       7.4000006 8.       ]]]]
>>> print(result.shape)
(1, 1, 5, 5)
class tinyms.layers.MatrixDiag[source]

‘nn.MatrixDiag’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.diag’ instead.

class tinyms.layers.MatrixDiagPart[source]

‘nn.MatrixDiagPart’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.diagonal’ instead.

class tinyms.layers.MatrixSetDiag[source]

Modifies the batched diagonal part of a batched tensor.

Assume x has \(k+1\) dimensions \([I, J, K, ..., M, N]\) and diagonal has \(k\) dimensions \([I, J, K, ..., min(M, N)]\), the output is a tensor of rank \(k+1\) with dimensions \([I, J, K, ..., M, N]\), where:

\[output[i, j, k, ..., m, n] = diagonal[i, j, k, ..., n]\ for\ m == n\]
\[output[i, j, k, ..., m, n] = x[i, j, k, ..., m, n]\ for\ m != n\]
Inputs:
  • x (Tensor) - The batched tensor. Rank k+1, where k >= 1. It can be one of the following data types: float32, float16, int32, int8, and uint8.

  • diagonal (Tensor) - The diagonal values. Must have the same type as input x. Rank k, where k >= 1.

Outputs:

Tensor, has the same type and shape as input x.

Raises:
  • TypeError – If dtype of x or diagonal is not one of float32, float16, int32, int8 or uint8.

  • ValueError – If length of shape of x is less than 2.

  • ValueError – If x_shape[-2] < x_shape[-1] and x_shape[:-1] != diagonal_shape.

  • ValueError – If x_shape[-2] >= x_shape[-1] and x_shape[:-2] + x_shape[-1:] != diagonal_shape.

Supported Platforms:

Ascend

Examples

>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
>>> diagonal = Tensor([[-1., 2.], [-1., 1.], [-1., 1.]], mindspore.float32)
>>> matrix_set_diag = nn.MatrixSetDiag()
>>> output = matrix_set_diag(x, diagonal)
>>> print(output)
[[[-1.  0.]
  [ 0.  2.]]
 [[-1.  0.]
  [ 0.  1.]]
 [[-1.  0.]
  [ 0.  1.]]]
class tinyms.layers.L1Regularizer(scale)[source]

Applies l1 regularization to weights.

L1 regularization encourages weight sparsity.

\[\text{loss}=\lambda * \text{reduce_sum}(\text{abs}(\omega))\]

where \(\lambda\) is scale .

Note

scale (the regularization factor) should be a number greater than 0.

Parameters:

scale (int, float) – L1 regularization factor, which must be greater than 0.

Inputs:
  • weights (Tensor) - The input of L1Regularizer with data type of float16 or float32. The shape is \((N,*)\) where \(*\) means, any number of additional dimensions.

Outputs:

Tensor, whose dtype is the higher-precision data type between mindspore.float32 and the dtype of weights, and whose shape is \(()\).

Raises:
  • TypeError – If scale is neither an int nor float.

  • ValueError – If scale is not greater than 0.

  • ValueError – If scale is math.inf or math.nan.

Supported Platforms:

Ascend GPU CPU

Examples

>>> scale = 0.5
>>> net = nn.L1Regularizer(scale)
>>> weights = Tensor(np.array([[1.0, -2.0], [-3.0, 4.0]]).astype(np.float32))
>>> output = net(weights)
>>> print(output.asnumpy())
5.0
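>>> # Consistent with the formula above: 0.5 * (|1| + |-2| + |-3| + |4|) = 0.5 * 10 = 5.0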
class tinyms.layers.Dropout1d(p=0.5)[source]

During training, randomly zeroes entire channels of the input tensor with probability p from a Bernoulli distribution (For a 3-dimensional tensor with a shape of \((N, C, L)\), the channel feature map refers to a 1-dimensional feature map with the shape of \(L\)).

For example, the \(j\_th\) channel of the \(i\_th\) sample in the batched input is a to-be-processed 1D tensor input[i,j]. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

The technique was proposed in the paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting, and it has been proven to effectively reduce over-fitting and prevent co-adaptation of neurons. For more details, refer to Improving neural networks by preventing co-adaptation of feature detectors .

Dropout1d can improve the independence between channel feature maps.

Parameters:

p (float, optional) – The dropping probability of a channel, between 0 and 1, e.g. p = 0.8, which means an 80% chance of being set to 0. Default: 0.5.

Inputs:
  • x (Tensor) - A tensor with shape \((N, C, L)\) or \((C, L)\), where N is the batch size, C is the number of channels, L is the feature length. The data type must be int8, int16, int32, int64, float16, float32 or float64.

Outputs:

Tensor, output, with the same shape and data type as x.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn, Tensor
>>> op = nn.Dropout1d(p=0.6)
>>> op.training = True
>>> a = Tensor(np.ones((3, 3)), ms.float32)
>>> output = op(a)
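>>> # The output keeps the shape and dtype of `a`; dropped channels are zeroed.
>>> print(output.shape)
(3, 3)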
class tinyms.layers.Dropout2d(p=0.5)[source]

During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 4-dimensional tensor with a shape of \(NCHW\), the channel feature map refers to a 2-dimensional feature map with the shape of \(HW\)).

For example, the \(j\_th\) channel of the \(i\_th\) sample in the batched input is a to-be-processed 2D tensor input[i,j]. Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Dropout2d can improve the independence between channel feature maps.

Refer to mindspore.ops.dropout2d() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> dropout = nn.Dropout2d(p=0.5)
>>> x = Tensor(np.ones([2, 1, 2, 3]), mindspore.float32)
>>> output = dropout(x)
>>> print(output.shape)
(2, 1, 2, 3)
class tinyms.layers.Dropout3d(p=0.5)[source]

During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 5-dimensional tensor with a shape of \(NCDHW\), the channel feature map refers to a 3-dimensional feature map with a shape of \(DHW\)).

For example, the \(j\_th\) channel of the \(i\_th\) sample in the batched input is a to-be-processed 3D tensor input[i,j]. Each channel will be zeroed out independently on every forward call with probability p, using samples from a Bernoulli distribution.

Dropout3d can improve the independence between channel feature maps.

Refer to mindspore.ops.dropout3d() for more details.

Supported Platforms:

Ascend GPU CPU

Examples

>>> dropout = nn.Dropout3d(p=0.5)
>>> x = Tensor(np.ones([2, 1, 2, 1, 2]), mindspore.float32)
>>> output = dropout(x)
>>> print(output.shape)
(2, 1, 2, 1, 2)
class tinyms.layers.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None, recompute_scale_factor=None)[source]

For details, please refer to mindspore.ops.interpolate().

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor([[[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]]])
>>> upsample = nn.Upsample(size=(5, 5))
>>> out = upsample(x)
>>> print(x.asnumpy())
[[[[1. 2. 3. 4.]
   [5. 6. 7. 8.]]]]
>>> print(out.asnumpy())
[[[[1. 1. 2. 3. 4.]
   [1. 1. 2. 3. 4.]
   [1. 1. 2. 3. 4.]
   [5. 5. 6. 7. 8.]
   [5. 5. 6. 7. 8.]]]]
>>> print(out.shape)
(1, 1, 5, 5)
class tinyms.layers.Roll(shift, axis)[source]

‘nn.Roll’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.roll’ instead.

class tinyms.layers.Identity[source]

Returns a Tensor with the same shape and contents as input.

Inputs:
  • x (Tensor) - The shape of tensor is \((x_1, x_2, ..., x_R)\). The data type is Number.

Outputs:

Tensor, the shape of tensor and the data type are the same as x.

Raises:

TypeError – If x is not a Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([1, 2, 3, 4]), mindspore.int64)
>>> net = nn.Identity()
>>> output = net(x)
>>> print(output)
[1 2 3 4]
class tinyms.layers.Unflatten(axis, unflattened_size)[source]

Unflattens a Tensor dim according to axis and unflattened_size.

Parameters:
  • axis (int) – specifies the dimension of the input Tensor to be unflattened.

  • unflattened_size (Union(tuple[int], list[int])) – the new shape of the unflattened dimension of the Tensor; it can be a tuple of ints or a list of ints. The product of unflattened_size must equal input_shape[axis].

Inputs:
  • input (Tensor) - The input Tensor to be unflattened.

Outputs:

Tensor that has been unflattened.

Raises:
  • TypeError – If axis is not int.

  • TypeError – If unflattened_size is neither tuple of ints nor list of ints.

  • TypeError – If the product of unflattened_size does not equal input_shape[axis].

Supported Platforms:

Ascend GPU CPU

Examples

>>> input = Tensor(np.arange(0, 100).reshape(2, 10, 5), mindspore.float32)
>>> net = nn.Unflatten(1, (2, 5))
>>> output = net(input)
>>> print(f"before unflatten the input shape is {input.shape}")
before unflatten the input shape is (2, 10, 5)
>>> print(f"after unflatten the output shape is {output.shape}")
after unflatten the output shape is (2, 2, 5, 5)
class tinyms.layers.Embedding(vocab_size, embedding_size, use_one_hot=False, embedding_table='normal', dtype=mindspore.float32, padding_idx=None)[source]

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Note

When ‘use_one_hot’ is set to True, the type of the x must be mindspore.int32.

Parameters:
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • use_one_hot (bool) – Specifies whether to apply one_hot encoding form. Default: False.

  • embedding_table (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • dtype (mindspore.dtype) – Data type of x. Default: mindspore.float32.

  • padding_idx (int, None) – If given, the output embedding vector at index padding_idx is initialized to zero. Default: None, which deactivates this feature.

Inputs:
  • x (Tensor) - Tensor of shape \((\text{batch_size}, \text{x_length})\). The elements of the Tensor must be integer and not larger than vocab_size. Otherwise the corresponding embedding vector will be zero. The data type is int32 or int64.

Outputs:

Tensor of shape \((\text{batch_size}, \text{x_length}, \text{embedding_size})\).

Raises:
  • TypeError – If vocab_size or embedding_size is not an int.

  • TypeError – If use_one_hot is not a bool.

  • ValueError – If padding_idx is an int which not in range [0, vocab_size).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.Embedding(20000, 768,  True)
>>> x = Tensor(np.ones([8, 128]), mindspore.int32)
>>> # Maps the input word IDs to word embedding.
>>> output = net(x)
>>> result = output.shape
>>> print(result)
(8, 128, 768)
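A small sketch of padding_idx: the embedding vector at that index is initialized to zero, so index 0 below looks up the zero vector:

>>> net = nn.Embedding(10, 4, padding_idx=0)
>>> x = Tensor(np.array([[0, 1]]), mindspore.int32)
>>> print(net(x).asnumpy()[0, 0])
[0. 0. 0. 0.]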
class tinyms.layers.EmbeddingLookup(vocab_size, embedding_size, param_init='normal', target='CPU', slice_mode='batch_slice', manual_shapes=None, max_norm=None, sparse=True, vocab_cache_size=0)[source]

EmbeddingLookup layer. Provides the same function as the embedding layer, and is mainly used for heterogeneous parallel scenarios with large-scale embedding layers under automatic or semi-automatic parallelism.

Note

When ‘target’ is set to ‘CPU’, this module uses P.EmbeddingLookup().set_device(‘CPU’), which specifies ‘offset = 0’, to look up the table. When ‘target’ is set to ‘DEVICE’, this module uses P.Gather(), which specifies ‘axis = 0’, to look up the table. In field slice mode, manual_shapes must be given. It is a tuple whose i-th element vocab[i] is the number of rows in the i-th part.

Parameters:
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • param_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must be obtained through mindspore.nn.EmbeddingLookup. Default: ‘nn.EmbeddingLookup.BATCH_SLICE’.

  • manual_shapes (tuple) – The accompaniment array in field slice mode. Default: None.

  • max_norm (Union[float, None]) – A maximum clipping value. The data type must be float16, float32 or None. Default: None

  • sparse (bool) – Using sparse mode. When ‘target’ is set to ‘CPU’, ‘sparse’ has to be true. Default: True.

  • vocab_cache_size (int) – Cache size of the dictionary of embeddings. Default: 0. It is valid only in parameter server training mode with the ‘DEVICE’ target, and the moment parameter of the corresponding optimizer will also be set to the cache size. In addition, note that it consumes ‘DEVICE’ memory, so a reasonable value is suggested to avoid insufficient memory.

Inputs:
  • input_indices (Tensor) - The shape of tensor is \((y_1, y_2, ..., y_S)\). Specifies the indices of elements of the original Tensor. Values can be out of the range of embedding_table, and the exceeding part will be filled with 0 in the output. Negative values are not supported and the result is undefined if values are negative. input_indices must be a 2d tensor in this interface when run in semi auto parallel/auto parallel mode.

Outputs:

Tensor, the shape of tensor is \((z_1, z_2, ..., z_N)\).

Raises:
  • TypeError – If vocab_size or embedding_size or vocab_cache_size is not an int.

  • TypeError – If sparse is not a bool or manual_shapes is not a tuple.

  • ValueError – If vocab_size or embedding_size is less than 1.

  • ValueError – If vocab_cache_size is less than 0.

  • ValueError – If target is neither ‘CPU’ nor ‘DEVICE’.

  • ValueError – If slice_mode is not one of ‘batch_slice’ or ‘field_slice’ or ‘table_row_slice’ or ‘table_column_slice’.

  • ValueError – If sparse is False and target is ‘CPU’.

  • ValueError – If slice_mode is ‘field_slice’ and manual_shapes is None.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> result = nn.EmbeddingLookup(4,2)(input_indices)
>>> print(result.shape)
(2, 2, 2)
class tinyms.layers.MultiFieldEmbeddingLookup(vocab_size, embedding_size, field_size, param_init='normal', target='CPU', slice_mode='batch_slice', feature_num_list=None, max_norm=None, sparse=True, operator='SUM')[source]

Returns a slice of the input tensor based on the specified indices and field ids. This operation supports looking up embeddings using multi-hot and one-hot fields simultaneously.

Note

When ‘target’ is set to ‘CPU’, this module uses P.EmbeddingLookup().set_device(‘CPU’), which specifies ‘offset = 0’, to look up the table. When ‘target’ is set to ‘DEVICE’, this module uses P.Gather(), which specifies ‘axis = 0’, to look up the table. The vectors with the same field_ids will be combined by the operator, such as ‘SUM’, ‘MAX’ and ‘MEAN’. Ensure that the input_values of the padded id are zero, so that they can be ignored. The final output will be zeros if the sum of absolute weights of the field is zero. This class only supports [‘table_row_slice’, ‘batch_slice’ and ‘table_column_slice’]. For the operation ‘MAX’ on the Ascend device, there is a constraint \(batch\_size * (seq\_length + field\_size) < 3500\).

Parameters:
  • vocab_size (int) – The size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • field_size (int) – The field size of the final outputs.

  • param_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must be obtained through mindspore.nn.EmbeddingLookup. Default: ‘nn.EmbeddingLookup.BATCH_SLICE’.

  • feature_num_list (tuple) – The accompaniment array in field slice mode. This is unused currently. Default: None.

  • max_norm (Union[float, None]) – A maximum clipping value. The data type must be float16, float32 or None. Default: None

  • sparse (bool) – Using sparse mode. When ‘target’ is set to ‘CPU’, ‘sparse’ has to be true. Default: True.

  • operator (str) – The pooling method for the features in one field. Support ‘SUM’, ‘MEAN’ and ‘MAX’. Default: ‘SUM’.

Inputs:
  • input_indices (Tensor) - The shape of tensor is \((batch\_size, seq\_length)\). Specifies the indices of elements of the original Tensor. Input_indices must be a 2d tensor in this interface. Type is Int32, Int64.

  • input_values (Tensor) - The shape of tensor is \((batch\_size, seq\_length)\). Specifies the weights of elements of the input_indices. The looked-up vectors will be multiplied by the input_values. Type is Float32.

  • field_ids (Tensor) - The shape of tensor is \((batch\_size, seq\_length)\). Specifies the field id of elements of the input_indices. Type is Int32.

Outputs:

Tensor, the shape of tensor is \((batch\_size, field\_size, embedding\_size)\). Type is Float32.

Raises:
  • TypeError – If vocab_size or embedding_size or field_size is not an int.

  • TypeError – If sparse is not a bool or feature_num_list is not a tuple.

  • ValueError – If vocab_size or embedding_size or field_size is less than 1.

  • ValueError – If target is neither ‘CPU’ nor ‘DEVICE’.

  • ValueError – If slice_mode is not one of ‘batch_slice’, ‘field_slice’, ‘table_row_slice’, ‘table_column_slice’.

  • ValueError – If sparse is False and target is ‘CPU’.

  • ValueError – If slice_mode is ‘field_slice’ and feature_num_list is None.

  • ValueError – If operator is not one of ‘SUM’, ‘MAX’, ‘MEAN’.

Supported Platforms:

Ascend GPU

Examples

>>> input_indices = Tensor([[2, 4, 6, 0, 0], [1, 3, 5, 0, 0]], mindspore.int32)
>>> input_values = Tensor([[1, 1, 1, 0, 0], [1, 1, 1, 0, 0]], mindspore.float32)
>>> field_ids = Tensor([[0, 1, 1, 0, 0], [0, 0, 1, 0, 0]], mindspore.int32)
>>> net = nn.MultiFieldEmbeddingLookup(10, 2, field_size=2, operator='SUM', target='DEVICE')
>>> out = net(input_indices, input_values, field_ids)
>>> print(out.shape)
(2, 2, 2)
class tinyms.layers.AvgPool3d(kernel_size=1, stride=1, pad_mode='valid', padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)[source]

Applies a 3D average pooling over an input Tensor which can be regarded as a composition of 3D input planes. Typically, the input is of shape \((N_{in}, C_{in}, D_{in}, H_{in}, W_{in})\), and AvgPool3D outputs regional average in the \((D_{in}, H_{in}, W_{in})\)-dimension. Given kernel size is \(ks = (d_{ker}, h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1, s_2)\), the operation is as follows.

Warning

kernel_size is in the range [1, 255]. stride is in the range [1, 63].

\[\text{output}(N_i, C_j, d, h, w) = \frac{1}{d_{ker} * h_{ker} * w_{ker}} \sum_{l=0}^{d_{ker}-1} \sum_{m=0}^{h_{ker}-1} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, s_0 \times d + l, s_1 \times h + m, s_2 \times w + n)\]
Parameters:
  • kernel_size (Union[int, tuple[int]], optional) – The size of kernel used to take the average value, can be an int number or a single element tuple that represents depth, height and width, or a tuple of three positive integers that represent depth, height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]], optional) – The distance of kernel moving, can be a positive int or a single element tuple that represents the depth, height and width of movement, or a tuple of three positive integers that represents depth, height and width of movement respectively. If the value is None, the default value kernel_size is used. Default: 1.

  • pad_mode (str, optional) –

    Specifies the padding method of pooling, optional values are “same”, “valid” or “pad”, case insensitive. Default: “valid”.

    • same: The depth, height and width of the output is the same as the value after the input is divided by stride.

    • valid: Returns the output obtained by effective calculation without padding. The excess pixels that do not meet the calculation will be discarded.

    • pad: Pads the input. Fills the front, back, top, bottom, left and right of the input with 0s of size padding. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int]), optional) –

    Pooling padding value, only ‘pad’ mode can be set to non-zero. Default: 0. Only the following paddings are supported:

    • If padding is an integer or a tuple/list containing one integer, the input will be padded in the six directions of front, back, top, bottom, left and right.

    • If padding is a tuple/list containing three integers, the front and back of the input are padded padding[0] times, the top and bottom padding[1] times, and the left and right padding[2] times.

  • ceil_mode (bool, optional) – If True, use ceil to compute the output shape instead of floor. Default: False.

  • count_include_pad (bool, optional) – If True, averaging calculation will include the zero-padding. Default: True.

  • divisor_override (int, optional) – If it is specified as a non-zero parameter, this parameter will be used as the divisor in the average calculation. Otherwise, kernel_size will be used as the divisor. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\). Currently support float16 and float32 data type.

Outputs:

Tensor, with shape \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), with the same data type as x.

If pad_mode is ‘pad’, the output shape calculation formula is as follows:

\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel_size}[2]}{\text{stride}[2]} + 1\right\rfloor\]
Raises:
  • TypeError – If kernel_size is neither an int nor a tuple.

  • TypeError – If stride is neither an int nor a tuple.

  • TypeError – If padding is neither an int nor a tuple/list.

  • TypeError – If ceil_mode or count_include_pad is not a bool.

  • TypeError – If divisor_override is not an int.

  • ValueError – If numbers in kernel_size or stride are not positive.

  • ValueError – If kernel_size or stride is a tuple whose length is not equal to 3.

  • ValueError – If padding is a tuple/list whose length is neither 1 nor 3.

  • ValueError – If element of padding is less than 0.

  • ValueError – If length of shape of x is neither 4 nor 5.

  • ValueError – If divisor_override is less than or equal to 0.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>> pool = nn.AvgPool3d(kernel_size=3, stride=1)
>>> x = ops.randn(1, 2, 4, 4, 5).astype(ms.float32)
>>> output = pool(x)
>>> print(output.shape)
(1, 2, 2, 2, 3)
>>> x1 = ops.randn(6, 5, 7, 7, 5).astype(ms.float32)
>>> pool2 = nn.AvgPool3d(4, stride=2, pad_mode='pad', padding=(2, 2, 1), divisor_override=10)
>>> output2 = pool2(x1)
>>> print(output2.shape)
(6, 5, 4, 4, 2)
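>>> # Check output2 against the 'pad' formulas above:
>>> # D_out = (7 + 2*2 - 4)//2 + 1 = 4
>>> # H_out = (7 + 2*2 - 4)//2 + 1 = 4
>>> # W_out = (5 + 2*1 - 4)//2 + 1 = 2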
class tinyms.layers.MaxPool3d(kernel_size=1, stride=1, pad_mode='valid', padding=0, dilation=1, return_indices=False, ceil_mode=False)[source]

3D max pooling operation.

Applies a 3D max pooling over an input Tensor which can be regarded as a composition of 3D planes.

Typically the input is of shape \((N_{in}, C_{in}, D_{in}, H_{in}, W_{in})\), MaxPool outputs regional maximum in the \((D_{in}, H_{in}, W_{in})\)-dimension. Given kernel size is \(ks = (d_{ker}, h_{ker}, w_{ker})\) and stride is \(s = (s_0, s_1, s_2)\), the operation is as follows.

\[\text{output}(N_i, C_j, d, h, w) = \max_{l=0, \ldots, d_{ker}-1} \max_{m=0, \ldots, h_{ker}-1} \max_{n=0, \ldots, w_{ker}-1} \text{input}(N_i, C_j, s_0 \times d + l, s_1 \times h + m, s_2 \times w + n)\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value, is an int number or a single element tuple that represents depth, height and width of the kernel, or a tuple of three int numbers that represent depth, height and width respectively. The value must be a positive integer. Default: 1.

  • stride (Union[int, tuple[int]]) – The moving stride of pooling operation, an int number or a single element tuple that represents the moving stride of pooling kernel in the directions of depth, height and the width, or a tuple of three int numbers that represent depth, height and width of movement respectively. The value must be a positive integer. If the value is None, the default value kernel_size is used. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same”, “valid” or “pad”, not case sensitive. Default: “valid”.

    • same: The output shape is the same as the input shape evenly divided by stride.

    • valid: The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

    • pad: pads the input. Pads the front, back, top, bottom, left, and right sides of the input with padding number of zeros. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int])) – Pooling padding value. Default: 0. padding can only be an integer or a tuple/list containing one or three integers. If padding is an integer or a tuple/list containing one integer, it will be padded in six directions of front, back, top, bottom, left and right of the input. If padding is a tuple/list containing three integers, it will be padded in front and back of the input padding[0] times, up and down padding[1] times, and left and right of the input padding[2] times.

  • dilation (Union(int, tuple[int])) – The spacing between the elements of the kernel in convolution, used to increase the receptive field of the pooling operation. If it is a tuple, it must contain one or three integers. Default: 1.

  • return_indices (bool) – If True, output is a Tuple of 2 Tensors, representing the maxpool result and where the max values are generated. Otherwise, only the maxpool result is returned. Default: False.

  • ceil_mode (bool) – Whether to use ceil or floor to calculate output shape. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N_{in}, C_{in}, D_{in}, H_{in}, W_{in})\) or \((C_{in}, D_{in}, H_{in}, W_{in})\).

Outputs:

If return_indices is False, output is a Tensor, with shape \((N_{out}, C_{out}, D_{out}, H_{out}, W_{out})\) or \((C_{out}, D_{out}, H_{out}, W_{out})\). It has the same data type as x.

If return_indices is True, output is a Tuple of 2 Tensors, representing the maxpool result and where the max values are generated.

  • output (Tensor) - Maxpooling result, with shape \((N_{out}, C_{out}, D_{out}, H_{out}, W_{out})\) or \((C_{out}, D_{out}, H_{out}, W_{out})\). It has the same data type as x.

  • argmax (Tensor) - Index corresponding to the maximum value. Data type is int64.

If pad_mode is ‘pad’, the output shape is calculated as follows:

\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
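
As a sanity check of the formulas above, the small helper below (a hypothetical snippet, not part of the API) reproduces the spatial sizes printed in the example further down, where the input shape is \((5, 3, 4, 6, 7)\) and kernel_size=2, stride=1, padding=1, dilation=3:

>>> import math
>>> def pool_out(size, padding, dilation, kernel, stride):
...     return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1
>>> [pool_out(s, padding=1, dilation=3, kernel=2, stride=1) for s in (4, 6, 7)]
[3, 5, 6]
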
Raises:
  • ValueError – If length of shape of x is not equal to 4 or 5.

  • TypeError – If kernel_size, stride, padding or dilation is neither an int nor a tuple.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If the padding parameter is neither an integer nor a tuple of length 3.

  • ValueError – If return_indices is True or dilation is not 1 while pad_mode is not set to ‘pad’.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import numpy as np
>>> np_x = np.random.randint(0, 10, [5, 3, 4, 6, 7])
>>> x = ms.Tensor(np_x, ms.float32)
>>> pool1 = nn.MaxPool3d(kernel_size=2, stride=1, pad_mode='pad', padding=1, dilation=3, return_indices=True)
>>> output = pool1(x)
>>> print(output[0].shape)
(5, 3, 3, 5, 6)
>>> print(output[1].shape)
(5, 3, 3, 5, 6)
>>> pool2 = nn.MaxPool3d(kernel_size=2, stride=1, pad_mode='pad', padding=1, dilation=3, return_indices=False)
>>> output2 = pool2(x)
>>> print(output2.shape)
(5, 3, 3, 5, 6)
class tinyms.layers.AvgPool2d(kernel_size=1, stride=1, pad_mode='valid', padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None, data_format='NCHW')[source]

Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), AvgPool2d outputs regional average in the \((H_{in}, W_{in})\)-dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows:

\[\text{output}(N_i, C_j, h, w) = \frac{1}{h_{ker} * w_{ker}} \sum_{m=0}^{h_{ker}-1} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the average value. The data type of kernel_size must be int or a single element tuple and the value represents the height and width, or a tuple of two int numbers that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number or a single element tuple that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same”, “valid” or “pad”, case-insensitive. Default: “valid”.

    • same: The height and width of the output equal the input size divided by stride, rounded up.

    • valid: Returns the output computed without padding. Extra pixels that cannot form a complete window are discarded.

    • pad: Pads the input. The top, bottom, left and right sides of the input are padded with padding number of zeros. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int])) – Pooling padding value; it can be non-zero only in ‘pad’ mode. Default: 0. padding can only be an integer or a tuple/list containing one or two integers. If padding is an integer or a tuple/list containing one integer, the same amount is padded on all four sides of the input. If padding is a tuple/list containing two integers, the top and bottom of the input are padded by padding[0] and the left and right by padding[1].

  • ceil_mode (bool) – If True, use ceil to compute the output shape instead of floor. Default: False.

  • count_include_pad (bool) – If True, averaging calculation will include the zero-padding. Default: True.

  • divisor_override (int) – If it is specified as a non-zero parameter, this parameter will be used as the divisor in the average calculation. Otherwise, kernel_size will be used as the divisor. Default: None.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\) or \((C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\) or \((C_{out}, H_{out}, W_{out})\).

If pad_mode is ‘pad’, the output shape is calculated as follows:

\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
Raises:
  • TypeError – If kernel_size or stride is neither int nor tuple.

  • ValueError – If pad_mode is not ‘valid’, ‘same’ or ‘pad’ (case-insensitive).

  • ValueError – If data_format is neither ‘NCHW’ nor ‘NHWC’.

  • ValueError – If padding, ceil_mode, count_include_pad or divisor_override is set to a non-default value, or pad_mode is ‘pad’, while data_format is ‘NHWC’.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If length of padding tuple/list is not 1 or 2.

  • ValueError – If length of shape of x is not equal to 3 or 4.

  • ValueError – If divisor_override is less than or equal to 0.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>> import numpy as np
>>> pool = nn.AvgPool2d(kernel_size=3, stride=1)
>>> x = ms.Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), ms.float32)
>>> output = pool(x)
>>> print(output.shape)
(1, 2, 2, 2)
>>> x = ops.randn(6, 6, 8, 8)
>>> pool2 = nn.AvgPool2d(4, stride=1, pad_mode='pad', padding=2, divisor_override=5)
>>> output2 = pool2(x)
>>> print(output2.shape)
(6, 6, 9, 9)
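
The second example above only prints the shape; the value effect of divisor_override can be seen with a tiny all-ones input, where divisor_override=1 turns the layer into a sliding-window sum, since each window total is divided by 1 rather than by the window size. A minimal sketch (the printed value assumes that behavior):

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import numpy as np
>>> x = ms.Tensor(np.ones([1, 1, 2, 2]), ms.float32)
>>> sum_pool = nn.AvgPool2d(kernel_size=2, stride=1, divisor_override=1)
>>> print(sum_pool(x))
[[[[4.]]]]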
class tinyms.layers.MaxPool2d(kernel_size=1, stride=1, pad_mode='valid', padding=0, dilation=1, return_indices=False, ceil_mode=False, data_format='NCHW')[source]

Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), MaxPool2d outputs regional maximum in the \((H_{in}, W_{in})\)-dimension. Given kernel size \((h_{ker}, w_{ker})\) and stride \((s_0, s_1)\), the operation is as follows.

\[\text{output}(N_i, C_j, h, w) = \max_{m=0, \ldots, h_{ker}-1} \max_{n=0, \ldots, w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the max value, is an int number or a single element tuple that represents height and width are both kernel_size, or a tuple of two int numbers that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number or a single element tuple that represents the height and width of movement are both stride, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same”, “valid” or “pad”, case-insensitive. Default: “valid”.

    • same: The height and width of the output equal the input size divided by stride, rounded up.

    • valid: The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

    • pad: Pads the input. The top, bottom, left and right sides of the input are padded with padding number of zeros. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int])) – Specifies the padding value of the pooling operation. Default: 0. padding can only be an integer or a tuple/list containing one or two integers. If padding is an integer or a tuple/list containing one integer, the same amount is padded on all four sides of the input. If padding is a tuple/list containing two integers, the top and bottom of the input are padded by padding[0] and the left and right by padding[1].

  • dilation (Union(int, tuple[int])) – The spacing between the elements of the kernel in convolution, used to increase the receptive field of the pooling operation. If it is a tuple, it must contain one or two integers. Default: 1.

  • return_indices (bool) – If True, the function will return both the result of max pooling and the indices of the max elements. Default: False.

  • ceil_mode (bool) – If True, use ceil to compute the output shape instead of floor. Default: False.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • x (Tensor) - Tensor of shape \((N,C_{in},H_{in},W_{in})\) or \((C_{in},H_{in},W_{in})\).

Outputs:

If return_indices is False, output is a Tensor, with shape \((N, C, H_{out}, W_{out})\) or \((C_{out}, H_{out}, W_{out})\). It has the same data type as x.

If return_indices is True, output is a Tuple of 2 Tensors, representing the maxpool result and where the max values are generated.

  • output (Tensor) - Maxpooling result, with shape \((N_{out}, C_{out}, H_{out}, W_{out})\) or \((C_{out}, H_{out}, W_{out})\). It has the same data type as x.

  • argmax (Tensor) - Index corresponding to the maximum value. Data type is int64.

If pad_mode is ‘pad’, the output shape is calculated as follows:

\[H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor\]
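For example, for the second example below (input \(4 \times 5\), kernel_size=2, stride=1, padding=1, dilation=1): \(H_{out} = \lfloor (4 + 2 - 1 - 1)/1 \rfloor + 1 = 5\) and \(W_{out} = \lfloor (5 + 2 - 1 - 1)/1 \rfloor + 1 = 6\), which matches the printed shape \((5, 3, 5, 6)\).
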
Raises:
  • TypeError – If kernel_size or stride is neither int nor tuple.

  • ValueError – If pad_mode is not ‘valid’, ‘same’ or ‘pad’ (case-insensitive).

  • ValueError – If data_format is neither ‘NCHW’ nor ‘NHWC’.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If length of shape of x is not equal to 3 or 4.

  • ValueError – If padding, dilation, return_indices or ceil_mode is not set to its default value while pad_mode is not ‘pad’.

  • ValueError – If the length of the tuple/list padding parameter is not 2.

  • ValueError – If the length of the tuple dilation parameter is not 2.

  • ValueError – If dilation is neither an int nor a tuple.

  • ValueError – If pad_mode is ‘pad’ and data_format is ‘NHWC’.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> pool = nn.MaxPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> output = pool(x)
>>> print(output.shape)
(1, 2, 2, 2)
>>> np_x = np.random.randint(0, 10, [5, 3, 4, 5])
>>> x = Tensor(np_x, mindspore.float32)
>>> pool2 = nn.MaxPool2d(kernel_size=2, stride=1, pad_mode='pad', padding=1, dilation=1, return_indices=True)
>>> output = pool2(x)
>>> print(output[0].shape)
(5, 3, 5, 6)
>>> print(output[1].shape)
(5, 3, 5, 6)
class tinyms.layers.AvgPool1d(kernel_size=1, stride=1, pad_mode='valid', padding=0, ceil_mode=False, count_include_pad=True)[source]

Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\), AvgPool1d outputs regional average in the \((L_{in})\)-dimension. Given kernel_size \(l_{ker}\) and stride \(s_0\), the operation is as follows:

\[\text{output}(N_i, C_j, l) = \frac{1}{l_{ker}} \sum_{n=0}^{l_{ker}-1} \text{input}(N_i, C_j, s_0 \times l + n)\]
Parameters:
  • kernel_size (int) – The size of the kernel window used to take the average value. Default: 1.

  • stride (int) – The distance the kernel moves, an int number representing the width of movement. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same”, “valid” or “pad”, case-insensitive. Default: “valid”.

    • same: The width of the output equals the input size divided by stride, rounded up.

    • valid: Returns the output computed without padding. Extra pixels that cannot form a complete window are discarded.

    • pad: Performs padding on the input. Adds padding size of zeros to both ends of the input. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int])) – Pooling padding value; it can be non-zero only in ‘pad’ mode. Default: 0. padding can only be an integer or a tuple/list containing a single integer, in which case padding (or padding[0]) zeros are padded on both sides of the input.

  • ceil_mode (bool) – If True, use ceil to compute the output shape instead of floor. Default: False.

  • count_include_pad (bool) – If True, averaging calculation will include the zero-padding. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\) or \((C_{in}, L_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, L_{out})\) or \((C_{out}, L_{out})\).

If pad_mode is ‘pad’, the output shape is calculated as follows:

\[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel_size}}{\text{stride}} + 1\right\rfloor\]
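For example, in the second example below, \(L_{in} = 8\), padding=2, kernel_size=4 and stride=1 give \(L_{out} = \lfloor (8 + 4 - 4)/1 \rfloor + 1 = 9\); with ceil_mode=True the division is rounded up instead, which here yields the same result.
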
Raises:
  • TypeError – If kernel_size or stride is not an int.

  • ValueError – If pad_mode is not ‘valid’, ‘same’ or ‘pad’ (case-insensitive).

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If length of padding tuple/list is not 1.

  • ValueError – If length of shape of x is not equal to 2 or 3.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>> import numpy as np
>>> pool = nn.AvgPool1d(kernel_size=6, stride=1)
>>> x = ms.Tensor(np.random.randint(0, 10, [1, 3, 6]), ms.float32)
>>> output = pool(x)
>>> result = output.shape
>>> print(result)
(1, 3, 1)
>>> pool2 = nn.AvgPool1d(4, stride=1, ceil_mode=True, pad_mode='pad', padding=2)
>>> x1 = ops.randn(6, 6, 8)
>>> output = pool2(x1)
>>> print(output.shape)
(6, 6, 9)
class tinyms.layers.MaxPool1d(kernel_size=1, stride=1, pad_mode='valid', padding=0, dilation=1, return_indices=False, ceil_mode=False)[source]

Applies a 1D max pooling over an input Tensor which can be regarded as a composition of 1D planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\), MaxPool1d outputs regional maximum in the \((L_{in})\)-dimension. Given kernel size \(ks = (l_{ker})\) and stride \(s = (s_0)\), the operation is as follows:

\[\text{output}(N_i, C_j, l) = \max_{n=0, \ldots, l_{ker}-1} \text{input}(N_i, C_j, s_0 \times l + n)\]
Parameters:
  • kernel_size (int) – The size of the kernel used to take the max value. Default: 1.

  • stride (int) – The distance the kernel moves, an int number representing the width of movement. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same”, “valid” or “pad”, case-insensitive. Default: “valid”.

    • same: Adopts the way of completion. The total amount of padding is calculated along the width and distributed as evenly as possible to both ends of the input; any extra padding is added to the right end.

    • valid: Adopts the way of discarding. The possible largest width of the output will be returned without padding. Extra pixels will be discarded.

    • pad: Performs padding on the input. Adds padding size of zeros to both ends of the input. If this mode is set, padding must be greater than or equal to 0.

  • padding (Union(int, tuple[int], list[int])) – Padding value for the pooling. Default: 0. padding can only be an integer or a tuple/list containing a single integer, in which case padding (or padding[0]) zeros are padded on both sides of the input.

  • dilation (Union(int, tuple[int])) – The spacing between the elements of the kernel in convolution, used to increase the receptive field of the pooling operation. If it is a tuple, its length can only be 1. Default: 1.

  • return_indices (bool) – If True, the function will return both the result of max pooling and the indices of the max elements. Default: False.

  • ceil_mode (bool) – If True, use ceil to compute the output shape instead of floor. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\) or \((C_{in}, L_{in})\).

Outputs:

If return_indices is False, output is a Tensor, with shape \((N, C_{out}, L_{out})\) or \((C_{out}, L_{out})\). It has the same data type as x.

If return_indices is True, output is a Tuple of 2 Tensors, representing the maxpool result and where the max values are generated.

  • output (Tensor) - Maxpooling result, with shape \((N, C_{out}, L_{out})\) or \((C_{out}, L_{out})\). It has the same data type as x.

  • argmax (Tensor) - Index corresponding to the maximum value. Data type is int64.

If pad_mode is ‘pad’, the output shape is calculated as follows:

\[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
Raises:
  • TypeError – If kernel_size or stride is not an int.

  • ValueError – If pad_mode is not ‘valid’, ‘same’ or ‘pad’, case-insensitive.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If length of shape of x is not equal to 2 or 3.

  • ValueError – If padding, dilation, return_indices or ceil_mode is not set to its default value while pad_mode is not ‘pad’.

  • ValueError – If the length of the tuple/list padding parameter is not 1.

  • ValueError – If the length of the tuple dilation parameter is not 1.

  • ValueError – If dilation is neither an int nor a tuple.

  • ValueError – If padding is non-zero when pad_mode is not ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> mpool1 = nn.MaxPool1d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4]), mindspore.float32)
>>> output = mpool1(x)
>>> result = output.shape
>>> print(result)
(1, 2, 2)
>>> np_x = np.random.randint(0, 10, [5, 3, 4])
>>> x = Tensor(np_x, mindspore.float32)
>>> mpool2 = nn.MaxPool1d(kernel_size=2, stride=1, pad_mode='pad', padding=1, dilation=1, return_indices=True)
>>> output = mpool2(x)
>>> print(output[0].shape)
(5, 3, 5)
>>> print(output[1].shape)
(5, 3, 5)
class tinyms.layers.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)[source]

Applies the 2D FractionalMaxPool operation over input. The output Tensor shape can be determined by either output_size or output_ratio, and the step size is determined by _random_samples. Exactly one of output_size and output_ratio must be specified; they cannot both be set, nor both be None.

Refer to the paper Fractional MaxPooling by Ben Graham for more details.

Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value, is an int number that represents height and width of the kernel, or a tuple of two int numbers that represent height and width respectively. The value must be a positive integer.

  • output_size (Union[int, tuple[int]], optional) – The shape of the target output, a positive int that represents both height and width, or a tuple of two positive integers that represent height and width respectively. If None, the target shape will be determined by output_ratio. Default: None.

  • output_ratio (Union[float, tuple[float]], optional) – The ratio of the target output shape to the input shape, which specifies the output size as a fraction of the input size. Data type: float16, float32 or float64; the value must be in (0, 1). If None, the target shape will be determined by output_size. Default: None.

  • return_indices (bool, optional) – Whether to return the indices of max value. Default: False.

  • _random_samples (Tensor, optional) – The random step of FractionalMaxPool2d, a Tensor of shape \((N, C, 2)\) whose elements are within the range of \((0, 1)\). Supported data type : float16, float32, float64. If None, no random step will be set. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C, H_{in}, W_{in})\), with float16, float32, float64, int32, int64 data type.

Outputs:
  • y (Tensor) - Has the same type as the input. Has the shape \((N, C, H, W)\).

  • argmax (Tensor) - The indices along with the outputs, which is a Tensor, with the same shape as the y and int64 data type. It will be returned only when return_indices is True.

Raises:
  • TypeError – If data type of input is not one of the following: float16, float32, float64, int32, int64.

  • TypeError – If data type of _random_samples is not one of the following: float16, float32, float64.

  • ValueError – If kernel_size is neither a number nor a tuple of length 2.

  • ValueError – If output_size is neither a number nor a tuple of length 2.

  • ValueError – If the sum of kernel_size and output_size, minus 1, is larger than the corresponding dimension of input.

  • ValueError – If the dimension of _random_samples is not 3.

  • ValueError – If output_size and output_ratio are both None.

  • ValueError – If the first dimension size of input and _random_samples is not equal.

  • ValueError – If the second dimension size of input and _random_samples is not equal.

  • ValueError – If the third dimension size of _random_samples is not 2.

Supported Platforms:

CPU

Examples

>>> # the kernel_size is an int number and the output_size is a tuple.
>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore import Tensor
>>> import mindspore.common.dtype as mstype
>>> input = Tensor(np.array([0.3220, 0.9545, 0.7879, 0.0975, 0.3698,
...                            0.5135, 0.5740, 0.3435, 0.1895, 0.8764,
...                            0.9581, 0.4760, 0.9014, 0.8522, 0.3664,
...                            0.4980, 0.9673, 0.9879, 0.6988, 0.9022,
...                            0.9304, 0.1558, 0.0153, 0.1559, 0.9852]).reshape([1, 1, 5, 5]), mstype.float32)
>>> _random_samples = Tensor(np.array([[[0.8, 0.8]]]), mstype.float32)
>>> net = nn.FractionalMaxPool2d(kernel_size=2, output_size=(2, 2), _random_samples=_random_samples,
...                              return_indices=True)
>>> y, argmax = net(input)
>>> y
[[[[0.9545 0.8764]
   [0.9673 0.9852]]]]
>>> argmax
[[[[ 1  9]
   [16 24]]]]
>>> net = nn.FractionalMaxPool2d(kernel_size=2, output_ratio=(0.5, 0.5), _random_samples=_random_samples,
...                              return_indices=True)
>>> y, argmax = net(input)
>>> print(y)
[[[[0.9545 0.8764]
   [0.9673 0.9852]]]]
>>> print(argmax)
[[[[ 1  9]
   [16 24]]]]
class tinyms.layers.FractionalMaxPool3d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)[source]

Applies the 3D FractionalMaxPool operation over input. The output Tensor shape can be determined by either output_size or output_ratio, and the step size is determined by _random_samples. Exactly one of output_size and output_ratio must be specified; they cannot both be set, nor both be None.

Refer to the paper Fractional MaxPooling by Ben Graham for more details.

The input and output data format can be “NCDHW”. N is the batch size, C is the number of channels, D the feature depth, H is the feature height, and W is the feature width.

Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value, is a positive int that represents depth, height and width of the kernel, or a tuple of three positive integers that represent depth, height and width respectively.

  • output_size (Union[int, tuple[int]], optional) – The shape of the target output, a positive int that represents depth, height and width, or a tuple of three positive integers that represent depth, height and width respectively. If None, the target shape will be determined by output_ratio. Default: None.

  • output_ratio (Union[float, tuple[float]], optional) – The ratio of the target output shape to the input shape, which specifies the output size as a fraction of the input size. Data type: float16, float32 or float64; the value must be in (0, 1). If None, the target shape will be determined by output_size. Default: None.

  • return_indices (bool, optional) – Whether to return the indices of max value. Default: False.

  • _random_samples (Tensor, optional) – The random step of FractionalMaxPool3d, a Tensor of shape \((N, C, 3)\) whose elements are within the range of \((0, 1)\). Supported data type : float16, float32, float64. If None, no random step will be set. Default: None.

Inputs:
  • input (Tensor) - The input of FractionalMaxPool3d, which is a 4D or 5D tensor. Tensor of data type : float16, float32, float64, int32, int64. Supported shape \((N, C, D_{in}, H_{in}, W_{in})\) .

Outputs:
  • y (Tensor) - A tensor, the output of FractionalMaxPool3d. Has the same data type as input. Tensor of shape \((N, C, D, H, W)\) .

  • argmax (Tensor) - The indices along with the outputs, which is a Tensor with the same shape as y and int32 data type. It is returned only when return_indices is True.

Raises:
  • TypeError – If input is not a 4D or 5D tensor.

  • TypeError – If _random_samples is not a 3D tensor.

  • TypeError – If data type of input is not float16, float32, float64, int32, int64.

  • TypeError – If dtype of _random_samples is not float16, float32, float64.

  • TypeError – If dtype of argmax is not int32, int64.

  • ValueError – If output_size is a tuple whose length is not 3.

  • ValueError – If kernel_size is a tuple whose length is not 3.

  • ValueError – If numbers in output_size or kernel_size are not positive.

  • ValueError – If output_size and output_ratio are both None.

  • ValueError – If the first dimension size of input and _random_samples is not equal.

  • ValueError – If the second dimension size of input and _random_samples is not equal.

  • ValueError – If the third dimension size of _random_samples is not 3.

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import nn
>>> from mindspore import Tensor
>>> import mindspore.common.dtype as mstype
>>> x = Tensor(np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
...            .reshape([1, 1, 2, 2, 4]), mstype.float32)
>>> _random_samples = Tensor(np.array([0.7, 0.7, 0.7]).reshape([1, 1, 3]), mstype.float32)
>>> net = nn.FractionalMaxPool3d(kernel_size=(1, 1, 1), output_size=(1, 1, 3),
...                              _random_samples=_random_samples, return_indices=True)
>>> output, argmax = net(x)
>>> print(output)
[[[[[13. 14. 16.]]]]]
>>> print(argmax)
[[[[[12 13 15]]]]]
>>> net = nn.FractionalMaxPool3d(kernel_size=(1, 1, 1), output_ratio=(0.5, 0.5, 0.5),
...                              _random_samples=_random_samples, return_indices=True)
>>> output, argmax = net(x)
>>> print(output)
[[[[[13. 16.]]]]]
>>> print(argmax)
[[[[[12 15]]]]]
class tinyms.layers.AdaptiveAvgPool1d(output_size)[source]

Applies a 1D adaptive average pooling over an input Tensor which can be regarded as a composition of 1D input planes.

Typically, the input is of shape \((N_{in}, C_{in}, L_{in})\), AdaptiveAvgPool1d outputs regional average in the \(L_{in}\)-dimension. The output is of shape \((N_{in}, C_{in}, L_{out})\), where \(L_{out}\) is defined by output_size.

Note

\(L_{in}\) must be divisible by output_size.

Parameters:

output_size (int) – the target output size \(L_{out}\).

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\), with float16 or float32 data type.

Outputs:

Tensor of shape \((N, C_{in}, L_{out})\), has the same type as input.

Raises:
  • TypeError – If output_size is not an int.

  • TypeError – If input is neither float16 nor float32.

  • ValueError – If output_size is less than 1.

  • ValueError – If length of shape of input is not equal to 3.

  • ValueError – If the last dimension of input is smaller than output_size.

  • ValueError – If the last dimension of input is not divisible by output_size.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> pool = nn.AdaptiveAvgPool1d(output_size=2)
>>> input = Tensor(np.random.randint(0, 10, [1, 3, 6]), mindspore.float32)
>>> output = pool(input)
>>> result = output.shape
>>> print(result)
(1, 3, 2)
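
Because \(L_{in}\) must be divisible by output_size, the layer behaves like a plain AvgPool1d whose kernel size and stride both equal \(L_{in} / L_{out}\). A minimal sketch of this equivalence, continuing the session above (printed values assume that behavior):

>>> x = Tensor(np.arange(6, dtype=np.float32).reshape(1, 1, 6))
>>> adaptive = nn.AdaptiveAvgPool1d(output_size=2)
>>> plain = nn.AvgPool1d(kernel_size=3, stride=3)  # 6 // 2 == 3
>>> print(adaptive(x))
[[[1. 4.]]]
>>> print(plain(x))
[[[1. 4.]]]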
class tinyms.layers.AdaptiveMaxPool1d(output_size)[source]

Applies a 1D adaptive maximum pooling over an input Tensor which can be regarded as a composition of 1D input planes.

Typically, the input is of shape \((N_{in}, C_{in}, L_{in})\), AdaptiveMaxPool1d outputs regional maximum in the \(L_{in}\)-dimension. The output is of shape \((N_{in}, C_{in}, L_{out})\), where \(L_{out}\) is defined by output_size.

Note

\(L_{in}\) must be divisible by output_size.

Parameters:

output_size (int) – the target output size \(L_{out}\).

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\), with float16 or float32 data type.

Outputs:

Tensor of shape \((N, C_{in}, L_{out})\), has the same type as x.

Raises:
  • TypeError – If x is neither float16 nor float32.

  • TypeError – If output_size is not an int.

  • ValueError – If output_size is less than 1.

  • ValueError – If the last dimension of x is smaller than output_size.

  • ValueError – If the last dimension of x is not divisible by output_size.

  • ValueError – If length of shape of x is not equal to 3.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> pool = nn.AdaptiveMaxPool1d(output_size=3)
>>> x = Tensor(np.random.randint(0, 10, [1, 3, 6]), mindspore.float32)
>>> output = pool(x)
>>> result = output.shape
>>> print(result)
(1, 3, 3)
class tinyms.layers.AdaptiveMaxPool2d(output_size, return_indices=False)[source]

This operator applies a 2D adaptive max pooling to an input signal composed of multiple input planes. That is, for any input size, the size of the specified output is H x W. The number of output features is equal to the number of input planes.

The input and output data format can be “NCHW” and “CHW”. N is the batch size, C is the number of channels, H is the feature height, and W is the feature width.

For max adaptive pool2d:

\[\begin{split}\begin{align} h_{start} &= floor(i * H_{in} / H_{out})\\ h_{end} &= ceil((i + 1) * H_{in} / H_{out})\\ w_{start} &= floor(j * W_{in} / W_{out})\\ w_{end} &= ceil((j + 1) * W_{in} / W_{out})\\ Output(i,j) &= {\max Input[h_{start}:h_{end}, w_{start}:w_{end}]} \end{align}\end{split}\]
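
The window arithmetic above can be mirrored in plain NumPy. The reference sketch below (a hypothetical helper, not part of the API) shows how each output cell takes the maximum over a floor/ceil-bounded window; on a \(3 \times 3\) input with values 1 to 9 it agrees with case 2 of the examples below:

>>> import math
>>> import numpy as np
>>> def adaptive_max_2d(x, out_h, out_w):
...     # x: 2D array (H, W); window bounds follow the floor/ceil rule above
...     h_in, w_in = x.shape
...     out = np.empty((out_h, out_w), x.dtype)
...     for i in range(out_h):
...         for j in range(out_w):
...             hs, he = i * h_in // out_h, math.ceil((i + 1) * h_in / out_h)
...             ws, we = j * w_in // out_w, math.ceil((j + 1) * w_in / out_w)
...             out[i, j] = x[hs:he, ws:we].max()
...     return out
>>> adaptive_max_2d(np.arange(1., 10.).reshape(3, 3), 2, 2)
array([[5., 6.],
       [8., 9.]])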

Note

Ascend platform only supports float16 type for input.

Parameters:
  • output_size (Union[int, tuple]) – The target output size. output_size can be a tuple \((H, W)\), or an int H for \((H, H)\). \(H\) and \(W\) can be int or None. If it is None, it means the output size is the same as the input size.

  • return_indices (bool) – If return_indices is True, the indices of max value would be output. Default: False.

Inputs:
  • input (Tensor) - The input of AdaptiveMaxPool2d, which is a 3D or 4D tensor, with float16, float32 or float64 data type.

Outputs:

Tensor, with the same type as the input. Shape of the output is input_shape[:len(input_shape) - len(out_shape)] + out_shape.

Raises:
  • TypeError – If output_size is not int or tuple.

  • TypeError – If input is not a tensor.

  • TypeError – If return_indices is not a bool.

  • TypeError – If dtype of input is not float16, float32 or float64.

  • ValueError – If output_size is a tuple and the length of output_size is not 2.

  • ValueError – If input is not a 3D (CHW) or 4D (NCHW) tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> # case 1: output_size=(None, 2)
>>> input = Tensor(np.array([[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
...                             [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
...                             [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]), mindspore.float32)
>>> adaptive_max_pool_2d = nn.AdaptiveMaxPool2d((None, 2))
>>> output = adaptive_max_pool_2d(input)
>>> print(output)
[[[[2. 3.]
   [5. 6.]
   [8. 9.]]
  [[2. 3.]
   [5. 6.]
   [8. 9.]]
  [[2. 3.]
   [5. 6.]
   [8. 9.]]]]
>>> # case 2: output_size=2
>>> adaptive_max_pool_2d = nn.AdaptiveMaxPool2d(2)
>>> output = adaptive_max_pool_2d(input)
>>> print(output)
[[[[5. 6.]
   [8. 9.]]
  [[5. 6.]
   [8. 9.]]
  [[5. 6.]
   [8. 9.]]]]
>>> # case 3: output_size=(1, 2)
>>> adaptive_max_pool_2d = nn.AdaptiveMaxPool2d((1, 2))
>>> output = adaptive_max_pool_2d(input)
>>> print(output)
[[[[8. 9.]]
  [[8. 9.]]
  [[8. 9.]]]]
class tinyms.layers.AdaptiveMaxPool3d(output_size, return_indices=False)[source]

Calculates the 3D adaptive max pooling for an input Tensor. That is, for any input size, the size of the specified output is \((D, H, W)\).

Parameters:
  • output_size (Union[int, tuple]) – The specified output size, which is a positive integer that represents depth, height and width, or a tuple of three positive integers that represent depth, height and width respectively. If it is None, the output size and input size of the corresponding dimension are the same.

  • return_indices (bool, optional) – If return_indices is True, the indices of max value would be output. Otherwise, the indices will not be returned. Default: False.

Inputs:
  • input (Tensor) - Tensor, has shape of \((C, D, H, W)\) or \((N, C, D, H, W)\).

Outputs:
  • y (Tensor) - Tensor, has the same number of dims and data type as the input.

  • argmax (Tensor) - Tensor, the indices of the maximum values along with the outputs, has the same shape as y and a dtype of int32. Return this only when return_indices is True.

Raises:
  • TypeError – If input is not a Tensor.

  • ValueError – If the number of dimensions of input is not 4 or 5.

  • TypeError – If dtype of input is not int, uint or float.

  • ValueError – If output_size is neither an int nor a tuple with shape (3,).

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> input = Tensor(np.arange(0, 36).reshape((1, 3, 3, 4)).astype(np.float32))
>>> output_size = (1, 1, 2)
>>> net = nn.AdaptiveMaxPool3d(output_size, True)
>>> output = net(input)
>>> print(output[0].asnumpy())
[[[[33. 35.]]]]
>>> print(output[1].asnumpy())
[[[[33 35]]]]
class tinyms.layers.AdaptiveAvgPool2d(output_size)[source]

This operator applies a 2D adaptive average pooling to an input signal composed of multiple input planes. That is, for any input size, the size of the specified output is H x W. The number of output features is equal to the number of input features.

The input and output data format can be “NCHW” and “CHW”. N is the batch size, C is the number of channels, H is the feature height, and W is the feature width.

\[\begin{split}\begin{align} h_{start} &= floor(i * H_{in} / H_{out})\\ h_{end} &= ceil((i + 1) * H_{in} / H_{out})\\ w_{start} &= floor(j * W_{in} / W_{out})\\ w_{end} &= ceil((j + 1) * W_{in} / W_{out})\\ Output(i,j) &= \frac{\sum Input[h_{start}:h_{end}, w_{start}:w_{end}]}{(h_{end}- h_{start}) * (w_{end}- w_{start})} \end{align}\end{split}\]
Parameters:

output_size (Union[int, tuple]) – The target output size is H x W. output_size can be a tuple consisting of int H and W, a single int H for H x H, or None. If it is None, it means the output size is the same as the input size.

Inputs:
  • input (Tensor) - The input of AdaptiveAvgPool2d, which is a 3D or 4D tensor, with float16, float32 or float64 data type.

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Raises:
  • ValueError – If output_size is a tuple and the length of output_size is not 2.

  • TypeError – If input is not a Tensor.

  • TypeError – If dtype of input is not float16, float32 or float64.

  • ValueError – If the dimension of input is less than or equal to the dimension of output_size.

Supported Platforms:

GPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> pool = nn.AdaptiveAvgPool2d(2)
>>> input_x = Tensor(np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
...                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
...                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]), mindspore.float32)
>>> output = pool(input_x)
>>> result = output.shape
>>> print(result)
(3, 2, 2)
class tinyms.layers.AdaptiveAvgPool3d(output_size)[source]

This operator applies a 3D adaptive average pooling to an input signal composed of multiple input planes. That is, for any input size, the size of the specified output is \((D, H, W)\). The number of output features is equal to the number of input planes.

Suppose the sizes of the last 3 dimensions of input are \((inD, inH, inW)\); then the sizes of the last 3 dimensions of output are \((outD, outH, outW)\).

\[\begin{split}\begin{array}{ll} \\ \forall \quad od \in [0,outD-1], oh \in [0,outH-1], ow \in [0,outW-1]\\ output[od,oh,ow] = \\ \qquad mean(input[istartD:iendD+1,istartH:iendH+1,istartW:iendW+1])\\ where,\\ \qquad istartD= \left\lceil \frac{od * inD}{outD} \right\rceil \\ \qquad iendD=\left\lfloor \frac{(od+1)* inD}{outD} \right\rfloor \\ \qquad istartH=\left\lceil \frac{oh * inH}{outH} \right\rceil \\ \qquad iendH=\left\lfloor \frac{(oh+1) * inH}{outH} \right\rfloor \\ \qquad istartW=\left\lceil \frac{ow * inW}{outW} \right\rceil \\ \qquad iendW=\left\lfloor \frac{(ow+1) * inW}{outW} \right\rfloor \end{array}\end{split}\]
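For instance, in case 1 of the examples below (\(inD = 5\), \(outD = 3\)), the first output slice (\(od = 0\)) averages input slices \(istartD = \lceil 0 \cdot 5/3 \rceil = 0\) through \(iendD = \lfloor 1 \cdot 5/3 \rfloor = 1\), i.e. input[0:2] along the depth axis.
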
Parameters:

output_size (Union[int, tuple]) – The target output size. output_size can be a tuple \((D, H, W)\), or an int D for \((D, D, D)\). \(D\), \(H\) and \(W\) can be int or None, where None means the output size is the same as that of the input.

Inputs:
  • input (Tensor) - The input of AdaptiveAvgPool3d, which is a 5D or 4D Tensor, with float16, float32 or float64 data type.

Outputs:

Tensor, with the same type as the input.

Raises:
  • TypeError – If input is not a Tensor.

  • TypeError – If dtype of input is not float16, float32 or float64.

  • ValueError – If the dimension of input is not 4D or 5D.

  • ValueError – If output_size value is not positive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> from mindspore import Tensor, nn
>>> import numpy as np
>>> # case 1: output_size=(3, 3, 4)
>>> output_size=(3, 3, 4)
>>> input_x_val = np.random.randn(4, 3, 5, 6, 7)
>>> input_x = Tensor(input_x_val, mindspore.float32)
>>> net = nn.AdaptiveAvgPool3d(output_size)
>>> output = net(input_x)
>>> print(output.shape)
(4, 3, 3, 3, 4)
>>> # case 2: output_size=5
>>> output_size=5
>>> input_x_val = np.random.randn(2, 3, 8, 6, 12)
>>> input_x = Tensor(input_x_val, mindspore.float32)
>>> net = nn.AdaptiveAvgPool3d(output_size)
>>> output = net(input_x)
>>> print(output.shape)
(2, 3, 5, 5, 5)
>>> # case 3: output_size=(None, 4, 5)
>>> output_size=(None, 4, 5)
>>> input_x_val = np.random.randn(4, 1, 9, 10, 8)
>>> input_x = Tensor(input_x_val, mindspore.float32)
>>> net = nn.AdaptiveAvgPool3d(output_size)
>>> output = net(input_x)
>>> print(output.shape)
(4, 1, 9, 4, 5)
class tinyms.layers.MaxUnpool1d(kernel_size, stride=None, padding=0)[source]

Computes the inverse of mindspore.nn.MaxPool1d.

MaxUnpool1d keeps the maximal values and sets all non-maximal positions to zero. Typically the input is of shape \((N, C, H_{in})\) or \((C, H_{in})\), and the output is of shape \((N, C, H_{out})\) or \((C, H_{out})\). The operation is as follows.

\[H_{out} = (H_{in} - 1) \times stride[0] - 2 \times padding[0] + kernel\_size[0]\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value.

  • stride (Union[int, tuple[int]]) – The distance the kernel moves. If stride is None, it is set equal to kernel_size. Default: None.

  • padding (Union[int, tuple[int]]) – The pad value to be filled. Default: 0.

Inputs:
  • x (Tensor) - The input Tensor to invert. Tensor of shape \((N, C, H_{in})\) or \((C, H_{in})\).

  • indices (Tensor) - Indices of the max values. The shape must be the same as that of the input x. Values must be in range \([0, H_{in} - 1]\). Data type must be int32 or int64.

  • output_size (tuple[int], optional) - The output size. Default: None. If output_size == (), the output shape is computed from kernel_size, stride and padding. If output_size != (), output_size must be \((N, C, H)\) , \((C, H)\) or \((H)\) and must belong to \([(N, C, H_{out} - stride[0]), (N, C, H_{out} + stride[0])]\).

Outputs:

Tensor, with shape \((N, C, H_{out})\) or \((C, H_{out})\), with the same data type with x.

Raises:
  • TypeError – If data type of x or indices is not supported.

  • TypeError – If kernel_size, stride or padding is neither an int nor a tuple.

  • ValueError – If numbers in stride, padding or kernel_size are not positive (padding also supports 0 and (0)).

  • ValueError – If the shapes of x and indices are not equal.

  • ValueError – If the length of the shape of x is not 2 or 3.

  • ValueError – If output_size is not a tuple.

  • ValueError – If the length of output_size is not 0, 2 or 3.

  • ValueError – If output_size is not within the range of the output size computed from kernel_size, stride and padding.

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> x = Tensor(np.array([[2, 4, 6, 8]]).astype(np.float32))
>>> indices = Tensor(np.array([[1, 3, 5, 7]]).astype(np.int64))
>>> maxunpool1d = nn.MaxUnpool1d(kernel_size=2, stride=2, padding=0)
>>> output = maxunpool1d(x, indices)
>>> print(output.asnumpy())
[[0. 2. 0. 4. 0. 6. 0. 8.]]
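
MaxUnpool1d is typically paired with a MaxPool1d constructed with return_indices=True (which in turn requires pad_mode='pad'). A minimal round-trip sketch, continuing the session above (printed values assume this pairing):

>>> x2 = Tensor(np.array([[[1., 3., 2., 8.]]]).astype(np.float32))
>>> pool = nn.MaxPool1d(kernel_size=2, stride=2, pad_mode='pad', return_indices=True)
>>> unpool = nn.MaxUnpool1d(kernel_size=2, stride=2)
>>> out, indices = pool(x2)
>>> print(unpool(out, indices).asnumpy())
[[[0. 3. 0. 8.]]]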
class tinyms.layers.MaxUnpool2d(kernel_size, stride=None, padding=0)[source]

Computes the inverse of mindspore.nn.MaxPool2d.

MaxUnpool2d keeps the maximal values and sets all non-maximal positions to zero. Typically the input is of shape \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\), and the output is of shape \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\). The operation is as follows.

\[\begin{split}\begin{array}{ll} \\ H_{out} = (H_{in} - 1) \times stride[0] - 2 \times padding[0] + kernel\_size[0] \\ W_{out} = (W_{in} - 1) \times stride[1] - 2 \times padding[1] + kernel\_size[1] \\ \end{array}\end{split}\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value, an int number that represents height and width of the kernel, or a tuple of two int numbers that represent height and width respectively.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both stride, or a tuple of two int numbers that represent height and width of movement respectively. If stride is None, then stride equal to kernel_size. Default: None.

  • padding (Union[int, tuple[int]]) – The pad value to be filled. Default: 0. If padding is an integer, the paddings of height and width are the same, equal to padding. If padding is a tuple of two integers, the padding of height and width equal to padding[0] and padding[1] correspondingly.

Inputs:
  • x (Tensor) - The input Tensor to invert. Tensor of shape \((N, C, H_{in}, W_{in})\) or \((C, H_{in}, W_{in})\).

  • indices (Tensor) - Indices of the max values. The shape must be the same as that of the input x. Values must be in range \([0, H_{in} \times W_{in} - 1]\). Data type must be int32 or int64.

  • output_size (tuple[int], optional) - The output size. Default: None. If output_size == (), the output shape is computed from kernel_size, stride and padding. If output_size != (), output_size must be \((N, C, H, W)\), \((C, H, W)\) or \((H, W)\) and must belong to \([(N, C, H_{out} - stride[0], W_{out} - stride[1]), (N, C, H_{out} + stride[0], W_{out} + stride[1])]\).

Outputs:

Tensor, with shape \((N, C, H_{out}, W_{out})\) or \((C, H_{out}, W_{out})\), with the same data type with x.

Raises:
  • TypeError – If data type of x or indices is not supported.

  • TypeError – If kernel_size, stride or padding is neither an int nor a tuple.

  • ValueError – If numbers in stride, padding or kernel_size are not positive (padding also supports 0 and (0, 0)).

  • ValueError – If the shapes of x and indices are not equal.

  • ValueError – If kernel_size, stride or padding is a tuple whose length is not equal to 2.

  • ValueError – If the length of the shape of x is not 3 or 4.

  • ValueError – If output_size is not a tuple.

  • ValueError – If the length of output_size is not 0, 3 or 4.

  • ValueError – If output_size is not within the range of the output size computed from kernel_size, stride and padding.

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> x = Tensor(np.array([[[[0, 1], [8, 9]]]]).astype(np.float32))
>>> indices = Tensor(np.array([[[[0, 1], [2, 3]]]]).astype(np.int64))
>>> maxunpool2d = nn.MaxUnpool2d(kernel_size=1, stride=1, padding=0)
>>> output = maxunpool2d(x, indices)
>>> print(output.asnumpy())
[[[[0. 1.]
   [8. 9.]]]]
class tinyms.layers.MaxUnpool3d(kernel_size, stride=None, padding=0)[source]

Computes the inverse of mindspore.nn.MaxPool3d.

MaxUnpool3d keeps the maximal values and sets all non-maximal positions to zero. Typically the input is of shape \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\), and the output is of shape \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\). The operation is as follows.

\[\begin{split}\begin{array}{ll} \\ D_{out} = (D_{in} - 1) \times stride[0] - 2 \times padding[0] + kernel\_size[0] \\ H_{out} = (H_{in} - 1) \times stride[1] - 2 \times padding[1] + kernel\_size[1] \\ W_{out} = (W_{in} - 1) \times stride[2] - 2 \times padding[2] + kernel\_size[2] \\ \end{array}\end{split}\]
Parameters:
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the maximum value, an int number that represents depth, height and width of the kernel, or a tuple of three int numbers that represent depth, height and width respectively.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the depth, height and width of movement are both stride, or a tuple of three int numbers that represent depth, height and width of movement respectively. If stride is None, then stride equal to kernel_size. Default: None.

  • padding (Union[int, tuple[int]]) – The pad value to be filled. Default: 0. If padding is an integer, the paddings of depth, height and width are the same, equal to padding. If padding is a tuple of three integers, the padding of depth, height and width equal to padding[0], padding[1] and padding[2] correspondingly.

Inputs:
  • x (Tensor) - The input Tensor to invert. Tensor of shape \((N, C, D_{in}, H_{in}, W_{in})\) or \((C, D_{in}, H_{in}, W_{in})\).

  • indices (Tensor) - Indices of the max values. The shape must be the same as that of the input x. Values must be in range \([0, D_{in} \times H_{in} \times W_{in} - 1]\). Data type must be int32 or int64.

  • output_size (tuple[int], optional) - The output size. Default: None. If output_size == (), the output shape is computed from kernel_size, stride and padding. If output_size != (), output_size must be \((N, C, D, H, W)\) , \((C, D, H, W)\) or \((D, H, W)\) and must belong to \([(N, C, D_{out} - stride[0], H_{out} - stride[1], W_{out} - stride[2]), (N, C, D_{out} + stride[0], H_{out} + stride[1], W_{out} + stride[2])]\).

Outputs:

Tensor, with shape \((N, C, D_{out}, H_{out}, W_{out})\) or \((C, D_{out}, H_{out}, W_{out})\), with the same data type with x.

Raises:
  • TypeError – If data type of x or indices is not supported.

  • TypeError – If kernel_size, stride or padding is neither an int nor a tuple.

  • ValueError – If numbers in stride, padding or kernel_size are not positive (padding also supports 0 and (0, 0, 0)).

  • ValueError – If the shapes of x and indices are not equal.

  • ValueError – If kernel_size, stride or padding is a tuple whose length is not equal to 3.

  • ValueError – If the length of the shape of x is not 4 or 5.

  • ValueError – If the length of output_size is not 0, 4 or 5.

  • ValueError – If output_size is not a tuple.

  • ValueError – If output_size is not within the range of the output size computed from kernel_size, stride and padding.

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> x = Tensor(np.array([[[[[0, 1], [8, 9]]]]]).astype(np.float32))
>>> indices = Tensor(np.array([[[[[0, 1], [2, 3]]]]]).astype(np.int64))
>>> maxunpool3d = nn.MaxUnpool3d(kernel_size=1, stride=1, padding=0)
>>> output = maxunpool3d(x, indices)
>>> print(output.asnumpy())
[[[[[0. 1.]
    [8. 9.]]]]]
class tinyms.layers.LPPool1d(norm_type, kernel_size, stride=None, ceil_mode=False)[source]

Applies a 1D LP (power-average) pooling over an input Tensor, which can be regarded as a composition of 1D input planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\) or \((C_{in}, L_{in})\), and the output is of shape \((N_{out}, C_{out}, L_{out})\) or \((C_{out}, L_{out})\). The operation is as follows.

\[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}\]
Parameters:
  • norm_type (Union[int, float]) –

    Type of normalization, represents p in the formula, cannot be 0.

    • if p = 1, the result is the sum of the elements within the pooling kernel (proportional to average pooling).

    • if p = \(\infty\), the result is that of max pooling.

  • kernel_size (int) – The size of kernel window.

  • stride (int) – The distance the kernel moves, an int number representing the width of movement. If the value is None, the default value kernel_size is used. Default: None.

  • ceil_mode (bool) – Whether to use ceil or floor to calculate output shape. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N_{in}, C_{in}, L_{in})\) or \((C_{in}, L_{in})\).

Outputs:
  • output (Tensor) - LPPool1d result, with shape \((N_{out}, C_{out}, L_{out})\) or \((C_{out}, L_{out})\), it has the same data type as x, where

\[L_{out} = \left\lfloor\frac{L_{in} - \text{kernel_size}}{\text{stride}} + 1\right\rfloor\]
Raises:
  • TypeError – If x is not a Tensor.

  • TypeError – If kernel_size or stride is not an int.

  • TypeError – If ceil_mode is not a bool.

  • TypeError – If norm_type is neither float nor int.

  • ValueError – If norm_type is equal to 0.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If length of shape of x is not equal to 2 or 3.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> a = Tensor(np.arange(2 * 3 * 4).reshape((2, 3, 4)), dtype=ms.float32)
>>> net = nn.LPPool1d(norm_type=1, kernel_size=3, stride=1)
>>> out = net(a)
>>> print(out)
[[[ 3.  6.]
  [15. 18.]
  [27. 30.]]
 [[39. 42.]
  [51. 54.]
  [63. 66.]]]
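
With norm_type=1 the pooling reduces to a window sum: for the first row [0., 1., 2., 3.] of the input above, the two kernel_size=3 windows give 0+1+2=3 and 1+2+3=6, as printed.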
class tinyms.layers.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)[source]

Applies a 2D LP (power-average) pooling over an input Tensor, which can be regarded as a composition of 2D input planes.

Typically the input is of shape \((N, C, H_{in}, W_{in})\), and the output is of shape \((N, C, H_{out}, W_{out})\). The operation is as follows.

\[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}\]
Parameters:
  • norm_type (Union[int, float]) –

    Type of normalization, represents p in the formula, cannot be 0.

    • if p = 1, the result is the sum of the elements within the pooling kernel (proportional to average pooling).

    • if p = \(\infty\), the result is that of max pooling.

  • kernel_size (Union[int, tuple[int]]) – The size of kernel window. The data type of kernel_size must be int and the value represents the height and width, or a tuple of two int numbers that represent height and width respectively.

  • stride (Union[int, tuple[int]]) – The distance the kernel moves, an int number that applies to both height and width, or a tuple of two int numbers that represent height and width of movement respectively. If the value is None, the default value kernel_size is used. Default: None.

  • ceil_mode (bool) – Whether to use ceil or floor to calculate output shape. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C, H_{in}, W_{in})\).

Outputs:
  • output (Tensor) - LPPool2d result, with shape \((N, C, H_{out}, W_{out})\). It has the same data type as x, where

\[H_{out} = \left\lfloor\frac{H_{in} - \text{kernel_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} - \text{kernel_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
Raises:
  • TypeError – If x is not a Tensor.

  • TypeError – If kernel_size or stride is neither int nor tuple.

  • TypeError – If ceil_mode is not a bool.

  • TypeError – If norm_type is neither float nor int.

  • ValueError – If norm_type is equal to 0.

  • ValueError – If kernel_size or stride is less than 1.

  • ValueError – If kernel_size or stride is a tuple whose length is not equal to 2.

  • ValueError – If length of shape of x is not equal to 4.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> a = Tensor(np.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5)), dtype=ms.float32)
>>> net = nn.LPPool2d(norm_type=1, kernel_size=3, stride=1)
>>> out = net(a)
>>> print(out)
[[[[  54.   63.   72.]
   [  99.  108.  117.]]
  [[ 234.  243.  252.]
   [ 279.  288.  297.]]
  [[ 414.  423.  432.]
   [ 459.  468.  477.]]]
 [[[ 594.  603.  612.]
   [ 639.  648.  657.]]
  [[ 774.  783.  792.]
   [ 819.  828.  837.]]
  [[ 954.  963.  972.]
   [ 999. 1008. 1017.]]]]
class tinyms.layers.ImageGradients[source]

Calculates the image gradients, returning two tensors: the first along the height dimension and the second along the width dimension.

Assume an image shape is \(h*w\), the gradients along the height and the width are \(dy\) and \(dx\), respectively.

\[dy[i] = \begin{cases} image[i+1, :] - image[i, :], & \text{if } 0 \le i < h-1 \\ 0, & \text{if } i = h-1 \end{cases}\]
\[dx[i] = \begin{cases} image[:, i+1] - image[:, i], & \text{if } 0 \le i < w-1 \\ 0, & \text{if } i = w-1 \end{cases}\]
Inputs:
  • images (Tensor) - The input image data, with format ‘NCHW’.

Outputs:
  • dy (Tensor) - vertical image gradients, the same type and shape as input.

  • dx (Tensor) - horizontal image gradients, the same type and shape as input.

Raises:

ValueError – If length of shape of images is not equal to 4.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> import numpy as np
>>> net = nn.ImageGradients()
>>> image = Tensor(np.array([[[[1, 2], [3, 4]]]]), dtype=mindspore.int32)
>>> output = net(image)
>>> print(output)
(Tensor(shape=[1, 1, 2, 2], dtype=Int32, value=
[[[[2, 2],
   [0, 0]]]]), Tensor(shape=[1, 1, 2, 2], dtype=Int32, value=
[[[[1, 0],
   [1, 0]]]]))
class tinyms.layers.SSIM(max_val=1.0, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)[source]

Returns SSIM index between two images.

Its implementation is based on Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004) Image quality assessment: from error visibility to structural similarity .

SSIM measures the similarity of two images. Like PSNR, it is often used to evaluate image quality. SSIM is a value between 0 and 1: the larger it is, the smaller the gap between the output image and the undistorted image, i.e. the better the image quality. SSIM = 1 when the two images are identical.

\[\begin{split}l(x,y)&=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}, C_1=(K_1L)^2.\\ c(x,y)&=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}, C_2=(K_2L)^2.\\ s(x,y)&=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, C_3=C_2/2.\\ SSIM(x,y)&=l*c*s\\&=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)} {(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}.\end{split}\]
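With the default max_val=1.0, k1=0.01 and k2=0.03, the stabilizing constants evaluate to \(C_1 = (0.01 \times 1)^2 = 10^{-4}\) and \(C_2 = (0.03 \times 1)^2 = 9 \times 10^{-4}\).
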
Parameters:
  • max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). Default: 1.0.

  • filter_size (int) – The size of the Gaussian filter. Default: 11. The value must be greater than or equal to 1.

  • filter_sigma (float) – The standard deviation of Gaussian kernel. Default: 1.5. The value must be greater than 0.

  • k1 (float) – The constant used to generate c1 in the luminance comparison function. Default: 0.01.

  • k2 (float) – The constant used to generate c2 in the contrast comparison function. Default: 0.03.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, has the same dtype as img1. It is a 1-D tensor with shape N, where N is the batch size of img1.

Raises:
  • TypeError – If max_val is neither int nor float.

  • TypeError – If k1, k2 or filter_sigma is not a float.

  • TypeError – If filter_size is not an int.

  • ValueError – If max_val or filter_sigma is less than or equal to 0.

  • ValueError – If filter_size is less than 0.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.SSIM()
>>> img1 = Tensor(np.ones([1, 3, 16, 16]).astype(np.float32))
>>> img2 = Tensor(np.ones([1, 3, 16, 16]).astype(np.float32))
>>> output = net(img1, img2)
>>> print(output)
[1.]
class tinyms.layers.MSSSIM(max_val=1.0, power_factors=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333), filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)[source]

Returns MS-SSIM index between two images.

Its implementation is based on Multiscale structural similarity for image quality assessment by Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik, published in Signals, Systems and Computers, 2004.

\[\begin{split}l(x,y)&=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}, C_1=(K_1L)^2.\\ c(x,y)&=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}, C_2=(K_2L)^2.\\ s(x,y)&=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, C_3=C_2/2.\\ MSSSIM(x,y)&=l^\alpha_M*{\prod_{1\leq j\leq M} (c^\beta_j*s^\gamma_j)}.\end{split}\]
Parameters:
  • max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). Default: 1.0.

  • power_factors (Union[tuple, list]) – Iterable of weights for each scale. Default: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333). The default values were obtained by Wang et al.

  • filter_size (int) – The size of the Gaussian filter. Default: 11.

  • filter_sigma (float) – The standard deviation of Gaussian kernel. Default: 1.5.

  • k1 (float) – The constant used to generate c1 in the luminance comparison function. Default: 0.01.

  • k2 (float) – The constant used to generate c2 in the contrast comparison function. Default: 0.03.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, the value is in range [0, 1]. It is a 1-D tensor with shape N, where N is the batch size of img1.

Raises:
  • TypeError – If max_val is neither int nor float.

  • TypeError – If power_factors is neither tuple nor list.

  • TypeError – If k1, k2 or filter_sigma is not a float.

  • TypeError – If filter_size is not an int.

  • ValueError – If max_val or filter_sigma is less than or equal to 0.

  • ValueError – If filter_size is less than 0.

  • ValueError – If length of shape of img1 or img2 is not equal to 4.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.MSSSIM(power_factors=(0.033, 0.033, 0.033))
>>> img1 = Tensor(np.ones((1, 3, 128, 128)).astype(np.float32))
>>> img2 = Tensor(np.ones((1, 3, 128, 128)).astype(np.float32))
>>> output = net(img1, img2)
>>> print(output)
[1.]
class tinyms.layers.PSNR(max_val=1.0)[source]

Returns Peak Signal-to-Noise Ratio of two image batches.

It produces a PSNR value for each image in the batch. Assume inputs are \(I\) and \(K\), both with shape \(h*w\). \(MAX\) represents the dynamic range of pixel values.

\[\begin{split}MSE&=\frac{1}{hw}\sum\limits_{i=0}^{h-1}\sum\limits_{j=0}^{w-1}[I(i,j)-K(i,j)]^2\\ PSNR&=10*log_{10}(\frac{MAX^2}{MSE})\end{split}\]
Parameters:

max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). The value must be greater than 0. Default: 1.0.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, with dtype mindspore.float32. It is a 1-D tensor with shape N, where N is the batch size of img1.

Raises:
  • TypeError – If max_val is neither int nor float.

  • ValueError – If max_val is less than or equal to 0.

  • ValueError – If length of shape of img1 or img2 is not equal to 4.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.PSNR()
>>> img1 = Tensor([[[[1, 2, 3, 4], [1, 2, 3, 4]]]])
>>> img2 = Tensor([[[[3, 4, 5, 6], [3, 4, 5, 6]]]])
>>> output = net(img1, img2)
>>> print(output)
[-6.0206]
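
The printed value follows directly from the formula: the per-pixel squared error is 4 everywhere, so \(MSE = 4\) and \(PSNR = 10\log_{10}(1^2/4) \approx -6.0206\) (an illustrative check):

>>> import numpy as np
>>> mse = np.mean((img1.asnumpy() - img2.asnumpy()) ** 2)
>>> print(np.round(10 * np.log10(1.0 / mse), 4))
-6.0206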
class tinyms.layers.CentralCrop(central_fraction)[source]

Crops the central region of the images based on central_fraction.

Parameters:

central_fraction (float) – Fraction of size to crop. It must be float and in range (0.0, 1.0].

Inputs:
  • image (Tensor) - A 3-D tensor of shape [C, H, W], or a 4-D tensor of shape [N, C, H, W].

Outputs:

Tensor, a 3-D or 4-D float tensor, according to the input.

Raises:
  • TypeError – If central_fraction is not a float.

  • ValueError – If central_fraction is not in range (0.0, 1.0].

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.CentralCrop(central_fraction=0.5)
>>> image = Tensor(np.random.random((4, 3, 4, 4)), mindspore.float32)
>>> output = net(image)
>>> print(output.shape)
(4, 3, 2, 2)
class tinyms.layers.PixelShuffle(upscale_factor)[source]

Applies the PixelShuffle operation over input which implements sub-pixel convolutions with stride \(1/r\) . For more details, refer to Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network .

Typically, the input is of shape \((*, C \times r^2, H, W)\) , and the output is of shape \((*, C, H \times r, W \times r)\), where r is an upscale factor and * is zero or more batch dimensions.

Note

The dimension of input Tensor on Ascend should be less than 7.

Parameters:

upscale_factor (int) – factor to shuffle the input, and is a positive integer. upscale_factor is the above-mentioned \(r\).

Inputs:
  • input (Tensor) - Tensor of shape \((*, C \times r^2, H, W)\) . The dimension of input is larger than 2, and the length of the third-to-last dimension must be divisible by upscale_factor squared.

Outputs:
  • output (Tensor) - Tensor of shape \((*, C, H \times r, W \times r)\) .

Raises:
  • ValueError – If upscale_factor is not a positive integer.

  • ValueError – If the length of the third-to-last dimension of input is not divisible by upscale_factor squared.

  • TypeError – If the dimension of input is less than 3.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> input_x = np.arange(3 * 2 * 8 * 4 * 4).reshape((3, 2, 8, 4, 4))
>>> input_x = mindspore.Tensor(input_x, mindspore.dtype.int32)
>>> pixel_shuffle = nn.PixelShuffle(2)
>>> output = pixel_shuffle(input_x)
>>> print(output.shape)
(3, 2, 2, 8, 8)
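
The operation can be viewed as a reshape that splits the channel dimension into \((C, r, r)\) followed by a transpose that interleaves the two \(r\) axes into height and width; a minimal NumPy sketch of this rearrangement for the shapes above (illustrative only, assuming the standard sub-pixel layout):

>>> r = 2
>>> m = input_x.asnumpy().reshape(3, 2, 8 // r ** 2, r, r, 4, 4)
>>> m = m.transpose(0, 1, 2, 5, 3, 6, 4).reshape(3, 2, 2, 4 * r, 4 * r)
>>> print(m.shape)
(3, 2, 2, 8, 8)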
class tinyms.layers.PixelUnshuffle(downscale_factor)[source]

Applies the PixelUnshuffle operation over input which is the inverse of PixelShuffle. For more details, refer to Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network .

Typically, the input is of shape \((*, C, H \times r, W \times r)\) , and the output is of shape \((*, C \times r^2, H, W)\) , where r is a downscale factor and * is zero or more batch dimensions.

Parameters:

downscale_factor (int) – factor to unshuffle the input, and is a positive integer. downscale_factor is the above-mentioned \(r\).

Inputs:
  • input (Tensor) - Tensor of shape \((*, C, H \times r, W \times r)\) . The dimension of input is larger than 2, and the lengths of the second-to-last and last dimensions must be divisible by downscale_factor .

Outputs:
  • output (Tensor) - Tensor of shape \((*, C \times r^2, H, W)\) .

Raises:
  • ValueError – If downscale_factor is not a positive integer.

  • ValueError – If the length of the second-to-last or last dimension is not divisible by downscale_factor .

  • TypeError – If the dimension of input is less than 3.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> pixel_unshuffle = nn.PixelUnshuffle(2)
>>> input_x = np.arange(8 * 8).reshape((1, 1, 8, 8))
>>> input_x = mindspore.Tensor(input_x, mindspore.dtype.int32)
>>> output = pixel_unshuffle(input_x)
>>> print(output.shape)
(1, 4, 4, 4)
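
Since PixelUnshuffle inverts PixelShuffle, shuffling and then unshuffling with the same factor recovers the original tensor (an illustrative check reusing the imports above):

>>> x = mindspore.Tensor(np.arange(16).reshape((1, 4, 2, 2)), mindspore.dtype.int32)
>>> roundtrip = nn.PixelUnshuffle(2)(nn.PixelShuffle(2)(x))
>>> print(np.array_equal(roundtrip.asnumpy(), x.asnumpy()))
True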
class tinyms.layers.ReduceLogSumExp(axis, keep_dims=False)[source]

Reduces a dimension of a tensor by calculating the exponential of all elements in the dimension, then calculating the logarithm of the sum.

\[ReduceLogSumExp(x) = \log(\sum(e^x))\]
Parameters:
  • axis (Union[int, tuple(int), list(int)]) – The dimensions to reduce; () means reduce all dimensions. Only constant value is allowed.

  • keep_dims (bool) – If True, keep these reduced dimensions and the length is 1. If False, don’t keep these dimensions. Default: False.

Inputs:
  • x (Tensor) - The input tensor. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the x.

  • If axis is (), and keep_dims is False, the output is a 0-D tensor representing the log-sum-exp of all elements in the input tensor.

  • If axis is int, set as 2, and keep_dims is False, the shape of output is \((x_1, x_3, ..., x_R)\).

  • If axis is tuple(int), set as (2, 3), and keep_dims is False, the shape of output is \((x_1, x_4, ..., x_R)\).

Raises:
  • TypeError – If axis is not one of int, list, tuple.

  • TypeError – If keep_dims is not bool.

  • TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.random.randn(3, 4, 5, 6).astype(np.float32))
>>> op = nn.ReduceLogSumExp(1, keep_dims=True)
>>> output = op(x)
>>> print(output.shape)
(3, 1, 5, 6)
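
The result agrees with a direct NumPy evaluation of \(\log(\sum(e^x))\) along the same axis; continuing the example above (an illustrative check):

>>> ref = np.log(np.sum(np.exp(x.asnumpy()), axis=1, keepdims=True))
>>> print(np.allclose(output.asnumpy(), ref, atol=1e-5))
True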
class tinyms.layers.Range(start, limit=None, delta=1)[source]

‘nn.Range’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.range’ instead.

class tinyms.layers.LGamma[source]

Calculates LGamma using Lanczos’ approximation referring to “A Precision Approximation of the Gamma Function”. The algorithm is:

\[\begin{split}\begin{array}{ll} \\ lgamma(z + 1) = \frac{(\log(2) + \log(pi))}{2} + (z + 1/2) * log(t(z)) - t(z) + A(z) \\ t(z) = z + kLanczosGamma + 1/2 \\ A(z) = kBaseLanczosCoeff + \sum_{k=1}^n \frac{kLanczosCoefficients[k]}{z + k} \end{array}\end{split}\]

However, if the input is less than 0.5, use Euler’s reflection formula:

\[lgamma(x) = \log(pi) - lgamma(1-x) - \log(abs(sin(pi * x)))\]

And please note that

\[lgamma(+/-inf) = +inf\]

Thus, the behaviour of LGamma follows:

  • when x > 0.5, return log(Gamma(x))

  • when x < 0.5 and is not an integer, return the real part of Log(Gamma(x)) where Log is the complex logarithm

  • when x is an integer less than or equal to 0, return +inf

  • when x = +/- inf, return +inf

Inputs:
  • x (Tensor) - The input tensor. Only float16, float32 are supported.

Outputs:

Tensor, has the same shape and dtype as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([2, 3, 4]).astype(np.float32))
>>> op = nn.LGamma()
>>> output = op(x)
>>> print(output)
[3.5762787e-07 6.9314754e-01 1.7917603e+00]
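
These values match Python's math.lgamma, since \(\Gamma(2) = 1\), \(\Gamma(3) = 2\) and \(\Gamma(4) = 6\) (an illustrative check):

>>> import math
>>> print([round(math.lgamma(v), 5) for v in (2.0, 3.0, 4.0)])
[0.0, 0.69315, 1.79176]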
class tinyms.layers.DiGamma[source]

Calculates Digamma using Lanczos’ approximation referring to “A Precision Approximation of the Gamma Function”. The algorithm is:

\[\begin{split}\begin{array}{ll} \\ digamma(z + 1) = log(t(z)) + A'(z) / A(z) - kLanczosGamma / t(z) \\ t(z) = z + kLanczosGamma + 1/2 \\ A(z) = kBaseLanczosCoeff + \sum_{k=1}^n \frac{kLanczosCoefficients[k]}{z + k} \\ A'(z) = \sum_{k=1}^n \frac{kLanczosCoefficients[k]}{(z + k)^2} \end{array}\end{split}\]

However, if the input is less than 0.5, use Euler’s reflection formula:

\[digamma(x) = digamma(1 - x) - pi * cot(pi * x)\]
Inputs:
  • x (Tensor[Number]) - The input tensor. Only float16, float32 are supported.

Outputs:

Tensor, has the same shape and dtype as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([2, 3, 4]).astype(np.float32))
>>> op = nn.DiGamma()
>>> output = op(x)
>>> print(output)
[0.42278463  0.92278427 1.2561178]
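
The same values can be reproduced with SciPy's digamma, assuming SciPy is available (an illustrative check):

>>> from scipy.special import digamma
>>> print(np.allclose(digamma([2.0, 3.0, 4.0]), [0.42278463, 0.92278427, 1.2561178], atol=1e-5))
True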
class tinyms.layers.IGamma[source]

Calculates the lower regularized incomplete Gamma function, which is defined as:

\[P(a, x) = gamma(a, x) / Gamma(a) = 1 - Q(a, x)\]

where

\[gamma(a, x) = \int_0^x t^{a-1} e^{-t} dt\]

is the lower incomplete Gamma function.

Above, \(Q(a, x)\) is the upper regularized incomplete Gamma function.

Inputs:
  • a (Tensor) - The input tensor. With float32 data type. a should have the same dtype as x.

  • x (Tensor) - The input tensor. With float32 data type. x should have the same dtype as a.

Outputs:

Tensor, has the same dtype as a and x.

Raises:

TypeError – If the dtype of x and a is neither float16 nor float32, or if x and a have different dtypes.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> a = Tensor(np.array([2.0, 4.0, 6.0, 8.0]).astype(np.float32))
>>> x = Tensor(np.array([2.0, 3.0, 4.0, 5.0]).astype(np.float32))
>>> igamma = nn.IGamma()
>>> output = igamma(a, x)
>>> print(output)
[0.593994  0.35276785  0.21486944  0.13337152]
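
The same values can be reproduced with SciPy's regularized lower incomplete gamma function, assuming SciPy is available (an illustrative check):

>>> from scipy.special import gammainc
>>> ref = gammainc(np.array([2.0, 4.0, 6.0, 8.0]), np.array([2.0, 3.0, 4.0, 5.0]))
>>> print(np.allclose(ref, [0.593994, 0.35276785, 0.21486944, 0.13337152], atol=1e-6))
True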
class tinyms.layers.LBeta[source]

This method avoids numeric cancellation by explicitly decomposing lgamma into the Stirling approximation and an explicit log_gamma_correction, and cancelling the large terms from the Stirling approximation analytically.

This is semantically equal to

\[P(x, y) = lgamma(x) + lgamma(y) - lgamma(x + y).\]

The method is more accurate for arguments above 8. The reason for accuracy loss in the naive computation is catastrophic cancellation between the lgammas.

Inputs:
  • x (Tensor) - The input tensor. With float16 or float32 data type. x should have the same dtype as y.

  • y (Tensor) - The input tensor. With float16 or float32 data type. y should have the same dtype as x.

Outputs:

Tensor, has the same dtype as x and y.

Raises:

TypeError – If the dtype of x or y is neither float16 nor float32, or if x and y have different dtypes.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([2.0, 4.0, 6.0, 8.0]).astype(np.float32))
>>> y = Tensor(np.array([2.0, 3.0, 14.0, 15.0]).astype(np.float32))
>>> lbeta = nn.LBeta()
>>> output = lbeta(y, x)
>>> print(output)
[-1.7917596  -4.094345  -12.000229  -14.754799]
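
Because \(P(x, y) = lgamma(x) + lgamma(y) - lgamma(x + y)\), the values above can be cross-checked with Python's math.lgamma (an illustrative check):

>>> import math
>>> ref = [math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
...        for a, b in zip([2.0, 3.0, 14.0, 15.0], [2.0, 4.0, 6.0, 8.0])]
>>> print(np.allclose(ref, [-1.7917596, -4.094345, -12.000229, -14.754799], atol=1e-4))
True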
class tinyms.layers.CosineSimilarity(dim=1, eps=1e-08)[source]

Computes cosine similarity.

\[\mathcal{K} = \frac{\textbf{x}\textbf{y}^{\top}}{\parallel \textbf{x} \parallel \parallel \textbf{y} \parallel},\]

where \(\mathcal{K}\) is the similarity, \(\textbf{x}\) is the first tensor x1, \(\textbf{y}\) is the second tensor x2.

To avoid numerical errors when dividing by small numbers, the lower bound of \(\parallel \textbf{x} \parallel \parallel \textbf{y} \parallel\) is set to eps.

Parameters:
  • dim (int, optional) – Dimension. Default: 1.

  • eps (float, optional) – Small value. Default: 1e-08.

Inputs:
  • x1 (Tensor) - The first tensor \(\textbf{x}\). Shape: \((\ast_1, D, \ast_2)\) where \(D\) is at position dim.

  • x2 (Tensor) - The second tensor \(\textbf{y}\). The shape is the same as x1.

Outputs:

Tensor, with shape \((\ast_1, \ast_2)\), the data type will be inferred automatically.

Raises:

TypeError – If x1 or x2 is not a Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x1 = Tensor([[1.0, 3.0, 4.0, 7.0], [2.0, 4.0, 2.0, 5.0], [3.0, 1.0, 5.0, 8.0]])
>>> x2 = Tensor([[2.0, 4.0, 2.0, 5.0], [3.0, 1.0, 5.0, 8.0], [1.0, 3.0, 4.0, 7.0]])
>>> func = nn.layer.CosineSimilarity()
>>> out = func(x1, x2)
>>> print(out.asnumpy())
[0.9402562 0.8614609 0.9516245]
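
With dim=1, the similarity is the row-wise dot product divided by the product of the row norms; the same numbers fall out of plain NumPy (an illustrative check):

>>> import numpy as np
>>> a, b = x1.asnumpy(), x2.asnumpy()
>>> ref = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
>>> print(np.allclose(ref, out.asnumpy(), atol=1e-6))
True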
class tinyms.layers.MatMul(transpose_x1=False, transpose_x2=False)[source]

The nn.MatMul interface is deprecated, please use mindspore.ops.matmul instead.

Supported Platforms:

deprecated

class tinyms.layers.Moments(axis=None, keep_dims=None)[source]

‘nn.Moments’ is deprecated from version 2.0 and will be removed in a future version, use ‘ops.var_mean’ instead.

class tinyms.layers.MatInverse[source]

Calculates the inverse of a positive-definite Hermitian matrix using Cholesky decomposition.

Inputs:
  • x (Tensor[Number]) - The input tensor. It must be a positive-definite matrix. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([[4, 12, -16], [12, 37, -43], [-16, -43, 98]]).astype(np.float32))
>>> op = nn.MatInverse()
>>> output = op(x)
>>> print(output)
[[49.36112  -13.555558  2.1111116]
 [-13.555558  3.7777784  -0.5555557]
 [2.1111116  -0.5555557  0.11111113]]
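
The result agrees with NumPy's general-purpose inverse on the same positive-definite matrix (an illustrative check):

>>> ref = np.linalg.inv(x.asnumpy().astype(np.float64))
>>> print(np.allclose(output.asnumpy(), ref, atol=1e-3))
True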
class tinyms.layers.MatDet[source]

Calculates the determinant of a positive-definite Hermitian matrix using Cholesky decomposition.

Inputs:
  • x (Tensor[Number]) - The input tensor. It must be a positive-definite matrix. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the x.

Raises:

TypeError – If dtype of x is neither float16 nor float32.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([[4, 12, -16], [12, 37, -43], [-16, -43, 98]]).astype(np.float32))
>>> op = nn.MatDet()
>>> output = op(x)
>>> print(output)
35.999996
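
The determinant of this matrix is exactly 36, as NumPy confirms in float64 (an illustrative check):

>>> print(round(float(np.linalg.det(x.asnumpy().astype(np.float64)))))
36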
class tinyms.layers.Conv2dBnAct(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', has_bn=False, momentum=0.997, eps=1e-05, activation=None, alpha=0.2, after_fake=True)[source]

A combination of convolution, batch normalization, and activation layers.

This part is a more detailed overview of the Conv2d operation.

Parameters:
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means the value is for both height and width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. The value of stride must be greater than or equal to 1 and lower than any one of the height and width of the x. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the x. Default: 0.

  • dilation (int) – Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and lower than any one of the height and width of the x. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • has_bn (bool) – Specifies to used batchnorm or not. Default: False.

  • momentum (float) – Momentum of the moving average for batchnorm, must be in range [0, 1]. Default: 0.997.

  • eps (float) – Term added to the denominator to improve numerical stability for batchnorm, should be greater than 0. Default: 1e-5.

  • activation (Union[str, Cell, Primitive]) – Specifies activation type. The optional values are as following: ‘softmax’, ‘logsoftmax’, ‘relu’, ‘relu6’, ‘tanh’, ‘gelu’, ‘sigmoid’, ‘prelu’, ‘leakyrelu’, ‘hswish’, ‘hsigmoid’. Default: None.

  • alpha (float) – Slope of the activation function at x < 0 for LeakyReLU. Default: 0.2.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after Conv2dBnAct. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\). The data type is float32.

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\). The data type is float32.

Raises:
  • TypeError – If in_channels, out_channels, stride, padding or dilation is not an int.

  • TypeError – If has_bias is not a bool.

  • ValueError – If in_channels, out_channels, stride, padding or dilation is less than 1.

  • ValueError – If pad_mode is not one of ‘same’, ‘valid’, ‘pad’.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv2dBnAct(120, 240, 4, has_bn=True, activation='relu')
>>> x = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> result = net(x)
>>> output = result.shape
>>> print(output)
(1, 240, 1024, 640)
class tinyms.layers.DenseBnAct(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, has_bn=False, momentum=0.9, eps=1e-05, activation=None, alpha=0.2, after_fake=True)[source]

A combination of Dense, batch normalization, and activation layers.

This part is a more detailed overview of the Dense operation.

Parameters:
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • has_bn (bool) – Specifies to use batchnorm or not. Default: False.

  • momentum (float) – Momentum of the moving average for batchnorm, must be in range [0, 1]. Default: 0.9.

  • eps (float) – Term added to the denominator to improve numerical stability for batchnorm, should be greater than 0. Default: 1e-5.

  • activation (Union[str, Cell, Primitive]) – Specifies activation type. The optional values are as following: ‘softmax’, ‘logsoftmax’, ‘relu’, ‘relu6’, ‘tanh’, ‘gelu’, ‘sigmoid’, ‘prelu’, ‘leakyrelu’, ‘hswish’, ‘hsigmoid’. Default: None.

  • alpha (float) – Slope of the activation function at x < 0 for LeakyReLU. Default: 0.2.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after DenseBnAct. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((N, in\_channels)\). The data type is float32.

Outputs:

Tensor of shape \((N, out\_channels)\). The data type is float32.

Raises:
  • TypeError – If in_channels or out_channels is not an int.

  • TypeError – If has_bias, has_bn or after_fake is not a bool.

  • TypeError – If momentum or eps is not a float.

  • ValueError – If momentum is not in range [0, 1.0].

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.DenseBnAct(3, 4)
>>> x = Tensor(np.random.randint(0, 255, [2, 3]), mindspore.float32)
>>> result = net(x)
>>> output = result.shape
>>> print(output)
(2, 4)
class tinyms.layers.TimeDistributed(layer, time_axis, reshape_with_axis=None)[source]

The time distributed layer.

Time distributed is a wrapper which applies a layer to every temporal slice of an input. The input x should be at least 3D. There are two cases in the implementation. When reshape_with_axis is provided, the reshape method will be chosen, which is more efficient; otherwise, the method of dividing the inputs along the time axis will be used, which is more general. For example, reshape_with_axis cannot be provided when dealing with Batch Normalization.

Parameters:
  • layer (Union[Cell, Primitive]) – The Cell or Primitive which will be wrapped.

  • time_axis (int) – The axis of time_step.

  • reshape_with_axis (int) – The axis which will be reshaped with time_axis. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, T, *)\), where \(*\) means any number of additional dimensions.

Outputs:

Tensor of shape \((N, T, *)\)

Raises:

TypeError – If layer is not a Cell or Primitive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.random.random([32, 10, 3]), mindspore.float32)
>>> dense = nn.Dense(3, 6)
>>> net = nn.TimeDistributed(dense, time_axis=1, reshape_with_axis=0)
>>> output = net(x)
>>> print(output.shape)
(32, 10, 6)
class tinyms.layers.MultiheadAttention(embed_dim, num_heads, dropout=0.0, has_bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None, batch_first=False)[source]

This is an implementation of multihead attention in the paper Attention is all you need. Given a query vector of target length and key and value vectors of source length, the attention is computed as follows

\[MultiHeadAttention(query, key, value) = Concat(head_1, \dots, head_h)W^O\]

where \(head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)\). The default is with a bias.

If the query, key and value tensors are the same, the result is self-attention.

Parameters:
  • embed_dim (int) – Total dimension of MultiheadAttention.

  • num_heads (int) – Number of attention heads. Note that embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim // num_heads).

  • dropout (float) – Dropout probability of attn_output_weights. Default: 0.0.

  • has_bias (bool) – Whether to add bias to the input / output projection layers. Default: True.

  • add_bias_kv (bool) – Whether to add bias to the key and value sequences at axis=0. Default: False.

  • add_zero_attn (bool) – Whether to add a new batch of zeros to the key and value sequences at axis=1. Default: False.

  • kdim (int) – Total number of features for keys. Default: None (kdim=embed_dim).

  • vdim (int) – Total number of features for values. Default: None (vdim=embed_dim).

  • batch_first (bool) – If True, then the input and output shape are \((batch, seq, feature)\) , else \((seq, batch, feature)\) . Default: False.

Inputs:
  • query (Tensor): The query embeddings. If query is unbatched, the shape is \((L, E_q)\), otherwise the shape is \((L, N, E_q)\) when batch_first=False or \((N, L, E_q)\) when batch_first=True, where \(L\) is the target sequence length, \(N\) is the batch size, and \(E_q\) is the query embedding dimension embed_dim. Queries are compared against key-value pairs to produce the output. See “Attention Is All You Need” for more details.

  • key (Tensor): The key embeddings. If key is unbatched, the shape is \((S, E_k)\), otherwise the shape is \((S, N, E_k)\) when batch_first=False or \((N, S, E_k)\) when batch_first=True, where \(S\) is the source sequence length, \(N\) is the batch size, and \(E_k\) is the key embedding dimension kdim. See “Attention Is All You Need” for more details.

  • value (Tensor): The value embeddings. If value is unbatched, the shape is \((S, E_v)\), otherwise the shape is \((S, N, E_v)\) when batch_first=False or \((N, S, E_v)\) when batch_first=True, where \(S\) is the source sequence length, \(N\) is the batch size, and \(E_v\) is the value embedding dimension vdim. See “Attention Is All You Need” for more details.

  • key_padding_mask (Tensor, optional): If specified, a mask of shape \((N, S)\) indicating which elements within key to ignore for the purpose of attention (i.e. treat as “padding”). For unbatched query, shape should be \((S)\). Binary and byte masks are supported. For a binary mask, a True value indicates that the corresponding key value will be ignored for the purpose of attention. For a float mask, it will be directly added to the corresponding key value.

  • need_weights (bool): Whether to return attn_output_weights in addition to attn_outputs. Default: True.

  • attn_mask (Tensor, optional): If specified, a 2D or 3D mask preventing attention to certain positions. Must be of shape \((L, S)\) or \((N\cdot\text{num\_heads}, L, S)\), where \(N\) is the batch size, \(L\) is the target sequence length, and \(S\) is the source sequence length. A 2D mask will be broadcasted across the batch while a 3D mask allows for a different mask for each entry in the batch. Binary, byte, and float masks are supported. For a binary mask, a True value indicates that the corresponding position is not allowed to attend. For a byte mask, a non-zero value indicates that the corresponding position is not allowed to attend. For a float mask, the mask values will be added to the attention weight.

  • average_attn_weights (bool): If true, indicates that the returned attn_weights should be averaged across heads. Otherwise, attn_weights are provided separately per head. Note that this flag only has an effect when need_weights=True. Default: True (i.e. average weights across heads)

Outputs:

Tuple, a tuple containing (attn_output, attn_output_weights)

  • attn_output - Attention outputs. If input is unbatched, the output shape is \((L, E)\), otherwise the output shape is \((L, N, E)\) when batch_first=False or \((N, L, E)\) when batch_first=True, where \(L\) is the target sequence length, \(N\) is the batch size, and \(E\) is the embedding dimension embed_dim.

  • attn_output_weights - Only returned when need_weights=True. If average_attn_weights=True, returns attention weights averaged across heads with shape \((L, S)\) when input is unbatched or \((N, L, S)\) when input is batched, where \(N\) is the batch size, \(L\) is the target sequence length, and \(S\) is the source sequence length. If average_attn_weights=False, returns attention weights per head of shape \((\text{num\_heads}, L, S)\) when input is unbatched or \((N, \text{num\_heads}, L, S)\) when input is batched.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> embed_dim, num_heads = 128, 8
>>> seq_length, batch_size = 10, 8
>>> query = Tensor(np.random.randn(seq_length, batch_size, embed_dim), mindspore.float32)
>>> key = Tensor(np.random.randn(seq_length, batch_size, embed_dim), mindspore.float32)
>>> value = Tensor(np.random.randn(seq_length, batch_size, embed_dim), mindspore.float32)
>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)
>>> print(attn_output.shape)
(10, 8, 128)
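
Continuing the example above, the attention weights are averaged across heads by default, so their shape is \((N, L, S)\) as described in the outputs (an illustrative check):

>>> print(attn_output_weights.shape)
(8, 10, 10)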
class tinyms.layers.TransformerEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, activation: Union[str, mindspore.nn.cell.Cell, callable] = 'relu', layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False)[source]

Transformer Encoder Layer. This is an implementation of a single layer of the transformer encoder, including multihead attention and feedforward layers.

Parameters:
  • d_model (int) – The number of features in the input tensor.

  • nhead (int) – The number of heads in the MultiheadAttention modules.

  • dim_feedforward (int) – The dimension of the feedforward layer. Default: 2048.

  • dropout (float) – The dropout value. Default: 0.1.

  • activation (Union[str, callable, Cell]) – The activation function of the intermediate layer, can be a string (“relu” or “gelu”), Cell instance (nn.ReLU() or nn.GELU()) or a callable (ops.relu or ops.gelu). Default: "relu".

  • layer_norm_eps (float) – The epsilon value in LayerNorm modules. Default: 1e-5.

  • batch_first (bool) – If batch_first = True, then the shape of input and output tensors is \((batch, seq, feature)\) , otherwise the shape is \((seq, batch, feature)\) . Default: False.

  • norm_first (bool) – If norm_first = True, layer norm is done prior to attention and feedforward operations, respectively. Default: False.

Inputs:
  • src (Tensor): the sequence to the encoder layer.

  • src_mask (Tensor, optional): the mask for the src sequence. Default: None.

  • src_key_padding_mask (Tensor, optional): the mask for the src keys per batch. Default: None.

Outputs:

Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> src = Tensor(np.random.rand(10, 32, 512), mindspore.float32)
>>> out = encoder_layer(src)
>>> # Alternatively, when batch_first=True:
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
>>> src = Tensor(np.random.rand(32, 10, 512), mindspore.float32)
>>> out = encoder_layer(src)
>>> print(out.shape)
(32, 10, 512)
class tinyms.layers.TransformerDecoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, activation: Union[str, mindspore.nn.cell.Cell, callable] = 'relu', layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False)[source]

Transformer Decoder Layer. This is an implementation of a single layer of the transformer decoder, including self-attention, cross attention and feedforward layers.

Parameters:
  • d_model (int) – The number of expected features in the input tensor.

  • nhead (int) – The number of heads in the MultiheadAttention modules.

  • dim_feedforward (int) – The dimension of the feedforward layer. Default: 2048.

  • dropout (float) – The dropout value. Default: 0.1.

  • activation (Union[str, callable, Cell]) – The activation function of the intermediate layer, can be a string (“relu” or “gelu”), Cell instance (nn.ReLU() or nn.GELU()) or a callable (ops.relu or ops.gelu). Default: "relu"

  • layer_norm_eps (float) – The epsilon value in LayerNorm modules. Default: 1e-5.

  • batch_first (bool) – If batch_first = True, then the shape of input and output tensors is \((batch, seq, feature)\) , otherwise the shape is \((seq, batch, feature)\). Default: False.

  • norm_first (bool) – If norm_first = True, layer norm is done prior to attention and feedforward operations, respectively. Default: False.

Inputs:
  • tgt (Tensor): The sequence to the decoder layer.

  • memory (Tensor): The sequence from the last layer of the encoder.

  • tgt_mask (Tensor, optional): The mask of the tgt sequence. Default: None.

  • memory_mask (Tensor, optional): The mask of the memory sequence. Default: None.

  • tgt_key_padding_mask (Tensor, optional): The mask of the tgt keys per batch. Default: None.

  • memory_key_padding_mask (Tensor, optional): The mask of the memory keys per batch. Default: None.

Outputs:

Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
>>> memory = Tensor(np.random.rand(10, 32, 512), mindspore.float32)
>>> tgt = Tensor(np.random.rand(20, 32, 512), mindspore.float32)
>>> out = decoder_layer(tgt, memory)
>>> # Alternatively, when `batch_first` is ``True``:
>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True)
>>> memory = Tensor(np.random.rand(32, 10, 512), mindspore.float32)
>>> tgt = Tensor(np.random.rand(32, 20, 512), mindspore.float32)
>>> out = decoder_layer(tgt, memory)
>>> print(out.shape)
(32, 20, 512)
class tinyms.layers.TransformerEncoder(encoder_layer, num_layers, norm=None)[source]

Transformer Encoder module consisting of multiple stacked TransformerEncoderLayer instances, each including multihead self-attention and feedforward layers. Users can build the BERT (https://arxiv.org/abs/1810.04805) model with corresponding parameters.

Parameters:
  • encoder_layer (Cell) – An instance of the TransformerEncoderLayer() class.

  • num_layers (int) – The number of encoder-layers in the encoder.

  • norm (Cell, optional) – The layer normalization module.

Inputs:
  • src (Tensor): The sequence to the encoder.

  • src_mask (Tensor, optional): The mask of the src sequence. Default: None.

  • src_key_padding_mask (Tensor, optional): The mask of the src keys per batch. Default: None.

Outputs:

Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
>>> src = Tensor(np.random.rand(10, 32, 512), mindspore.float32)
>>> out = transformer_encoder(src)
>>> print(out.shape)
(10, 32, 512)
class tinyms.layers.TransformerDecoder(decoder_layer, num_layers, norm=None)[source]

Transformer Decoder module consisting of multiple stacked TransformerDecoderLayer instances, each including multihead self-attention, cross attention and feedforward layers.

Parameters:
  • decoder_layer (Cell) – An instance of the mindspore.nn.TransformerDecoderLayer class.

  • num_layers (int) – The number of decoder-layers in the decoder.

  • norm (Cell, optional) – The layer normalization module.

Inputs:
  • tgt (Tensor): The sequence to the decoder.

  • memory (Tensor): The sequence from the last layer of the encoder.

  • tgt_mask (Tensor, optional): the mask of the tgt sequence. Default: None.

  • memory_mask (Tensor, optional): the mask of the memory sequence. Default: None.

  • tgt_key_padding_mask (Tensor, optional): the mask of the tgt keys per batch. Default: None.

  • memory_key_padding_mask (Tensor, optional): the mask of the memory keys per batch. Default: None.

Outputs:

Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
>>> transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
>>> memory = Tensor(np.random.rand(10, 32, 512), mindspore.float32)
>>> tgt = Tensor(np.random.rand(20, 32, 512), mindspore.float32)
>>> out = transformer_decoder(tgt, memory)
>>> print(out.shape)
(20, 32, 512)
class tinyms.layers.Transformer(d_model: int = 512, nhead: int = 8, num_encoder_layers: int = 6, num_decoder_layers: int = 6, dim_feedforward: int = 2048, dropout: float = 0.1, activation: Union[str, mindspore.nn.cell.Cell, callable] = 'relu', custom_encoder: Optional[mindspore.nn.cell.Cell] = None, custom_decoder: Optional[mindspore.nn.cell.Cell] = None, layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False)[source]

Transformer module including encoder and decoder. The difference from the original implementation is that this module applies the residual addition before the layer normalization, and the default hidden activation is gelu. The details can be found in Attention is all you need.

Parameters:
  • d_model (int) – The number of expected features in the inputs tensor. Default: 512.

  • nhead (int) – The number of heads in the MultiheadAttention modules. Default: 8.

  • num_encoder_layers (int) – The number of encoder-layers in the encoder. Default: 6.

  • num_decoder_layers (int) – The number of decoder-layers in the decoder. Default: 6.

  • dim_feedforward (int) – The dimension of the feedforward layer. Default: 2048.

  • dropout (float) – The dropout value. Default: 0.1.

  • activation (Union[str, callable, Cell]) – The activation function of the intermediate layer, can be a string (“relu” or “gelu”), Cell instance (nn.ReLU() or nn.GELU()) or a callable (ops.relu or ops.gelu). Default: "relu"

  • custom_encoder (Cell) – Custom encoder. Default: None.

  • custom_decoder (Cell) – Custom decoder. Default: None.

  • layer_norm_eps (float) – The epsilon value in the layer normalization module. Default: 1e-5.

  • batch_first (bool) – If batch_first = True, then the shape of input and output tensors is \((batch, seq, feature)\) , otherwise the shape is \((seq, batch, feature)\) . Default: False.

  • norm_first (bool) – If norm_first = True, layer norm is done prior to attention and feedforward operations, respectively. Default: False.

Inputs:
  • src (Tensor): The source sequence to the encoder.

  • tgt (Tensor): The target sequence to the decoder.

  • src_mask (Tensor, optional): The mask of the src sequence. Default: None.

  • tgt_mask (Tensor, optional): The mask of the tgt sequence. Default: None.

  • memory_mask (Tensor, optional): The additive mask of the encoder output. Default: None.

  • src_key_padding_mask (Tensor, optional): The mask of src keys per batch. Default: None.

  • tgt_key_padding_mask (Tensor, optional): The mask of tgt keys per batch. Default: None.

  • memory_key_padding_mask (Tensor, optional): The mask of memory keys per batch. Default: None.

Outputs:

Tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
>>> src = Tensor(np.random.rand(10, 32, 512), mindspore.float32)
>>> tgt = Tensor(np.random.rand(20, 32, 512), mindspore.float32)
>>> out = transformer_model(src, tgt)
>>> print(out.shape)
(20, 32, 512)
class tinyms.layers.DenseThor(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None)[source]

A dense connected layer that also saves the information needed for THOR.

Applies dense connected layer for the input and saves the information A and G in the dense connected layer needed for THOR.

This layer implements the operation as:

\[\text{outputs} = \text{activation}(\text{inputs} * \text{kernel} + \text{bias}),\]

where \(\text{activation}\) is the activation function, \(\text{kernel}\) is a weight matrix with the same data type as the inputs created by the layer, and \(\text{bias}\) is a bias vector with the same data type as the inputs created by the layer (only if has_bias is True).

Parameters:
  • in_channels (int) – The number of the input channels.

  • out_channels (int) – The number of the output channels.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • activation (str) – activate function applied to the output of the fully connected layer, eg. ‘ReLU’. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, in\_channels)\).

Outputs:

Tensor of shape \((N, out\_channels)\).

Raises:

ValueError – If the shape of weight_init or bias_init is incorrect.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> x = Tensor(np.array([[1, 2, 3], [3, 4, 5]]), mindspore.float32)
>>> net = nn.DenseThor(3, 4, weight_init="ones")
>>> output = net(x)
>>> print(output)
[[ 6.  6.  6.  6.]
 [12. 12. 12. 12.]]
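
Since weight_init="ones" and the default bias is zero, the layer reduces to a matrix product with an all-ones weight, so each output channel holds the row sum (an illustrative NumPy check):

>>> print(x.asnumpy() @ np.ones((3, 4), dtype=np.float32))
[[ 6.  6.  6.  6.]
 [12. 12. 12. 12.]]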
save_gradient(dout)[source]

This function is only used to save gradients for the THOR optimizer.

class tinyms.layers.Conv2dThor(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

2D convolution layer and saving the information needed for THOR.

Applies a 2D convolution over an input tensor which is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is the channel number, and \(H_{in}, W_{in}\) are the height and width. It also saves the information A and G in the 2D convolution layer needed for THOR.

For each batch of shape \((C_{in}, H_{in}, W_{in})\), the formula is defined as:

\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]

where \(ccor\) is the cross-correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of kernel and it has shape \((\text{ks_h}, \text{ks_w})\), where \(\text{ks_h}\) and \(\text{ks_w}\) are the height and width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})\), where group is the group number to split the input x in the channel dimension.

If the ‘pad_mode’ is set to be “valid”, the output height and width will be \(\left \lfloor{1 + \frac{H_{in} + 2 \times \text{padding} - \text{ks_h} - (\text{ks_h} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) and \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) respectively.

Note

For Ascend, the type of inputs should be subclass of Tensor[Float16], Tensor[Int8]. For GPU, the type of inputs should be subclass of Tensor[Float32].

Parameters:
  • in_channels (int) – The number of the input channel \(C_{in}\).

  • out_channels (int) – The number of the output channel \(C_{out}\).

  • kernel_size (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means that the value is not only the height, but also the width of the kernel. A tuple of 2 integers means the height and the width of the kernel respectively.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number represents the height and width of movement, or a tuple of two int numbers that represent height and width of movement, respectively. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: Adopts the way of completion. The shape of the output will be the same as the x. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side. If this mode is set, padding must be 0.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.

    • pad: Implicit paddings on both sides of the input x. The number of padding will be padded to the input Tensor borders. padding must be greater than or equal to 0.

  • padding (Union[int, tuple[int]]) – Implicit paddings on both sides of the input x. If padding is an integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple with four integers, the paddings of top, bottom, left and right will be equal to padding[0], padding[1], padding[2], and padding[3] accordingly. Default: 0.

  • dilation (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the height and width of the input x. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. If the group is equal to in_channels and out_channels, this 2D convolution layer also can be called 2D depthwise convolution layer. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializes the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializes the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.Conv2dThor(120, 240, 4, has_bias=False, weight_init='normal')
>>> # for Ascend
>>> x = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float16)
>>> print(net(x).shape)
(1, 240, 1024, 640)
save_gradient(dout)[source]
class tinyms.layers.EmbeddingThor(vocab_size, embedding_size, use_one_hot=False, embedding_table='normal', dtype=mindspore.float32, padding_idx=None)[source]

A simple lookup table that stores embeddings of a fixed dictionary and size, and saves the information needed for THOR.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. It also saves the information A and G in the embedding layer needed for THOR.

Note

When ‘use_one_hot’ is set to True, the type of the input x must be mindspore.int32.

Parameters:
  • vocab_size (int) – The size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • use_one_hot (bool) – Specifies whether to apply one-hot encoding. Default: False.

  • embedding_table (Union[Tensor, str, Initializer, numbers.Number]) – Initializes the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • dtype (mindspore.dtype) – Data type of input x. Default: mindspore.float32.

  • padding_idx (int, None) – When an index equals padding_idx, the output embedding vector of this index will be initialized to zero. Default: None, the feature is disabled.

Inputs:
  • x (Tensor) - Tensor of input shape \((\text{batch_size}, \text{x_length})\). The elements of the Tensor must be integers and not larger than vocab_size; otherwise the corresponding embedding vector will be zero.

Outputs:

Tensor of output shape \((\text{batch_size}, \text{x_length}, \text{embedding_size})\).

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.EmbeddingThor(20000, 768, True)
>>> x = Tensor(np.ones([8, 128]), mindspore.int32)
>>>
>>> # Maps the input word IDs to word embedding.
>>> output = net(x)
>>> output.shape
(8, 128, 768)
save_gradient(dout)[source]

This function is only used to save gradients for the THOR optimizer.

class tinyms.layers.EmbeddingLookupThor(vocab_size, embedding_size, param_init='normal', target='CPU', slice_mode='batch_slice', manual_shapes=None, max_norm=None, sparse=True, vocab_cache_size=0)[source]

Returns a slice of the input tensor based on the specified indices, and saves the information needed for THOR.

This module has the same function as EmbeddingLookup, but additionally saves the information A and G in the embeddinglookup layer needed for THOR.

Parameters:
  • vocab_size (int) – The size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • param_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must be in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must get through nn.EmbeddingLookup. Default: nn.EmbeddingLookup.BATCH_SLICE.

  • manual_shapes (tuple) – The accompaniment array in field slice mode.

  • max_norm (Union[float, None]) – A maximum clipping value. The data type must be float16, float32 or None. Default: None

  • sparse (bool) – Using sparse mode. When ‘target’ is set to ‘CPU’, ‘sparse’ has to be true. Default: True.

  • vocab_cache_size (int) – Cache size of the dictionary of embeddings. Default: 0. It is valid only for the ‘DEVICE’ target, and the moment parameter of the corresponding optimizer will also be set to the cache size. Note that it consumes ‘DEVICE’ memory, so a reasonable value is suggested to avoid insufficient memory.

Inputs:
  • input_indices (Tensor) - The shape of tensor is \((y_1, y_2, ..., y_S)\).

Outputs:

Tensor, the shape of tensor is \((z_1, z_2, ..., z_N)\).

Raises:
  • ValueError – If target is neither ‘CPU’ nor ‘DEVICE’.

  • ValueError – If slice_mode is not one of ‘batch_slice’ or ‘field_slice’ or ‘table_row_slice’ or ‘table_column_slice’.

  • ValueError – If sparse is False and target is ‘CPU’.

  • ValueError – If slice_mode is ‘field_slice’ and manual_shapes is None.

  • TypeError – If vocab_size or embedding_size or vocab_cache_size is not an int.

  • TypeError – If sparse is not a bool or manual_shapes is not a tuple.

  • ValueError – If vocab_size or embedding_size is less than 1.

  • ValueError – If vocab_cache_size is less than 0.

Supported Platforms:

Ascend

Examples

>>> import numpy as np
>>> import mindspore
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> result = nn.EmbeddingLookupThor(4, 2)(input_indices)
>>> print(result.shape)
(2, 2, 2)
save_gradient(dout)[source]

This function is only used to save gradients for the THOR optimizer.

class tinyms.layers.ConstantPad1d(padding, value)[source]

Pads the last dimension of the input tensor with a given constant value.

Parameters:
  • padding (Union[int, tuple]) – The padding size to pad the last dimension of the input tensor. If it is an int, the same padding is used on both boundaries of the input’s last dimension. If it is a 2-tuple, uses (padding_0, padding_1) to pad. If the input is x, the size of the last dimension of the output is \(padding\_0 + x.shape[-1] + padding\_1\). The remaining dimensions of the output are consistent with those of the input.

  • value (Union[int, float]) – Padding value.

Returns:

Tensor, the tensor after padding.

Raises:
  • TypeError – If padding is not a tuple or int.

  • TypeError – If value is not int or float.

  • ValueError – If the length of padding with tuple type is not equal to 2.

  • ValueError – If the output shape after padding is not positive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ConstantPad1d
>>> x = np.ones(shape=(1, 2, 3, 4)).astype(np.float32)
>>> x = Tensor(x)
>>> # padding is tuple
>>> padding = (0, 1)
>>> value = 0.5
>>> pad1d = ConstantPad1d(padding, value)
>>> out = pad1d(x)
>>> print(out)
[[[[1.  1.  1.  1.  0.5]
   [1.  1.  1.  1.  0.5]
   [1.  1.  1.  1.  0.5]]
  [[1.  1.  1.  1.  0.5]
   [1.  1.  1.  1.  0.5]
   [1.  1.  1.  1.  0.5]]]]
>>> print(out.shape)
(1, 2, 3, 5)
>>> # padding is int
>>> padding = 1
>>> value = 0.5
>>> pad1d = ConstantPad1d(padding, value)
>>> out = pad1d(x)
>>> print(out)
[[[[0.5 1.  1.  1.  1.  0.5]
   [0.5 1.  1.  1.  1.  0.5]
   [0.5 1.  1.  1.  1.  0.5]]
  [[0.5 1.  1.  1.  1.  0.5]
   [0.5 1.  1.  1.  1.  0.5]
   [0.5 1.  1.  1.  1.  0.5]]]]
>>> print(out.shape)
(1, 2, 3, 6)
>>> # padding is negative
>>> padding = (-1, 0)
>>> value = 0.5
>>> pad1d = ConstantPad1d(padding, value)
>>> out = pad1d(x)
>>> print(out)
[[[[1. 1. 1.]
   [1. 1. 1.]
   [1. 1. 1.]]
  [[1. 1. 1.]
   [1. 1. 1.]
   [1. 1. 1.]]]]
>>> print(out.shape)
(1, 2, 3, 3)
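
For non-negative paddings, the result matches numpy.pad in constant mode on the last dimension; negative paddings instead crop the input, which numpy.pad does not model (an illustrative check for the first case above):

>>> ref = np.pad(np.ones((1, 2, 3, 4)).astype(np.float32),
...              ((0, 0), (0, 0), (0, 0), (0, 1)), constant_values=0.5)
>>> print(ref.shape)
(1, 2, 3, 5)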
class tinyms.layers.ConstantPad2d(padding, value)[source]

Pads the last two dimensions of the input tensor with a given constant value.

Parameters:
  • padding (Union[int, tuple]) – The padding size to pad the last two dimensions of the input tensor. If it is an int, the same padding is used on the boundaries of the input’s last two dimensions. If it is a 4-tuple, uses (padding_0, padding_1, padding_2, padding_3) to pad. If the input is x, the size of the last dimension of the output is \(padding\_0 + x.shape[-1] + padding\_1\), and the size of the penultimate dimension of the output is \(padding\_2 + x.shape[-2] + padding\_3\). The remaining dimensions of the output are consistent with those of the input.

  • value (Union[int, float]) – Padding value.

Returns:

Tensor, the tensor after padding.

Raises:
  • TypeError – If padding is not a tuple or int.

  • TypeError – If value is not int or float.

  • ValueError – If the length of padding is more than 4 or not a multiple of 2.

  • ValueError – If the output shape after padding is not positive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ConstantPad2d
>>> x = np.ones(shape=(1, 2, 3, 4)).astype(np.float32)
>>> x = Tensor(x)
>>> padding = (-1, 1, 0, 1)
>>> value = 0.5
>>> pad2d = ConstantPad2d(padding, value)
>>> out = pad2d(x)
>>> print(out)
[[[[1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [0.5 0.5 0.5 0.5]]
  [[1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [0.5 0.5 0.5 0.5]]]]
>>> print(out.shape)
(1, 2, 4, 4)
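
The output-shape rule above can be made concrete with a small helper (padded_shape_2d is hypothetical, shown only for illustration; negative paddings trim the tensor):

>>> # Hypothetical helper: applies the documented shape formula
>>> def padded_shape_2d(shape, padding):
...     p0, p1, p2, p3 = padding
...     return shape[:-2] + (p2 + shape[-2] + p3, p0 + shape[-1] + p1)
>>> padded_shape_2d((1, 2, 3, 4), (-1, 1, 0, 1))
(1, 2, 4, 4)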
class tinyms.layers.ConstantPad3d(padding, value)[source]

Pads the last three dimensions of the input tensor with a given constant value.

Parameters:
  • padding (Union[int, tuple]) – The padding size to pad the last three dimensions of the input tensor. If it is an int, the same padding size is used on the boundaries of the input’s last three dimensions. If it is a tuple of length 6, (padding_0, padding_1, padding_2, padding_3, padding_4, padding_5) is used to pad. If the input is x, the size of the last dimension of the output is \(padding\_0 + x.shape[-1] + padding\_1\), the size of the penultimate dimension of the output is \(padding\_2 + x.shape[-2] + padding\_3\), and the size of the third-to-last dimension of the output is \(padding\_4 + x.shape[-3] + padding\_5\). The remaining dimensions of the output are consistent with those of the input.

  • value (Union[int, float]) – Padding value.

Returns:

Tensor, the tensor after padding.

Raises:
  • TypeError – If padding is not a tuple or int.

  • TypeError – If value is not int or float.

  • ValueError – If the length of padding is more than 6 or not a multiple of 2.

  • ValueError – If the output shape after padding is not positive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ConstantPad3d
>>> x = np.ones(shape=(1, 2, 3, 4)).astype(np.float32)
>>> x = Tensor(x)
>>> padding = (-1, 1, 0, 1, 1, 0)
>>> value = 0.5
>>> pad3d = ConstantPad3d(padding, value)
>>> out = pad3d(x)
>>> print(out)
[[[[0.5 0.5 0.5 0.5]
   [0.5 0.5 0.5 0.5]
   [0.5 0.5 0.5 0.5]
   [0.5 0.5 0.5 0.5]]
  [[1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [0.5 0.5 0.5 0.5]]
  [[1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [1.  1.  1.  0.5]
   [0.5 0.5 0.5 0.5]]]]
>>> print(out.shape)
(1, 3, 4, 4)
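
For an all-non-negative padding such as (1, 1, 0, 1, 1, 0), np.pad again serves as a cross-check (a sketch; note that the padding tuple lists the last dimension first, while np.pad lists dimensions first to last):

>>> # padding = (1, 1, 0, 1, 1, 0) maps to widths D: (1, 0), H: (0, 1), W: (1, 1)
>>> ref = np.pad(np.ones((1, 2, 3, 4), dtype=np.float32),
...              [(0, 0), (1, 0), (0, 1), (1, 1)],
...              mode='constant', constant_values=0.5)
>>> print(ref.shape)
(1, 3, 4, 6)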
class tinyms.layers.ReflectionPad1d(padding)[source]

Pads the last dimension of the input tensor by reflection, according to the given padding.

Parameters:

padding (union[int, tuple]) – The padding size to pad the last dimension of the input tensor. If padding is an integer, both boundaries are padded with the same size. If padding is a tuple, \((pad\_left, pad\_right)\) is used to pad.

Inputs:
  • x (Tensor) - 2D or 3D, shape: \((C, W_{in})\) or \((N, C, W_{in})\).

Outputs:

Tensor, after padding. Shape: \((C, W_{out})\) or \((N, C, W_{out})\), where \(W_{out} = W_{in} + pad\_left + pad\_right\).

Raises:
  • TypeError – If ‘padding’ is not a tuple or int.

  • TypeError – If there is an element in ‘padding’ that is not int.

  • ValueError – If the length of ‘padding’ is not divisible by 2.

  • ValueError – If there is an element in ‘padding’ that is negative.

  • ValueError – If there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ReflectionPad1d
>>> x = Tensor(np.array([[[0, 1, 2, 3], [4, 5, 6, 7]]]).astype(np.float32))
>>> # x has shape (1, 2, 4)
>>> padding = (3, 1)
>>> # The first and the second dimension of x remain the same.
>>> # The third dimension of x: W_out = W_in + pad_left + pad_right = 4 + 3 + 1 = 8
>>> pad1d = ReflectionPad1d(padding)
>>> out = pad1d(x)
>>> # The shape of out is (1, 2, 8)
>>> print(out)
[[[3. 2. 1. 0. 1. 2. 3. 2.]
  [7. 6. 5. 4. 5. 6. 7. 6.]]]
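
The same reflection semantics are available as np.pad with mode='reflect', which reproduces the output above (a sketch; for reflection, each padding size must be smaller than the size of the padded dimension):

>>> ref = np.pad(np.array([[[0, 1, 2, 3], [4, 5, 6, 7]]], dtype=np.float32),
...              [(0, 0), (0, 0), (3, 1)], mode='reflect')
>>> print(ref)
[[[3. 2. 1. 0. 1. 2. 3. 2.]
  [7. 6. 5. 4. 5. 6. 7. 6.]]]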
class tinyms.layers.ReflectionPad2d(padding)[source]

Pads the last two dimensions of the input tensor by reflection, according to the given padding.

Parameters:

padding (union[int, tuple]) – The padding size to pad the input tensor. If padding is an integer, all directions are padded with the same size. If padding is a tuple, \((pad\_left, pad\_right, pad\_up, pad\_down)\) is used to pad.

Inputs:
  • x (Tensor) - 3D or 4D, shape: \((C, H_{in}, W_{in})\) or \((N, C, H_{in}, W_{in})\).

Outputs:

Tensor, after padding. Shape: \((C, H_{out}, W_{out})\) or \((N, C, H_{out}, W_{out})\), where \(H_{out} = H_{in} + pad\_up + pad\_down\), \(W_{out} = W_{in} + pad\_left + pad\_right\).

Raises:
  • TypeError – If ‘padding’ is not a tuple or int.

  • TypeError – If there is an element in ‘padding’ that is not int.

  • ValueError – If the length of ‘padding’ is not divisible by 2.

  • ValueError – If there is an element in ‘padding’ that is negative.

  • ValueError – If there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ReflectionPad2d
>>> x = Tensor(np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]]]).astype(np.float32))
>>> # x has shape (1, 3, 3)
>>> padding = (1, 1, 2, 0)
>>> pad2d = ReflectionPad2d(padding)
>>> # The first dimension of x remains the same.
>>> # The second dimension of x: H_out = H_in + pad_up + pad_down = 3 + 2 + 0 = 5
>>> # The third dimension of x: W_out = W_in + pad_left + pad_right = 3 + 1 + 1 = 5
>>> out = pad2d(x)
>>> # The shape of out is (1, 5, 5)
>>> print(out)
[[[7. 6. 7. 8. 7.]
  [4. 3. 4. 5. 4.]
  [1. 0. 1. 2. 1.]
  [4. 3. 4. 5. 4.]
  [7. 6. 7. 8. 7.]]]
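
As in the 1D case, np.pad with mode='reflect' gives the same result (a sketch; padding=(1, 1, 2, 0) maps to widths (2, 0) for H and (1, 1) for W):

>>> ref = np.pad(np.arange(9).reshape(1, 3, 3).astype(np.float32),
...              [(0, 0), (2, 0), (1, 1)], mode='reflect')
>>> print(ref)
[[[7. 6. 7. 8. 7.]
  [4. 3. 4. 5. 4.]
  [1. 0. 1. 2. 1.]
  [4. 3. 4. 5. 4.]
  [7. 6. 7. 8. 7.]]]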
class tinyms.layers.ReflectionPad3d(padding)[source]

Pad the given tensor in a reflecting way using the input boundaries as the axis of symmetry.

Note

ReflectionPad3d does not support 5D tensors yet.

Parameters:

padding (union[int, tuple]) – The padding size to pad the input tensor. If padding is an integer, all directions are padded with the same size. If padding is a tuple, \((pad\_left, pad\_right, pad\_up, pad\_down, pad\_front, pad\_back)\) is used to pad.

Inputs:
  • x (Tensor) - 4D Tensor, shape: \((N, D_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, after padding. Shape: \((N, D_{out}, H_{out}, W_{out})\), where \(D_{out} = D_{in} + pad\_front + pad\_back\), \(H_{out} = H_{in} + pad\_up + pad\_down\), \(W_{out} = W_{in} + pad\_left + pad\_right\).

Raises:
  • TypeError – If ‘padding’ is not a tuple or int.

  • TypeError – If there is an element in ‘padding’ that is not int.

  • ValueError – If the length of ‘padding’ is not divisible by 2.

  • ValueError – If there is an element in ‘padding’ that is negative.

  • ValueError – If there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ReflectionPad3d
>>> arr = np.arange(8).astype(np.float32).reshape((1, 2, 2, 2))
>>> x = Tensor(arr)
>>> # x has shape (1, 2, 2, 2)
>>> padding = (1, 1, 1, 0, 0, 1)
>>> pad3d = ReflectionPad3d(padding)
>>> out = pad3d(x)
>>> # The first dimension of x remains the same.
>>> # The second dimension of x: D_out = D_in + pad_front + pad_back = 2 + 0 + 1 = 3
>>> # The third dimension of x: H_out = H_in + pad_up + pad_down = 2 + 1 + 0 = 3
>>> # The last dimension of x: W_out = W_in + pad_left + pad_right = 2 + 1 + 1 = 4
>>> # The shape of out is (1, 3, 3, 4)
>>> print(out)
[[[[3. 2. 3. 2.]
   [1. 0. 1. 0.]
   [3. 2. 3. 2.]]
  [[7. 6. 7. 6.]
   [5. 4. 5. 4.]
   [7. 6. 7. 6.]]
  [[3. 2. 3. 2.]
   [1. 0. 1. 0.]
   [3. 2. 3. 2.]]]]
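
The 3D case also matches np.pad with mode='reflect' applied to the last three axes (a sketch, checking only the shape; padding=(1, 1, 1, 0, 0, 1) maps to widths (0, 1) for D, (1, 0) for H and (1, 1) for W):

>>> ref = np.pad(arr, [(0, 0), (0, 1), (1, 0), (1, 1)], mode='reflect')
>>> print(ref.shape)
(1, 3, 3, 4)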
class tinyms.layers.ZeroPad2d(padding)[source]

Pads the last two dimensions of the input tensor with zeros.

Parameters:

padding (Union[int, tuple]) – The padding size to pad the last two dimensions of the input tensor. If it is an int, the same padding size is used on the boundaries of the input’s last two dimensions. If it is a tuple of length 4, (padding_0, padding_1, padding_2, padding_3) is used to pad. If the input is x, the size of the last dimension of the output is \(padding\_0 + x.shape[-1] + padding\_1\), and the size of the penultimate dimension of the output is \(padding\_2 + x.shape[-2] + padding\_3\). The remaining dimensions of the output are consistent with those of the input.

Returns:

Tensor, the tensor after padding.

Raises:
  • TypeError – If padding is not a tuple or int.

  • ValueError – If the length of padding is more than 4 or not a multiple of 2.

  • ValueError – If the output shape after padding is not positive.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.nn import ZeroPad2d
>>> x = np.ones(shape=(1, 2, 3, 4)).astype(np.float32)
>>> x = Tensor(x)
>>> padding = (-1, 1, 0, 1)
>>> pad = ZeroPad2d(padding)
>>> out = pad(x)
>>> print(out)
[[[[1. 1. 1. 0.]
   [1. 1. 1. 0.]
   [1. 1. 1. 0.]
   [0. 0. 0. 0.]]
  [[1. 1. 1. 0.]
   [1. 1. 1. 0.]
   [1. 1. 1. 0.]
   [0. 0. 0. 0.]]]]
>>> print(out.shape)
(1, 2, 4, 4)
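
ZeroPad2d is effectively ConstantPad2d with the value fixed to 0, which can be verified directly (a consistency sketch reusing x and padding from above):

>>> from mindspore.nn import ConstantPad2d
>>> out2 = ConstantPad2d(padding, 0)(x)
>>> print(np.array_equal(out.asnumpy(), out2.asnumpy()))
True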
class tinyms.layers.ReplicationPad1d(padding)[source]

Pads the W dimension of the input x according to padding.

Parameters:

padding (union[int, tuple]) –

The padding size to pad the last dimension of x.

  • If padding is an integer, all directions will be padded with the same size.

  • If padding is a tuple, uses \((pad_{left}, pad_{right})\) to pad.

Inputs:
  • x (Tensor) - 2D or 3D, shape: \((C, W_{in})\) or \((N, C, W_{in})\).

Outputs:

Tensor, after padding. Shape: \((C, W_{out})\) or \((N, C, W_{out})\), where \(W_{out} = W_{in} + pad_{left} + pad_{right}\)

Raises:
  • TypeError – If padding is neither a tuple nor an int.

  • TypeError – If there is an element in padding that is not int.

  • ValueError – If padding is tuple and the length of padding is not divisible by 2.

  • ValueError – If padding is tuple and there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.nn import ReplicationPad1d
>>> pad1d = ReplicationPad1d(2)
>>> input = Tensor(np.arange(0, 8).reshape(1, 2, 4), mindspore.float32)
>>> print(input)
[[[0. 1. 2. 3.]
  [4. 5. 6. 7.]]]
>>> out = pad1d(input)
>>> print(out)
[[[0. 0. 0. 1. 2. 3. 3. 3.]
  [4. 4. 4. 5. 6. 7. 7. 7.]]]
>>> pad1d = ReplicationPad1d((3, 1))
>>> out = pad1d(input)
>>> print(out)
[[[0. 0. 0. 0. 1. 2. 3. 3.]
  [4. 4. 4. 4. 5. 6. 7. 7.]]]
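
Replication padding corresponds to NumPy’s 'edge' mode, so np.pad can serve as a cross-check (a sketch reproducing the first output above):

>>> ref = np.pad(np.arange(0, 8).reshape(1, 2, 4),
...              [(0, 0), (0, 0), (2, 2)], mode='edge')
>>> print(ref)
[[[0 0 0 1 2 3 3 3]
  [4 4 4 5 6 7 7 7]]]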
class tinyms.layers.ReplicationPad2d(padding)[source]

Pads the H and W dimensions of the input x according to padding.

Parameters:

padding (union[int, tuple]) –

The padding size to pad the last two dimensions of x.

  • If padding is an integer, all directions will be padded with the same size.

  • If padding is a tuple, uses \((pad_{left}, pad_{right}, pad_{up}, pad_{down})\) to pad.

Inputs:
  • x (Tensor) - 3D or 4D, shape: \((C, H_{in}, W_{in})\) or \((N, C, H_{in}, W_{in})\).

Outputs:

Tensor, after padding. Shape: \((C, H_{out}, W_{out})\) or \((N, C, H_{out}, W_{out})\), where \(H_{out} = H_{in} + pad_{up} + pad_{down}\), \(W_{out} = W_{in} + pad_{left} + pad_{right}\).

Raises:
  • TypeError – If padding is neither a tuple nor an int.

  • TypeError – If there is an element in padding that is not int.

  • ValueError – If padding is tuple and the length of padding is not divisible by 2.

  • ValueError – If padding is tuple and there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.nn import ReplicationPad2d
>>> pad2d = ReplicationPad2d(2)
>>> input = Tensor(np.arange(0, 9).reshape(1, 1, 3, 3), mindspore.float32)
>>> print(input)
[[[[0. 1. 2.]
   [3. 4. 5.]
   [6. 7. 8.]]]]
>>> out = pad2d(input)
>>> print(out)
[[[[0. 0. 0. 1. 2. 2. 2.]
   [0. 0. 0. 1. 2. 2. 2.]
   [0. 0. 0. 1. 2. 2. 2.]
   [3. 3. 3. 4. 5. 5. 5.]
   [6. 6. 6. 7. 8. 8. 8.]
   [6. 6. 6. 7. 8. 8. 8.]
   [6. 6. 6. 7. 8. 8. 8.]]]]
>>> pad2d = ReplicationPad2d((1, 1, 2, 0))
>>> out = pad2d(input)
>>> print(out)
[[[[0. 0. 1. 2. 2.]
   [0. 0. 1. 2. 2.]
   [0. 0. 1. 2. 2.]
   [3. 3. 4. 5. 5.]
   [6. 6. 7. 8. 8.]]]]
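
The same 'edge' cross-check works in 2D (a sketch; padding=(1, 1, 2, 0) maps to widths (2, 0) for H and (1, 1) for W):

>>> ref = np.pad(np.arange(0, 9).reshape(1, 1, 3, 3),
...              [(0, 0), (0, 0), (2, 0), (1, 1)], mode='edge')
>>> print(ref)
[[[[0 0 1 2 2]
   [0 0 1 2 2]
   [0 0 1 2 2]
   [3 3 4 5 5]
   [6 6 7 8 8]]]]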
class tinyms.layers.ReplicationPad3d(padding)[source]

Pads the D, H and W dimensions of the input x according to padding.

Parameters:

padding (union[int, tuple]) –

The padding size to pad the last three dimensions of x.

  • If padding is an integer, all directions will be padded with the same size.

  • If padding is a tuple, uses \((pad_{left}, pad_{right}, pad_{up}, pad_{down}, pad_{front}, pad_{back})\) to pad.

Inputs:
  • x (Tensor) - 4D or 5D, shape: \((C, D_{in}, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, after padding, shape: \((C, D_{out}, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\), where \(D_{out} = D_{in} + pad_{front} + pad_{back}\), \(H_{out} = H_{in} + pad_{up} + pad_{down}\), \(W_{out} = W_{in} + pad_{left} + pad_{right}\).

Raises:
  • TypeError – If padding is neither a tuple nor an int.

  • TypeError – If there is an element in padding that is not int.

  • ValueError – If padding is tuple and the length of padding is not divisible by 2.

  • ValueError – If padding is tuple and there is a dimension mismatch between the padding and the tensor.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.nn import ReplicationPad3d
>>> pad3d = ReplicationPad3d(1)
>>> input = Tensor(np.arange(0, 9).reshape(1, 1, 1, 3, 3), mindspore.float32)
>>> out = pad3d(input)
>>> print(out)
[[[[[0. 0. 1. 2. 2.]
    [0. 0. 1. 2. 2.]
    [3. 3. 4. 5. 5.]
    [6. 6. 7. 8. 8.]
    [6. 6. 7. 8. 8.]]
   [[0. 0. 1. 2. 2.]
    [0. 0. 1. 2. 2.]
    [3. 3. 4. 5. 5.]
    [6. 6. 7. 8. 8.]
    [6. 6. 7. 8. 8.]]
   [[0. 0. 1. 2. 2.]
    [0. 0. 1. 2. 2.]
    [3. 3. 4. 5. 5.]
    [6. 6. 7. 8. 8.]
    [6. 6. 7. 8. 8.]]]]]
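
And in 3D, 'edge' padding over the last three axes yields the same tensor (a sketch, checking only the shape):

>>> ref = np.pad(np.arange(0, 9).reshape(1, 1, 1, 3, 3),
...              [(0, 0), (0, 0), (1, 1), (1, 1), (1, 1)], mode='edge')
>>> print(ref.shape)
(1, 1, 3, 5, 5)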
class tinyms.layers.ChannelShuffle(groups)[source]

Divides the channels of a tensor of shape \((*, C, H, W)\) into \(g\) groups and rearranges them as \((*, \frac{C}{g}, g, H, W)\), while keeping the original tensor shape in the final output.

Parameters:

groups (int) – Number of groups to divide channels into; must be greater than 0. Denoted as \(g\) above.

Inputs:
  • x (Tensor) - Tensor of shape \((*, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, with the same type and shape as x.

Raises:
  • TypeError – If groups is not an int.

  • ValueError – If groups is less than 1.

  • ValueError – If the dimension of x is less than 3.

  • ValueError – If the number of channels is not divisible by groups.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> channel_shuffle = nn.ChannelShuffle(2)
>>> x = Tensor(np.arange(16).astype(np.int32).reshape(1, 4, 2, 2))
>>> print(x)
[[[[ 0  1]
   [ 2  3]]
  [[ 4  5]
   [ 6  7]]
  [[ 8  9]
   [10 11]]
  [[12 13]
   [14 15]]]]
>>> output = channel_shuffle(x)
>>> print(output)
[[[[ 0  1]
   [ 2  3]]
  [[ 8  9]
   [10 11]]
  [[ 4  5]
   [ 6  7]]
  [[12 13]
   [14 15]]]]
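
The shuffle itself is a reshape-transpose-reshape, which can be expressed directly in NumPy (a sketch of the equivalent computation for this example):

>>> a = np.arange(16).astype(np.int32).reshape(1, 4, 2, 2)
>>> g = 2
>>> # split channels into g groups, swap the group and per-group axes, flatten back
>>> ref = a.reshape(1, g, 4 // g, 2, 2).transpose(0, 2, 1, 3, 4).reshape(1, 4, 2, 2)
>>> print(np.array_equal(ref, output.asnumpy()))
True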