tinyms.layers

Layer module contains pre-defined building blocks or computing units to construct neural networks.

The high-level components (Layers) used to construct the neural network.

class tinyms.layers.Layer(auto_prefix=True, flags=None)[source]

Base class for all neural networks.

A ‘Layer’ could be a single neural network layer, such as conv2d, relu, batch_norm, etc., or a composition of cells that constructs a network.

Note

In general, the autograd algorithm will automatically generate the implementation of the gradient function, but if the back-propagation (bprop) method is implemented, the generated gradient function will be replaced by the bprop. The bprop implementation will receive a Tensor dout containing the gradient of the loss w.r.t. the output, and a Tensor out containing the forward result. The bprop needs to compute the gradient of the loss w.r.t. the inputs; gradients of the loss w.r.t. Parameter variables are not supported currently. The bprop method must include the self parameter.
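
For illustration, a minimal sketch of a cell with a custom bprop (hedged: the straight-through estimator below is an example choice, and P.Sign is assumed to be re-exported from mindspore by tinyms.primitives):

>>> from tinyms import layers, primitives as P
>>>
>>> class StraightThroughSign(layers.Layer):
...     def __init__(self):
...         super(StraightThroughSign, self).__init__()
...         self.sign = P.Sign()
...
...     def construct(self, x):
...         return self.sign(x)
...
...     def bprop(self, x, out, dout):
...         # bprop receives the forward inputs, the forward result (out) and
...         # the gradient of the loss w.r.t. the output (dout). It must return
...         # a tuple with one gradient per input. The true gradient of sign()
...         # is zero almost everywhere, so the upstream gradient is passed
...         # through unchanged (straight-through estimator).
...         return (dout,)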

Parameters

auto_prefix (bool) – Recursively generate namespaces. Default: True.

Examples

>>> from tinyms import layers, primitives as P
>>>
>>> class MyNet(layers.Layer):
...    def __init__(self):
...        super(MyNet, self).__init__()
...        self.relu = P.ReLU()
...
...    def construct(self, x):
...        return self.relu(x)
property bprop_debug

Get whether cell custom bprop debug is enabled.

cast_param(param)

Cast the parameter according to the auto mixed precision level in PyNative mode.

Parameters

param (Parameter) – The parameter to cast.

cells()

Returns an iterator over immediate cells.

cells_and_names(cells=None, name_prefix='')

Returns an iterator over all cells in the network.

Includes the cell’s name and itself.

Parameters
  • cells (str) – Cells to iterate over. Default: None.

  • name_prefix (str) – Namespace. Default: ‘’.

Examples

>>> n = Net()
>>> names = []
>>> for m in n.cells_and_names():
...     if m[0]:
...         names.append(m[0])
compile(*inputs)

Compiles cell.

Parameters

inputs (tuple) – Input parameters.

compile_and_run(*inputs)

Compiles and runs cell.

Parameters

inputs (tuple) – Input parameters.

Returns

Object, the result of execution.

construct(*inputs, **kwargs)

Defines the computation to be performed. This method must be overridden by all subclasses.

Note

The outermost net only supports tensor inputs by default. To support non-tensor inputs, set the property support_non_tensor_inputs to True. Refer to the description of the support_non_tensor_inputs property.

Returns

Tensor, returns the computed result.

exec_checkpoint_graph()

Executes the operation of saving the checkpoint graph.

extend_repr()

Sets the extended representation of the Cell.

To print customized extended information, re-implement this method in your own cells.

generate_scope()

Generate the scope for each cell object in the network.

get_func_graph_proto()

Return graph binary proto.

get_parameters(expand=True)

Returns an iterator over cell parameters.

Yields parameters of this cell. If expand is true, yields parameters of this cell and all subcells.

Parameters

expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yields parameters that are direct members of this cell. Default: True.

Examples

>>> net = Net()
>>> parameters = []
>>> for item in net.get_parameters():
...     parameters.append(item)
get_scope()

Returns the scope of a cell object in one network.

init_parameters_data(auto_parallel_mode=False)

Initialize all parameters and replace the original saved parameters in the cell.

Notes

trainable_params() and other similar interfaces may return different parameter instances after init_parameters_data is called; do not save those results.

Parameters

auto_parallel_mode (bool) – If running in auto_parallel_mode.

Returns

Dict[Parameter, Parameter], returns a dict of original parameter and replaced parameter.

insert_child_to_cell(child_name, child_cell)

Adds a child cell to the current cell with a given name.

Parameters
  • child_name (str) – Name of the child cell.

  • child_cell (Cell) – The child cell to be inserted.

Raises
  • KeyError – Child Cell’s name is incorrect or duplicates another child cell’s name.

  • TypeError – Child Cell’s type is incorrect.
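
A minimal usage sketch (hedged; it assumes ReLU from tinyms.layers, documented later in this module):

>>> from tinyms import layers
>>>
>>> net = layers.Layer()
>>> net.insert_child_to_cell('act', layers.ReLU())
>>> names = [name for name, _ in net.cells_and_names() if name]
>>> print(names)
['act']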

insert_param_to_cell(param_name, param, check_name=True)

Adds a parameter to the current cell.

Inserts a parameter with the given name into the cell. Please refer to the usage in source code of mindspore.nn.Cell.__setattr__.

Parameters
  • param_name (str) – Name of the parameter.

  • param (Parameter) – Parameter to be inserted to the cell.

  • check_name (bool) – Determines whether the name input is compatible. Default: True.

Raises
  • KeyError – If the name of the parameter is empty or contains a dot.

  • AttributeError – If the user did not call init() first.

  • TypeError – If the type of parameter is not Parameter.
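
A hedged sketch of inserting a parameter by hand (Parameter and Tensor are assumed to come from mindspore, as in the other examples):

>>> import numpy as np
>>> from mindspore import Parameter, Tensor
>>> from tinyms import layers
>>>
>>> net = layers.Layer()
>>> net.insert_param_to_cell('w', Parameter(Tensor(np.ones(3, np.float32)), name='w'))
>>> print(net.w.name)
w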

load_parameter_slice(params)

Replace parameters with sliced tensors by parallel strategies.

Please refer to the usage in source code of mindspore.common._Executor.compile.

Parameters

params (dict) – The parameters dictionary used for initializing the data graph.

name_cells()

Returns an iterator over all cells in the network.

Includes the name of the cell and the cell itself.

property param_prefix

Param prefix is the prefix of the current cell’s direct child parameters.

parameters_and_names(name_prefix='', expand=True)

Returns an iterator over cell parameters.

Includes the parameter’s name and itself.

Parameters
  • name_prefix (str) – Namespace. Default: ‘’.

  • expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yields parameters that are direct members of this cell. Default: True.

Examples

>>> n = Net()
>>> names = []
>>> for m in n.parameters_and_names():
...     if m[0]:
...         names.append(m[0])
parameters_dict(recurse=True)

Gets parameters dictionary.

Gets the parameters dictionary of this cell.

Parameters

recurse (bool) – Whether to include the parameters of subcells. Default: True.

Returns

OrderedDict, the parameters dictionary.

register_backward_hook(fn)

Set the cell backward hook function. Note that this function is only supported in PyNative mode.

Note

fn must have the following signature: hook_fn(cell_name, grad_input, grad_output) -> Tensor or None. cell_name is the name of the registered cell, grad_input is the gradient passed into the cell, and grad_output is the gradient computed and passed to the next cell or primitive, which may be modified and returned.

Parameters

fn (function) – Specifies the hook function with grad as input.
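
A minimal hook sketch following the signature above (PyNative mode only; returning None keeps the gradient unchanged, while returning a Tensor replaces it):

>>> from tinyms import layers
>>>
>>> def print_grad_hook(cell_name, grad_input, grad_output):
...     # Inspect the gradients flowing through the cell.
...     print("backward through:", cell_name)
...     return None
>>> net = layers.ReLU()
>>> net.register_backward_hook(print_grad_hook)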

set_auto_parallel()

Set the cell to auto parallel mode.

Note

If a cell needs to use the auto parallel or semi auto parallel mode for training, evaluation or prediction, this interface needs to be called by the cell.

set_broadcast_flag(mode=True)

Set the cell to data_parallel mode.

Parameters

mode (bool) – Specifies whether the model is data_parallel. Default: True.

set_grad(requires_grad=True)

Sets the cell flag for gradient.

Parameters

requires_grad (bool) – Specifies whether the net needs to compute gradients. If True, the cell will construct the backward network in PyNative mode. Default: True.

set_parallel_input_with_inputs(*inputs)

Slice the input tensors by parallel strategies, and set the sliced inputs to _parallel_input_run.

Parameters

inputs (tuple) – inputs of construct method.

set_param_ps(recurse=True, init_in_server=False)

Set whether the trainable parameters are updated by parameter server and whether the trainable parameters are initialized on server.

Note

It only works when a running task is in the parameter server mode.

Parameters
  • recurse (bool) – Whether to set the trainable parameters of subcells. Default: True.

  • init_in_server (bool) – Whether trainable parameters updated by parameter server are initialized on server. Default: False.

set_train(mode=True)

Sets the cell to training mode.

The cell itself and all children cells will be set to training mode.

Parameters

mode (bool) – Specifies whether the model is training. Default: True.
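
For instance (a sketch; Dropout is used only to illustrate the train/eval behavior and is assumed to be available from tinyms.layers):

>>> from tinyms import layers
>>>
>>> net = layers.Dropout(keep_prob=0.8)
>>> net = net.set_train(False)  # evaluation mode: dropout is disabled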

property support_non_tensor_inputs

Whether the outermost net supports non-tensor inputs in GRAPH mode. This property is only used in the forward net, and is not supported in the grad net. The default value is False, that is, passing non-tensor inputs to the outermost net is not supported. To support them, set the property to True.

to_float(dst_type)

Add casts on all inputs of the cell and its child cells so that they run with a certain float type.

If dst_type is mindspore.dtype.float16, all the inputs of Cell including input, Parameter, Tensor as const will be cast to float16. Please refer to the usage in source code of mindspore.train.amp.build_train_network.

Note

Multiple calls will overwrite the previous setting.

Parameters

dst_type (mindspore.dtype) – Run the Cell with dst_type. dst_type can be mindspore.dtype.float16 or mindspore.dtype.float32.

Raises

ValueError – If dst_type is neither float32 nor float16.
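
A short usage sketch (mixed-precision style; Conv2d is documented later in this module):

>>> import mindspore
>>> from tinyms import layers
>>>
>>> net = layers.Conv2d(3, 8, 3)
>>> net = net.to_float(mindspore.float16)  # casts will be added on all inputs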

trainable_params(recurse=True)

Returns all trainable parameters.

Returns a list of all trainable parameters.

Parameters

recurse (bool) – Whether to include the trainable parameters of subcells. Default: True.

Returns

List, the list of trainable parameters.

untrainable_params(recurse=True)

Returns all untrainable parameters.

Returns a list of all untrainable parameters.

Parameters

recurse (bool) – Whether to include the untrainable parameters of subcells. Default: True.

Returns

List, the list of untrainable parameters.

update_cell_prefix()

Update all child cells’ self.param_prefix.

After being invoked, the name prefix of each child cell can be obtained through ‘_param_prefix’.

update_cell_type(cell_type)

Update the current cell type when a quantization aware training network is encountered.

After being invoked, the cell type is set to cell_type.

update_parameters_name(prefix='', recurse=True)

Updates the names of parameters with given prefix string.

Adds the given prefix to the names of parameters.

Parameters
  • prefix (str) – The prefix string.

  • recurse (bool) – Whether to include the parameters of subcells. Default: True.
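
For example (a sketch assuming a Dense layer whose parameters are named ‘weight’ and ‘bias’):

>>> from tinyms import layers
>>>
>>> net = layers.Dense(3, 4)
>>> net.update_parameters_name(prefix='backbone.')
>>> for p in net.get_parameters():
...     print(p.name)
backbone.weight
backbone.bias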

class tinyms.layers.SequentialLayer(*args)[source]

Sequential layer container.

A list of Layers will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of cells can also be passed in.

Parameters

args (Union[list, OrderedDict]) – A list or an OrderedDict of Layer subclass instances.

Raises

TypeError – If the type of the argument is not list or OrderedDict.

Inputs:
  • input (Tensor) - Tensor with shape according to the first Cell in the sequence.

Outputs:

Tensor, the output Tensor with shape depending on the input and defined sequence of Layers.

Examples

>>> import tinyms as ts
>>> from tinyms.layers import SequentialLayer, Conv2d, ReLU
>>>
>>> seq_layer = SequentialLayer([Conv2d(3, 2, 3, pad_mode='valid', weight_init="ones"), ReLU()])
>>> x = ts.ones([1, 3, 4, 4])
>>> print(seq_layer(x))
[[[[27. 27.]
   [27. 27.]]
  [[27. 27.]
   [27. 27.]]]]
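
The OrderedDict form mentioned above names each sub-layer; a sketch reusing x from the example above:

>>> from collections import OrderedDict
>>>
>>> named_seq = SequentialLayer(OrderedDict([
...     ('conv', Conv2d(3, 2, 3, pad_mode='valid', weight_init="ones")),
...     ('relu', ReLU())]))
>>> print(named_seq(x).shape)
(1, 2, 2, 2)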
class tinyms.layers.LayerList(*args)[source]

Holds Layers in a list.

LayerList can be used like a regular Python list and supports ‘__getitem__’, ‘__setitem__’, ‘__delitem__’, ‘__len__’, ‘__iter__’ and ‘__iadd__’, but the layers it contains are properly registered and will be visible to all Layer methods.

Parameters

args (list, optional) – A list of Layer subclass instances.

Examples

>>> from tinyms.layers import LayerList, Conv2d, BatchNorm2d, ReLU
>>>
>>> layers = LayerList([BatchNorm2d(20)])
>>> layers.insert(0, Conv2d(100, 20, 3))
>>> layers.append(ReLU())
>>> layers
LayerList<
  (0): Conv2d<input_channels=100, ..., bias_init=None>
  (1): BatchNorm2d<num_features=20, ..., moving_variance=Parameter (name=variance)>
  (2): ReLU<>
  >
class tinyms.layers.Softmax(axis=-1)[source]

Softmax activation function.

Applies the Softmax function to an n-dimensional input Tensor.

The input is a Tensor of logits, transformed with the exponential function and then normalized to lie in the range [0, 1] and sum to 1.

Softmax is defined as:

\[\text{softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Parameters

axis (Union[int, tuple[int]]) – The axis to apply Softmax operation, -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of Softmax.

Outputs:

Tensor, which has the same type and shape as x, with values in the range [0, 1].

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> softmax = nn.Softmax()
>>> output = softmax(input_x)
>>> print(output)
[0.03168 0.01166 0.0861  0.636   0.2341 ]
class tinyms.layers.LogSoftmax(axis=-1)[source]

LogSoftmax activation function.

Applies the LogSoftmax function to n-dimensional input tensor.

The input is transformed by the Softmax function and then by the log function to lie in the range [-inf, 0).

Logsoftmax is defined as:

\[\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right),\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Parameters

axis (int) – The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of LogSoftmax.

Outputs:

Tensor, which has the same type and shape as x, with values in the range [-inf, 0).

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> log_softmax = nn.LogSoftmax()
>>> output = log_softmax(input_x)
>>> print(output)
[[-5.00672150e+00 -6.72150636e-03 -1.20067215e+01]
 [-7.00091219e+00 -1.40009127e+01 -9.12250078e-04]]
class tinyms.layers.ReLU[source]

Rectified Linear Unit activation function.

Applies the rectified linear unit function element-wise.

\[\text{ReLU}(x) = (x)^+ = \max(0, x),\]

It returns element-wise \(\max(0, x)\); in particular, neurons with negative output are suppressed while active neurons are left unchanged.

Inputs:
  • input_data (Tensor) - The input of ReLU.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([-1, 2, -3, 2, -1]), mindspore.float16)
>>> relu = nn.ReLU()
>>> output = relu(input_x)
>>> print(output)
[0. 2. 0. 2. 0.]
class tinyms.layers.ReLU6[source]

Compute ReLU6 activation function.

ReLU6 is similar to ReLU, but with an upper limit of 6: if an input is greater than 6, the output is clipped to 6. It computes element-wise as

\[\min(\max(0, x), 6).\]

The input is a Tensor of any valid shape.

Inputs:
  • input_data (Tensor) - The input of ReLU6.

Outputs:

Tensor, which has the same type as input_data.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> relu6 = nn.ReLU6()
>>> output = relu6(input_x)
>>> print(output)
[0. 0. 0. 2. 1.]
class tinyms.layers.Tanh[source]

Tanh activation function.

Applies the Tanh function element-wise and returns a new tensor containing the hyperbolic tangent of the input elements. The input is a Tensor with any valid shape.

Tanh function is defined as:

\[tanh(x_i) = \frac{\exp(x_i) - \exp(-x_i)}{\exp(x_i) + \exp(-x_i)} = \frac{\exp(2x_i) - 1}{\exp(2x_i) + 1},\]

where \(x_i\) is an element of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of Tanh.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([1, 2, 3, 2, 1]), mindspore.float16)
>>> tanh = nn.Tanh()
>>> output = tanh(input_x)
>>> print(output)
[0.7617 0.964  0.995  0.964  0.7617]
class tinyms.layers.GELU[source]

Gaussian error linear unit activation function.

Applies GELU function to each element of the input. The input is a Tensor with any valid shape.

GELU is defined as:

\[GELU(x_i) = x_i*P(X < x_i),\]

where \(P\) is the cumulative distribution function of standard Gaussian distribution and \(x_i\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of GELU.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> gelu = nn.GELU()
>>> output = gelu(input_x)
>>> print(output)
[[-1.5880802e-01  3.9999299e+00 -3.1077917e-21]
 [ 1.9545976e+00 -2.2918017e-07  9.0000000e+00]]
class tinyms.layers.FastGelu[source]

Fast Gaussian error linear unit activation function.

Applies FastGelu function to each element of the input. The input is a Tensor with any valid shape.

FastGelu is defined as:

\[FastGelu(x_i) = \frac {x_i} {1 + \exp(-1.702 * \left| x_i \right|)} * \exp(0.851 * (x_i - \left| x_i \right|))\]

where \(x_i\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of FastGelu with data type of float16 or float32.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> fast_gelu = nn.FastGelu()
>>> output = fast_gelu(input_x)
>>> print(output)
[[-1.5420423e-01  3.9955850e+00 -9.7664279e-06]
 [ 1.9356586e+00 -1.0070159e-03  8.9999981e+00]]
class tinyms.layers.Sigmoid[source]

Sigmoid activation function.

Applies sigmoid-type activation element-wise.

Sigmoid function is defined as:

\[\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)},\]

where \(x_i\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of Sigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> sigmoid = nn.Sigmoid()
>>> output = sigmoid(input_x)
>>> print(output)
[0.2688  0.11914 0.5     0.881   0.7305 ]
class tinyms.layers.PReLU(channel=1, w=0.25)[source]

PReLU activation function.

Applies the PReLU function element-wise.

PReLU is defined as:

\[prelu(x_i)= \max(0, x_i) + w * \min(0, x_i),\]

where \(x_i\) is an element of a channel of the input.

Here \(w\) is a learnable parameter with a default initial value 0.25. Parameter \(w\) has dimensionality of the argument channel. If called without argument channel, a single parameter \(w\) will be shared across all channels.

Parameters
  • channel (int) – The dimension of input. Default: 1.

  • w (float) – The initial value of w. Default: 0.25.

Inputs:
  • input_data (Tensor) - The input of PReLU.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend

Examples

>>> input_x = Tensor(np.array([[[[0.1, 0.6], [0.9, 0.9]]]]), mindspore.float32)
>>> prelu = nn.PReLU()
>>> output = prelu(input_x)
>>> print(output)
[[[[0.1 0.6]
   [0.9 0.9]]]]
tinyms.layers.get_activation(name)[source]

Gets the activation function.

Parameters

name (str) – The name of the activation function.

Returns

Function, the activation function.

Examples

>>> sigmoid = nn.get_activation('sigmoid')
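
The returned object is a layer instance that can be applied directly (a sketch; imports follow the convention of the surrounding examples):

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>>
>>> sigmoid = nn.get_activation('sigmoid')
>>> input_x = Tensor(np.array([-1.0, 0.0, 1.0]), mindspore.float32)
>>> output = sigmoid(input_x)  # element-wise 1 / (1 + exp(-x))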
class tinyms.layers.LeakyReLU(alpha=0.2)[source]

Leaky ReLU activation function.

LeakyReLU is similar to ReLU, but it has a non-zero slope for x < 0. The activation function is defined as:

\[\text{leaky_relu}(x) = \begin{cases}x, &\text{if } x \geq 0; \cr \text{alpha} * x, &\text{otherwise.}\end{cases}\]

See https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

Parameters

alpha (Union[int, float]) – Slope of the activation function at x < 0. Default: 0.2.

Inputs:
  • input_x (Tensor) - The input of LeakyReLU.

Outputs:

Tensor, has the same type and shape as the input_x.

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> leaky_relu = nn.LeakyReLU()
>>> output = leaky_relu(input_x)
>>> print(output)
[[-0.2  4.  -1.6]
 [ 2.  -1.   9. ]]
class tinyms.layers.HSigmoid[source]

Hard sigmoid activation function.

Applies hard sigmoid activation element-wise. The input is a Tensor with any valid shape.

Hard sigmoid is defined as:

\[\text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of HSigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

GPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hsigmoid = nn.HSigmoid()
>>> result = hsigmoid(input_x)
>>> print(result)
[0.3333  0.1666  0.5  0.833  0.6665]
class tinyms.layers.HSwish[source]

Hard swish activation function.

Applies hswish-type activation element-wise. The input is a Tensor with any valid shape.

Hard swish is defined as:

\[\text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of HSwish.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

GPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hswish = nn.HSwish()
>>> result = hswish(input_x)
>>> print(result)
[-0.3333  -0.3333  0  1.666  0.6665]
class tinyms.layers.ELU(alpha=1.0)[source]

Exponential Linear Unit activation function.

Applies the exponential linear unit function element-wise. The activation function is defined as:

\[\text{ELU}(x_i) = \begin{cases} x_i, &\text{if } x_i \geq 0; \cr \text{alpha} * (\exp(x_i) - 1), &\text{otherwise.} \end{cases}\]

Parameters

alpha (float) – The coefficient of negative factor whose type is float. Default: 1.0.

Inputs:
  • input_data (Tensor) - The input of ELU.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float32)
>>> elu = nn.ELU()
>>> result = elu(input_x)
>>> print(result)
[-0.63212055  -0.86466473  0.  2.  1.]
class tinyms.layers.LogSigmoid[source]

Logsigmoid activation function.

Applies logsigmoid activation element-wise. The input is a Tensor with any valid shape.

Logsigmoid is defined as:

\[\text{logsigmoid}(x_{i}) = log(\frac{1}{1 + \exp(-x_i)}),\]

where \(x_{i}\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of LogSigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.LogSigmoid()
>>> input_x = Tensor(np.array([1.0, 2.0, 3.0]), mindspore.float32)
>>> output = net(input_x)
>>> print(output)
[-0.31326166 -0.12692806 -0.04858734]
class tinyms.layers.BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None)[source]

Batch normalization layer over a 2D input.

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 2D input (a mini-batch of 1D inputs) to reduce internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm differs between graph mode and pynative mode, therefore it is not recommended to change the mode after the net is initialized.

Parameters
  • num_features (int) – C from an expected input of size (N, C).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified mean and variance values. If None, the training process will use the mean and variance of the current batch data and track the running mean and variance, and the evaluation process will use the running mean and variance. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.BatchNorm1d(num_features=4)
>>> np.random.seed(0)
>>> input = Tensor(np.random.randint(0, 255, [2, 4]), mindspore.float32)
>>> output = net(input)
>>> print(output)
[[171.99915   46.999763  116.99941  191.99904 ]
 [ 66.999664 250.99875   194.99902  102.99948 ]]
class tinyms.layers.BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, data_format='NCHW')[source]

Batch normalization layer over a 4D input.

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) to avoid internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm differs between graph mode and pynative mode, therefore the mode cannot be changed after the net is initialized. Note that the formula for updating the running_mean and running_var is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Parameters
  • num_features (int) – C from an expected input of size (N, C, H, W).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified mean and variance values. If None, the training process will use the mean and variance of the current batch data and track the running mean and variance, and the evaluation process will use the running mean and variance. Default: None.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.BatchNorm2d(num_features=3)
>>> np.random.seed(0)
>>> input = Tensor(np.random.randint(0, 255, [1, 3, 2, 2]), mindspore.float32)
>>> output = net(input)
>>> print(output)
[[[[171.99915   46.999763 ]
   [116.99941  191.99904  ]]
  [[ 66.999664 250.99875  ]
   [194.99902  102.99948  ]]
  [[  8.999955 210.99895  ]
   [ 20.999895 241.9988   ]]]]
class tinyms.layers.LayerNorm(normalized_shape, begin_norm_axis=-1, begin_params_axis=-1, gamma_init='ones', beta_init='zeros', epsilon=1e-07)[source]

Applies Layer Normalization over a mini-batch of inputs.

Layer normalization is widely used in recurrent neural networks. It applies normalization on a mini-batch of inputs for each single training case as described in the paper Layer Normalization. Unlike batch normalization, layer normalization performs exactly the same computation at training and testing time. It is applied across all channels and pixels of each single sample rather than across the batch. It can be described using the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
Parameters
  • normalized_shape (Union[tuple[int], list[int]]) – The normalization is performed over axes begin_norm_axis … R - 1.

  • begin_norm_axis (int) – The first normalization dimension: normalization will be performed along dimensions begin_norm_axis: rank(inputs), the value should be in [-1, rank(input)). Default: -1.

  • begin_params_axis (int) – The first parameter(beta, gamma)dimension: scale and centering parameters will have dimensions begin_params_axis: rank(inputs) and will be broadcast with the normalized inputs accordingly, the value should be in [-1, rank(input)). Default: -1.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • epsilon (float) – A value added to the denominator for numerical stability. Default: 1e-7.

Inputs:
  • input_x (Tensor) - The shape of ‘input_x’ is \((x_1, x_2, ..., x_R)\), and input_shape[begin_norm_axis:] is equal to normalized_shape.

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.

Supported Platforms:

Ascend GPU

Examples

>>> x = Tensor(np.ones([20, 5, 10, 10]), mindspore.float32)
>>> shape1 = x.shape[1:]
>>> m = nn.LayerNorm(shape1,  begin_norm_axis=1, begin_params_axis=1)
>>> output = m(x).shape
>>> print(output)
(20, 5, 10, 10)
extend_repr()[source]

Display instance object as string.

class tinyms.layers.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True, gamma_init='ones', beta_init='zeros')[source]

Group Normalization over a mini-batch of inputs.

Group normalization is widely used in recurrent neural networks. It applies normalization on a mini-batch of inputs for each single training case as described in the paper Group Normalization. Group normalization divides the channels into groups and computes the mean and variance for normalization within each group, and it performs very stably over a wide range of batch sizes. It can be described using the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
Parameters
  • num_groups (int) – The number of groups to be divided along the channel dimension.

  • num_channels (int) – The number of input channels, which must be divisible by num_groups.

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • affine (bool) – A bool value, this layer will have learnable affine parameters when set to true. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’. If gamma_init is a Tensor, the shape must be [num_channels].

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’. If beta_init is a Tensor, the shape must be [num_channels].

Inputs:
  • input_x (Tensor) - The input feature with shape [N, C, H, W].

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.

Supported Platforms:

Ascend GPU

Examples

>>> goup_norm_op = nn.GroupNorm(2, 2)
>>> x = Tensor(np.ones([1, 2, 4, 4], np.float32))
>>> output = goup_norm_op(x)
>>> print(output)
[[[[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]
  [[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]]]
extend_repr()[source]

Display instance object as string.

class tinyms.layers.GlobalBatchNorm(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, device_num_each_group=2)[source]

Global normalization layer over an N-dimensional input.

Global Normalization is cross device synchronized batch normalization. The implementation of Batch Normalization only normalizes the data within each device. Global normalization will normalize the input within the group. It has been described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

Currently, GlobalBatchNorm only supports 2D and 4D inputs.

Parameters
  • num_features (int) – C from an expected input of size (N, C, H, W).

  • device_num_each_group (int) – The number of devices in each group. Default: 2.

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified mean and variance values. If None, the training process will use the mean and variance of the current batch data and track the running mean and variance, and the eval process will use the running mean and variance. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend

Examples

>>> # This example should be run with multiple processes.
>>> # Please refer to the tutorial > Distributed Training on mindspore.cn.
>>> import numpy as np
>>> from mindspore.communication import init
>>> from mindspore import context
>>> from mindspore.context import ParallelMode
>>> from mindspore import nn, Tensor
>>> from mindspore.common import dtype as mstype
>>>
>>> context.set_context(mode=context.GRAPH_MODE)
>>> init()
>>> context.reset_auto_parallel_context()
>>> context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL)
>>> np.random.seed(0)
>>> global_bn_op = nn.GlobalBatchNorm(num_features=3, device_num_each_group=2)
>>> input = Tensor(np.random.randint(0, 255, [1, 3, 2, 2]), mstype.float32)
>>> output = global_bn_op(input)
>>> print(output)
[[[[171.99915    46.999763]
   [116.99941   191.99904 ]]
  [[ 66.999664  250.99875 ]
   [194.99902   102.99948 ]]
  [[  8.999955  210.99895 ]
   [ 20.9999895 241.9988  ]]]]
class tinyms.layers.Conv2d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', data_format='NCHW')[source]

2D convolution layer.

Applies a 2D convolution over an input tensor which is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is channel number, and \(H_{in}\), \(W_{in}\) are height and width. For each batch of shape \((C_{in}, H_{in}, W_{in})\), the formula is defined as:

\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]

where \(ccor\) is the cross-correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of kernel and it has shape \((\text{ks_h}, \text{ks_w})\), where \(\text{ks_h}\) and \(\text{ks_w}\) are the height and width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})\), where group is the group number to split the input in the channel dimension.

If the ‘pad_mode’ is set to be “valid”, the output height and width will be \(\left \lfloor{1 + \frac{H_{in} + 2 \times \text{padding} - \text{ks_h} - (\text{ks_h} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) and \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) respectively.

The first introduction can be found in paper Gradient Based Learning Applied to Document Recognition.
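
The “valid” output-size formula above can be checked with a small helper (hypothetical, for illustration only; e.g. the 1024 x 640 input of the Examples below with kernel 4 and stride 1 would give 1021 x 637 under “valid”):

>>> def conv2d_out_dim(size, ks, stride=1, padding=0, dilation=1):
...     # floor(1 + (size + 2*padding - ks - (ks - 1)*(dilation - 1)) / stride)
...     return 1 + (size + 2 * padding - ks - (ks - 1) * (dilation - 1)) // stride
>>> conv2d_out_dim(1024, 4), conv2d_out_dim(640, 4)
(1021, 637)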

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means the value is for both the height and the width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side. If this mode is set, padding must be 0.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.

    • pad: Implicit paddings on both sides of the input. The number of padding will be padded to the input Tensor borders. padding must be greater than or equal to 0.

  • padding (Union[int, tuple[int]]) – Implicit paddings on both sides of the input. If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple with four integers, the paddings of top, bottom, left and right will be equal to padding[0], padding[1], padding[2], and padding[3] accordingly. Default: 0.

  • dilation (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. If group is equal to in_channels and out_channels, this 2D convolution layer can also be called a 2D depthwise convolution layer. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\) or \((N, H_{in}, W_{in}, C_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\) or \((N, H_{out}, W_{out}, C_{out})\).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.Conv2d(120, 240, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> output = net(input).shape
>>> print(output)
(1, 240, 1024, 640)
class tinyms.layers.Conv2dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

2D transposed convolution layer.

Compute a 2D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution).

Input is typically of shape \((N, C, H, W)\), where \(N\) is batch size and \(C\) is channel number.

If the ‘pad_mode’ is set to be “pad”, the height and width of output are defined as:

\[ \begin{align}\begin{aligned}H_{out} = (H_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{ks_h} - 1) + 1\\W_{out} = (W_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{ks_w} - 1) + 1\end{aligned}\end{align} \]

where \(\text{ks_h}\) is the height of the convolution kernel and \(\text{ks_w}\) is the width of the convolution kernel.
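
With pad_mode=”pad”, the formula can be evaluated directly (a hypothetical helper; the numbers match the Examples below, where a 16 x 50 input with kernel 4, stride 1, padding 0 yields 19 x 53):

>>> def deconv2d_out_dim(size, ks, stride=1, padding=0, dilation=1):
...     return (size - 1) * stride - 2 * padding + dilation * (ks - 1) + 1
>>> deconv2d_out_dim(16, 4), deconv2d_out_dim(50, 4)
(19, 53)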

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • kernel_size (Union[int, tuple]) – int or a tuple of 2 integers, which specifies the height and width of the 2D convolution window. Single int means the value is for both the height and the width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Its value must be equal to or greater than 1. Default: 1.

  • pad_mode (str) –

    Select the mode of the pad. The optional values are “pad”, “same”, “valid”. Default: “same”.

    • pad: Implicit paddings on both sides of the input.

    • same: Adopted the way of completion.

    • valid: Adopted the way of discarding.

  • padding (Union[int, tuple[int]]) – Implicit paddings on both sides of the input. If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple with four integers, the paddings of top, bottom, left and right will be equal to padding[0], padding[1], padding[2], and padding[3] accordingly. Default: 0.

  • dilation (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and bounded by the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. This is not supported on Davinci devices when group > 1. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Conv2dTranspose(3, 64, 4, has_bias=False, weight_init='normal', pad_mode='pad')
>>> input = Tensor(np.ones([1, 3, 16, 50]), mindspore.float32)
>>> output = net(input).shape
>>> print(output)
(1, 64, 19, 53)
class tinyms.layers.Conv1d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

1D convolution layer.

Applies a 1D convolution over an input tensor which is typically of shape \((N, C_{in}, W_{in})\), where \(N\) is batch size and \(C_{in}\) is channel number. For each batch of shape \((C_{in}, W_{in})\), the formula is defined as:

\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]

where \(ccor\) is the cross correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of kernel and it has shape \((\text{ks_w})\), where \(\text{ks_w}\) is the width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_w})\), where group is the group number to split the input in the channel dimension.

If the ‘pad_mode’ is set to be “valid”, the output width will be \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\).

The first introduction of convolution layer can be found in paper Gradient Based Learning Applied to Document Recognition.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (int) – The data type is int. Specifies the width of the 1D convolution window.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: Adopts the way of completion. The output width will be the same as the input. The total number of padding will be calculated in the horizontal direction and evenly distributed to left and right if possible. Otherwise, the last extra padding will be done from the right side. If this mode is set, padding must be 0.

    • valid: Adopts the way of discarding. The possible largest width of the output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.

    • pad: Implicit paddings on both sides of the input. The number of padding will be padded to the input Tensor borders. padding must be greater than or equal to 0.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – The data type is int. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – An initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Conv1d(120, 240, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 120, 640]), mindspore.float32)
>>> output = net(input).shape
>>> print(output)
(1, 240, 640)
class tinyms.layers.Conv1dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

1D transposed convolution layer.

Compute a 1D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution).

Input is typically of shape \((N, C, W)\), where \(N\) is batch size and \(C\) is channel number.

If the ‘pad_mode’ is set to be “pad”, the width of output is defined as:

\[W_{out} = (W_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{ks_w} - 1) + 1\]

where \(\text{ks_w}\) is the width of the convolution kernel.

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • kernel_size (int) – int, which specifies the width of the 1D convolution window.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    Select the mode of the pad. The optional values are “pad”, “same”, “valid”. Default: “same”.

    • pad: Implicit paddings on both sides of the input.

    • same: Adopted the way of completion.

    • valid: Adopted the way of discarding.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – The data type is int. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. This is not supported on Davinci devices when group > 1. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Conv1dTranspose(3, 64, 4, has_bias=False, weight_init='normal', pad_mode='pad')
>>> input = Tensor(np.ones([1, 3, 50]), mindspore.float32)
>>> output = net(input).shape
>>> print(output)
(1, 64, 53)
class tinyms.layers.LSTM(input_size, hidden_size, num_layers=1, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]

Stacked LSTM (Long Short-Term Memory) layers.

Apply LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. Input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. Forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). Output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. Candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
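As an illustration of the formulation above, the following is a minimal NumPy sketch of a single LSTM time step (the names sigmoid and lstm_step are hypothetical, and the gate ordering within the packed weights is arbitrary):

>>> import numpy as np
>>> def sigmoid(z):
...     return 1 / (1 + np.exp(-z))
>>> def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
...     # W_x: (4*hidden, input), W_h: (4*hidden, hidden), b: (4*hidden,)
...     i, f, g, o = np.split(W_x @ x_t + W_h @ h_prev + b, 4)
...     c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # c_t = f_t * c_{t-1} + i_t * c~_t
...     h_t = sigmoid(o) * np.tanh(c_t)                      # h_t = o_t * tanh(c_t)
...     return h_t, c_t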

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked LSTM. Default: 1.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].

  • bidirectional (bool) – Specifies whether it is a bidirectional LSTM. Default: False.

Inputs:
  • input (Tensor) - Tensor of shape (seq_len, batch_size, input_size) or (batch_size, seq_len, input_size).

  • hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). Data type of hx must be the same as input.

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).

  • hx_n (tuple) - A tuple of two Tensors (h_n, c_n) both of shape (num_directions * num_layers, batch_size, hidden_size).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(input, (h0, c0))
>>> print(output.shape)
(3, 5, 16)
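A hedged variant of the same example: with bidirectional=True the number of directions is 2, so h0 and c0 take shape (num_directions * num_layers, batch_size, hidden_size) and the last output dimension doubles.

>>> bi_net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=True)
>>> h0 = Tensor(np.ones([2 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([2 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = bi_net(input, (h0, c0))
>>> print(output.shape)
(3, 5, 32)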
class tinyms.layers.LSTMCell(input_size, hidden_size, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]

LSTM (Long Short-Term Memory) layer.

Apply LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. Input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. Forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). Output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. Candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

LSTMCell is a single-layer RNN; you can achieve a multi-layer RNN by stacking LSTMCells.

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].

  • bidirectional (bool) – Specifies whether this is a bidirectional LSTM. If set to True, the number of directions will be 2; otherwise it is 1. Default: False.

Inputs:
  • input (Tensor) - Tensor of shape (seq_len, batch_size, input_size).

  • h (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (num_directions, batch_size, hidden_size).

  • c (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (num_directions, batch_size, hidden_size). The data type of h and c must be the same as that of input.

  • w (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (weight_size, 1, 1). The value of weight_size depends on input_size, hidden_size and bidirectional.

Outputs:

Tuple of five outputs: (output, h_n, c_n, reserve, state).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).

  • h_n (Tensor) - A Tensor with shape (num_directions, batch_size, hidden_size).

  • c_n (Tensor) - A Tensor with shape (num_directions, batch_size, hidden_size).

  • reserve (Tensor) - Reserved.

  • state (Tensor) - Reserved.

Supported Platforms:

GPU CPU

Examples

>>> net = nn.LSTMCell(10, 12, has_bias=True, batch_first=True, bidirectional=False)
>>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h = Tensor(np.ones([1, 3, 12]).astype(np.float32))
>>> c = Tensor(np.ones([1, 3, 12]).astype(np.float32))
>>> w = Tensor(np.ones([1152, 1, 1]).astype(np.float32))
>>> output, h, c, _, _ = net(input, h, c, w)
>>> print(output.shape)
(3, 5, 12)
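The weight size of 1152 in the example above is consistent with the following sizing, assumed from the stated dependence of weight_size on input_size, hidden_size and bidirectional: four gates of hidden_size rows each for the input-to-hidden and hidden-to-hidden weights, plus b_ih and b_hh when has_bias=True.

>>> input_size, hidden_size, num_directions = 10, 12, 1
>>> gate_size = 4 * hidden_size
>>> weight_size = num_directions * (gate_size * (input_size + hidden_size) + 2 * gate_size)
>>> print(weight_size)
1152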
class tinyms.layers.Dropout(keep_prob=0.5, dtype=mindspore.float32)[source]

Dropout layer for the input.

Randomly sets some elements of the input tensor to zero with probability \(1 - keep\_prob\) during training, using samples from a Bernoulli distribution.

The outputs are scaled by a factor of \(\frac{1}{keep\_prob}\) during training so that the output layer remains at a similar scale. During inference, this layer returns the same tensor as the input.

This technique was proposed in the paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting and has proved effective at reducing over-fitting and preventing neurons from co-adaptation. See more details in Improving neural networks by preventing co-adaptation of feature detectors.

Note

Each channel will be zeroed out independently on every construct call.

Parameters
  • keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1. E.g. keep_prob=0.9 drops out 10% of input units. Default: 0.5.

  • dtype (mindspore.dtype) – Data type of input. Default: mindspore.float32.

Raises

ValueError – If keep_prob is not in range (0, 1].

Inputs:
  • input (Tensor) - The input tensor.

Outputs:

Tensor, output tensor with the same shape as the input.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.ones([2, 2, 3]), mindspore.float32)
>>> net = nn.Dropout(keep_prob=0.8)
>>> net.set_train()
Dropout<keep_prob=0.8>
>>> output = net(x)
>>> print(output.shape)
(2, 2, 3)
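Because surviving elements are scaled by \(\frac{1}{keep\_prob}\), each retained entry of output above equals 1 / 0.8 = 1.25 and each dropped entry equals 0; which entries are dropped varies from call to call.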
class tinyms.layers.Flatten[source]

Flatten layer for the input.

Flattens a tensor without changing the batch size on the 0th axis.

Inputs:
  • input (Tensor) - Tensor of shape \((N, \ldots)\) to be flattened.

Outputs:

Tensor, the shape of the output tensor is \((N, X)\), where \(X\) is the product of the remaining dimensions.

Supported Platforms:

Ascend GPU CPU

Examples

>>> input = Tensor(np.array([[[1.2, 1.2], [2.1, 2.1]], [[2.2, 2.2], [3.2, 3.2]]]), mindspore.float32)
>>> net = nn.Flatten()
>>> output = net(input)
>>> print(output)
[[1.2 1.2 2.1 2.1]
 [2.2 2.2 3.2 3.2]]
class tinyms.layers.Dense(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None)[source]

The dense connected layer.

Applies a dense connected layer to the input. This layer implements the operation as:

\[\text{outputs} = \text{activation}(\text{inputs} * \text{kernel} + \text{bias}),\]

where \(\text{activation}\) is the activation function passed as the activation argument (if passed in), \(\text{kernel}\) is a weight matrix with the same data type as the inputs created by the layer, and \(\text{bias}\) is a bias vector with the same data type as the inputs created by the layer (only if has_bias is True).

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is the same as input x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is the same as input x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • activation (Union[str, Cell, Primitive]) – The activation function applied to the output of the fully connected layer, e.g. ‘relu’. Default: None.

Raises

ValueError – If weight_init or bias_init shape is incorrect.

Inputs:
  • input (Tensor) - Tensor of shape \((*, in\_channels)\).

Outputs:

Tensor of shape \((*, out\_channels)\).

Supported Platforms:

Ascend GPU CPU

Examples

>>> input = Tensor(np.array([[180, 234, 154], [244, 48, 247]]), mindspore.float32)
>>> net = nn.Dense(3, 4)
>>> output = net(input)
>>> print(output.shape)
(2, 4)
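A hedged variant with a fused activation, as described by the activation parameter above; the output shape is unchanged.

>>> net = nn.Dense(3, 4, activation='relu')
>>> output = net(input)
>>> print(output.shape)
(2, 4)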
class tinyms.layers.ClipByNorm(axis=None)[source]

Clips tensor values to a maximum \(L_2\)-norm.

The output of this layer remains the same if the \(L_2\)-norm of the input tensor is not greater than the argument clip_norm. Otherwise the tensor will be normalized as:

\[\text{output}(X) = \frac{\text{clip_norm} * X}{L_2(X)},\]

where \(L_2(X)\) is the \(L_2\)-norm of \(X\).

Parameters

axis (Union[None, int, tuple(int)]) – Compute the L2-norm along the specified dimensions. Default: None, which computes over all dimensions.

Inputs:
  • input (Tensor) - Tensor of shape N-D. The type must be float32 or float16.

  • clip_norm (Tensor) - A scalar Tensor of shape \(()\) or \((1)\), or a tensor whose shape can be broadcast to the shape of input.

Outputs:

Tensor, clipped tensor with the same shape as the input, whose type is float32.

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.ClipByNorm()
>>> input = Tensor(np.random.randint(0, 10, [4, 16]), mindspore.float32)
>>> clip_norm = Tensor(np.array([100]).astype(np.float32))
>>> output = net(input, clip_norm)
>>> print(output.shape)
(4, 16)
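A worked instance of the formula above (exact print formatting may differ): for x = [[3, 4]] the \(L_2\)-norm is 5, so with clip_norm = 1 the output is clip_norm * x / 5.

>>> x = Tensor(np.array([[3.0, 4.0]]), mindspore.float32)
>>> clip_norm = Tensor(np.array([1.0]), mindspore.float32)
>>> print(nn.ClipByNorm()(x, clip_norm))
[[0.6 0.8]]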
class tinyms.layers.Norm(axis=(), keep_dims=False)[source]

Computes the norm of vectors, currently including Euclidean norm, i.e., \(L_2\)-norm.

\[norm(x) = \sqrt{\sum_{i=1}^{n} (x_i^2)}\]
Parameters
  • axis (Union[tuple, int]) – The axis over which to compute vector norms. Default: ().

  • keep_dims (bool) – If true, the axes indicated in axis are kept with size 1. Otherwise, the dimensions in axis are removed from the output shape. Default: False.

Inputs:
  • input (Tensor) - Tensor which is not empty.

Outputs:

Tensor. If keep_dims is True, the dimensions in axis are reduced to size 1 in the output; otherwise those dimensions are removed from the output shape.

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Norm(axis=0)
>>> input = Tensor(np.array([[4, 4, 9, 1], [2, 1, 3, 6]]), mindspore.float32)
>>> output = net(input)
>>> print(output)
[4.472136 4.1231055 9.486833 6.0827627]
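The same result can be cross-checked with NumPy's vector norm (up to print formatting):

>>> print(np.linalg.norm(input.asnumpy(), axis=0))
[4.472136  4.1231055 9.486833  6.0827627]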
class tinyms.layers.OneHot(axis=-1, depth=1, on_value=1.0, off_value=0.0, dtype=mindspore.float32)[source]

Returns a one-hot tensor.

The locations represented by indices in argument indices take value on_value, while all other locations take value off_value.

Note

If the input indices is rank \(N\), the output will have rank \(N+1\). The new axis is created at dimension axis.

If indices is a scalar, the output shape will be a vector of length depth.

If indices is a vector of length features, the output shape will be:

  • features * depth if axis == -1

  • depth * features if axis == 0

If indices is a matrix with shape [batch, features], the output shape will be:

  • batch * features * depth if axis == -1

  • batch * depth * features if axis == 1

  • depth * batch * features if axis == 0
Parameters
  • axis (int) – Features x depth if axis is -1, depth x features if axis is 0. Default: -1.

  • depth (int) – A scalar defining the depth of the one hot dimension. Default: 1.

  • on_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] = i. Default: 1.0.

  • off_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] != i. Default: 0.0.

  • dtype (mindspore.dtype) – Data type of ‘on_value’ and ‘off_value’, not the data type of indices. Default: mindspore.float32.

Inputs:
  • indices (Tensor) - A tensor of indices of data type mindspore.int32 and arbitrary shape.

Outputs:

Tensor, the one-hot tensor of data type dtype with dimension at axis expanded to depth and filled with on_value and off_value.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.OneHot(depth=4, axis=1)
>>> indices = Tensor([[1, 3], [0, 2]], dtype=mindspore.int32)
>>> output = net(indices)
>>> print(output)
[[[0. 0.]
  [1. 0.]
  [0. 0.]
  [0. 1.]]
 [[1. 0.]
  [0. 0.]
  [0. 1.]
  [0. 0.]]]
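With the default axis=-1 the new depth axis is appended last instead, so the same indices give shape (2, 2, 4) rather than (2, 4, 2):

>>> net = nn.OneHot(depth=4)
>>> print(net(indices).shape)
(2, 2, 4)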
class tinyms.layers.Pad(paddings, mode='CONSTANT')[source]

Pads the input tensor according to the paddings and mode.

Parameters
  • paddings (tuple) –

    The shape of parameter paddings is (N, 2). N is the rank of input data. All elements of paddings are of int type. For the D-th dimension of the input, paddings[D, 0] indicates how much padding to add before the D-th dimension of the input tensor, and paddings[D, 1] indicates how much padding to add after it. The padded size of each dimension D of the output is:

    paddings[D, 0] + input_x.dim_size(D) + paddings[D, 1]
    

  • mode (str) – Specifies padding mode. The optional values are “CONSTANT”, “REFLECT”, “SYMMETRIC”. Default: “CONSTANT”.

Inputs:
  • input_x (Tensor) - The input tensor.

Outputs:

Tensor, the tensor after padding.

  • If mode is “CONSTANT”, it fills the edge with 0, regardless of the values of the input_x. If the input_x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[0,0,0,0,0,0,0], [0,0,1,2,3,0,0], [0,0,4,5,6,0,0], [0,0,7,8,9,0,0], [0,0,0,0,0,0,0]].

  • If mode is “REFLECT”, it fills by reflecting the tensor across the edge, excluding the edge itself. If the input_x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[6,5,4,5,6,5,4], [3,2,1,2,3,2,1], [6,5,4,5,6,5,4], [9,8,7,8,9,8,7], [6,5,4,5,6,5,4]].

  • If mode is “SYMMETRIC”, the filling method is similar to “REFLECT”; it also copies across the symmetry axis, except that it includes the symmetry axis. If the input_x is [[1,2,3], [4,5,6], [7,8,9]] and paddings is [[1,1], [2,2]], then the Outputs is [[2,1,1,2,3,3,2], [2,1,1,2,3,3,2], [5,4,4,5,6,6,5], [8,7,7,8,9,9,8], [8,7,7,8,9,9,8]].

Supported Platforms:

Ascend GPU

Examples

>>> from mindspore import Tensor
>>> from mindspore.ops import operations as P
>>> import mindspore.nn as nn
>>> import numpy as np
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="CONSTANT")
...     def construct(self, x):
...         return self.pad(x)
>>> x = np.array([[0.3, 0.5, 0.2], [0.5, 0.7, 0.3]], dtype=np.float32)
>>> pad = Net()
>>> output = pad(Tensor(x))
>>> print(output)
[[0.  0.  0.   0.   0.   0.  0. ]
 [0.  0.  0.3  0.5  0.2  0.  0. ]
 [0.  0.  0.5  0.7  0.3  0.  0. ]
 [0.  0.  0.   0.   0.   0.  0. ]]
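A sketch of the “REFLECT” mode, whose element values are enumerated in the mode descriptions above (only the padded shape is printed here):

>>> x = Tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32))
>>> reflect_pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="REFLECT")
>>> print(reflect_pad(x).shape)
(5, 7)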
class tinyms.layers.Unfold(ksizes, strides, rates, padding='valid')[source]

Extracts patches from images. The input tensor must be a 4-D tensor and the data format is NCHW.

Parameters
  • ksizes (Union[tuple[int], list[int]]) – The size of sliding window, must be a tuple or a list of integers, and the format is [1, ksize_row, ksize_col, 1].

  • strides (Union[tuple[int], list[int]]) – Distance between the centers of the two consecutive patches, must be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].

  • rates (Union[tuple[int], list[int]]) – In each extracted patch, the gap between the corresponding dimension pixel positions, must be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].

  • padding (str) –

    The type of padding algorithm, a string whose value is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Means that the patch can take the part beyond the original image, and this part is filled with 0.

    • valid: Means that the taken patch area must be completely covered in the original image.

Inputs:
  • input_x (Tensor) - A 4-D tensor whose shape is [in_batch, in_depth, in_row, in_col] and data type is number.

Outputs:

Tensor, a 4-D tensor whose data type is the same as input_x, and whose shape is [out_batch, out_depth, out_row, out_col], where out_batch is the same as in_batch and:

out_depth = ksize_row * ksize_col * in_depth

out_row = (in_row - (ksize_row + (ksize_row - 1) * (rate_row - 1))) // stride_row + 1

out_col = (in_col - (ksize_col + (ksize_col - 1) * (rate_col - 1))) // stride_col + 1
Supported Platforms:

Ascend

Examples

>>> net = nn.Unfold(ksizes=[1, 2, 2, 1], strides=[1, 2, 2, 1], rates=[1, 2, 2, 1])
>>> image = Tensor(np.ones([2, 3, 6, 6]), dtype=mstype.float16)
>>> output = net(image)
>>> print(output.shape)
(2, 12, 2, 2)
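Applying the formulas above to the example: out_depth = 2 * 2 * 3 = 12, the effective kernel extent is ksize + (ksize - 1) * (rate - 1) = 2 + 1 = 3, and out_row = out_col = (6 - 3) // 2 + 1 = 2, giving (2, 12, 2, 2).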
class tinyms.layers.Tril[source]

Returns a tensor with elements above the kth diagonal zeroed.

Inputs:
  • x (Tensor) - The input tensor.

  • k (int) - The index of the diagonal. Default: 0.

Outputs:

Tensor, has the same type as input x.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[1, 2], [3, 4]]))
>>> tril = nn.Tril()
>>> result = tril(x)
>>> print(result)
[[1   0]
 [3   4]]
class tinyms.layers.Triu[source]

Returns a tensor with elements below the kth diagonal zeroed.

Inputs:
  • x (Tensor) - The input tensor.

  • k (int) - The index of the diagonal. Default: 0.

Outputs:

Tensor, has the same type as input x.

Supported Platforms:

Ascend GPU CPU

Examples

>>> x = Tensor(np.array([[1, 2], [3, 4]]))
>>> triu = nn.Triu()
>>> result = triu(x)
>>> print(result)
[[1 2]
 [0 4]]
class tinyms.layers.ResizeBilinear[source]

Resizes the input tensor to the given size or by scale_factor using bilinear interpolation.

Inputs:
  • x (Tensor) - Tensor to be resized. The input tensor must be a 4-D tensor with shape \((batch, channels, height, width)\), with data type of float16 or float32.

  • size (Union[tuple[int], list[int]]): A tuple or list of 2 int elements ‘(new_height, new_width)’, the new size of the tensor. One and only one of size and scale_factor can be set to None. Default: None.

  • scale_factor (int): The scale factor of new size of the tensor. The value should be positive integer. One and only one of size and scale_factor can be set to None. Default: None.

  • align_corners (bool): If true, rescale input by ‘(new_height - 1) / (height - 1)’, which exactly aligns the 4 corners of images and resized images. If false, rescale by ‘new_height / height’. Default: False.

Outputs:

Resized tensor. If size is set, the result is a 4-D tensor with shape \((batch, channels, new\_height, new\_width)\), with data type float32. If scale_factor is set, the result is a 4-D tensor with shape \((batch, channels, scale\_factor \times height, scale\_factor \times width)\), with data type float32.

Supported Platforms:

Ascend

Examples

>>> tensor = Tensor([[[[1, 2, 3, 4], [5, 6, 7, 8]]]], mindspore.float32)
>>> resize_bilinear = nn.ResizeBilinear()
>>> result = resize_bilinear(tensor, size=(5,5))
>>> print(result.shape)
(1, 1, 5, 5)
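A hedged variant using scale_factor instead of size; per the output description above, the spatial dimensions are multiplied by the factor.

>>> result = resize_bilinear(tensor, scale_factor=2)
>>> print(result.shape)
(1, 1, 4, 8)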
class tinyms.layers.MatrixDiag[source]

Returns a batched diagonal tensor with the given batched diagonal values.

Assume x has \(k\) dimensions \([I, J, K, ..., N]\), then the output is a tensor of rank \(k+1\) with dimensions \([I, J, K, ..., N, N]\) where:

output[i, j, k, ..., m, n] = 1{m=n} * x[i, j, k, ..., n]
Inputs:
  • x (Tensor) - The diagonal values. It can be one of the following data types: float32, float16, int32, int8, and uint8.

Outputs:

Tensor, has the same type as input x. The shape must be x.shape + (x.shape[-1], ).

Supported Platforms:

Ascend

Examples

>>> x = Tensor(np.array([1, -1]), mstype.float32)
>>> matrix_diag = nn.MatrixDiag()
>>> output = matrix_diag(x)
>>> print(output)
[[ 1.  0.]
 [ 0. -1.]]
class tinyms.layers.MatrixDiagPart[source]

Returns the batched diagonal part of a batched tensor.

Assume x has \(k\) dimensions \([I, J, K, ..., M, N]\), then the output is a tensor of rank \(k-1\) with dimensions \([I, J, K, ..., min(M, N)]\) where:

output[i, j, k, ..., n] = x[i, j, k, ..., n, n]
Inputs:
  • x (Tensor) - The batched tensor. It can be one of the following data types: float32, float16, int32, int8, and uint8.

Outputs:

Tensor, has the same type as input x. The shape must be x.shape[:-2] + [min(x.shape[-2:])].

Supported Platforms:

Ascend

Examples

>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
>>> matrix_diag_part = nn.MatrixDiagPart()
>>> output = matrix_diag_part(x)
>>> print(output)
[[-1.  1.]
 [-1.  1.]
 [-1.  1.]]
class tinyms.layers.MatrixSetDiag[source]

Modifies the batched diagonal part of a batched tensor.

Assume x has \(k+1\) dimensions \([I, J, K, ..., M, N]\) and diagonal has \(k\) dimensions \([I, J, K, ..., min(M, N)]\). Then the output is a tensor of rank \(k+1\) with dimensions \([I, J, K, ..., M, N]\) where:

output[i, j, k, ..., m, n] = diagonal[i, j, k, ..., n] for m == n

output[i, j, k, ..., m, n] = x[i, j, k, ..., m, n] for m != n
Inputs:
  • x (Tensor) - The batched tensor. Rank k+1, where k >= 1. It can be one of the following data types: float32, float16, int32, int8, and uint8.

  • diagonal (Tensor) - The diagonal values. Must have the same type as input x. Rank k, where k >= 1.

Outputs:

Tensor, has the same type and shape as input x.

Supported Platforms:

Ascend

Examples

>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
>>> diagonal = Tensor([[-1., 2.], [-1., 1.], [-1., 1.]], mindspore.float32)
>>> matrix_set_diag = nn.MatrixSetDiag()
>>> output = matrix_set_diag(x, diagonal)
>>> print(output)
[[[-1.  0.]
  [ 0.  2.]]
 [[-1.  0.]
  [ 0.  1.]]
 [[-1.  0.]
  [ 0.  1.]]]
class tinyms.layers.L1Regularizer(scale)[source]

Applies L1 regularization to weights.

L1 regularization encourages sparsity in the weights.

Note

scale (the regularization factor) should be a number greater than 0.

Parameters

scale (int, float) – The L1 regularization factor, which must be greater than 0.

Raises

ValueError – If scale (the regularization factor) is not greater than 0, or if it is math.inf or math.nan.

Inputs:
  • weights (Tensor) - The input tensor.

Outputs:

Tensor, whose dtype is the higher-precision data type between mindspore.float32 and the dtype of weights, and whose shape is ().

Supported Platforms:

Ascend GPU CPU

Examples

>>> scale = 0.5
>>> net = nn.L1Regularizer(scale)
>>> weights = Tensor(np.array([[1.0, -2.0], [-3.0, 4.0]]).astype(np.float32))
>>> output = net(weights)
>>> print(output.asnumpy())
5.0
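The 5.0 above is consistent with an L1 penalty of the form scale * sum(|weights|) = 0.5 * (1 + 2 + 3 + 4) = 5.0.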
class tinyms.layers.Embedding(vocab_size, embedding_size, use_one_hot=False, embedding_table='normal', dtype=mindspore.float32, padding_idx=None)[source]

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Note

When ‘use_one_hot’ is set to True, the type of the input must be mindspore.int32.

Parameters
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • use_one_hot (bool) – Specifies whether to apply one_hot encoding form. Default: False.

  • embedding_table (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • dtype (mindspore.dtype) – Data type of input. Default: mindspore.float32.

  • padding_idx (int, None) – If not None, the output embedding vector at index padding_idx is initialized to zero. Default: None, which disables the feature.

Inputs:
  • input (Tensor) - Tensor of shape \((\text{batch_size}, \text{input_length})\). The elements of the Tensor must be integers and not larger than vocab_size; otherwise the corresponding embedding vector will be zero.

Outputs:

Tensor of shape \((\text{batch_size}, \text{input_length}, \text{embedding_size})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Embedding(20000, 768,  True)
>>> input_data = Tensor(np.ones([8, 128]), mindspore.int32)
>>>
>>> # Maps the input word IDs to word embedding.
>>> output = net(input_data)
>>> result = output.shape
>>> print(result)
(8, 128, 768)
class tinyms.layers.EmbeddingLookup(vocab_size, embedding_size, param_init='normal', target='CPU', slice_mode='batch_slice', manual_shapes=None, max_norm=None, sparse=True, vocab_cache_size=0)[source]

Returns a slice of the input tensor based on the specified indices.

Note

When ‘target’ is set to ‘CPU’, this module will use P.EmbeddingLookup().add_prim_attr(‘primitive_target’, ‘CPU’), which specifies ‘offset = 0’, to look up the table. When ‘target’ is set to ‘DEVICE’, this module will use P.Gather(), which specifies ‘axis = 0’, to look up the table. In field slice mode, manual_shapes must be given. It is a tuple whose i-th element, vocab[i], is the number of rows of the i-th part.

Parameters
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • param_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must be in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must be obtained through nn.EmbeddingLookup. Default: nn.EmbeddingLookup.BATCH_SLICE.

  • manual_shapes (tuple) – The accompaniment array in field slice mode. Default: None.

  • max_norm (Union[float, None]) – A maximum clipping value. The data type must be float16, float32 or None. Default: None

  • sparse (bool) – Using sparse mode. When ‘target’ is set to ‘CPU’, ‘sparse’ has to be true. Default: True.

  • vocab_cache_size (int) – Cache size of the dictionary of embeddings. Default: 0. It is valid only in parameter server training mode with the ‘DEVICE’ target, and the moment parameter of the corresponding optimizer will also be set to the cache size. Note that it consumes ‘DEVICE’ memory, so it is suggested to set a reasonable value to avoid insufficient memory.

Inputs:
  • input_indices (Tensor) - The shape of the tensor is \((y_1, y_2, ..., y_S)\). Specifies the indices of elements of the original Tensor. Values can be out of the range of embedding_table, and the exceeding part will be filled with 0 in the output. Negative values are not supported, and the result is undefined if values are negative. input_indices must be a 2-D tensor in this interface when run in semi-auto-parallel/auto-parallel mode.

Outputs:

Tensor, the shape of tensor is \((z_1, z_2, ..., z_N)\).

Supported Platforms:

Ascend CPU

Examples

>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> result = nn.EmbeddingLookup(4,2)(input_indices)
>>> print(result.shape)
(2, 2, 2)
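In general the output shape is the shape of input_indices extended by embedding_size, which is why the (2, 2) indices above yield a (2, 2, 2) result.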
class tinyms.layers.MultiFieldEmbeddingLookup(vocab_size, embedding_size, field_size, param_init='normal', target='CPU', slice_mode='batch_slice', feature_num_list=None, max_norm=None, sparse=True, operator='SUM')[source]

Returns a slice of input tensor based on the specified indices and the field ids. This operation supports looking up embeddings using multi hot and one hot fields simultaneously.

Note

When ‘target’ is set to ‘CPU’, this module will use P.EmbeddingLookup().add_prim_attr(‘primitive_target’, ‘CPU’), which specifies ‘offset = 0’, to look up the table. When ‘target’ is set to ‘DEVICE’, this module will use P.Gather(), which specifies ‘axis = 0’, to look up the table. The vectors with the same field_ids will be combined by the ‘operator’, such as ‘SUM’, ‘MAX’ and ‘MEAN’. Ensure that the input_values of the padded id are zero, so that they can be ignored. The final output will be zeros if the sum of absolute weights of the field is zero. This class only supports [‘table_row_slice’, ‘batch_slice’ and ‘table_column_slice’]. For the operation ‘MAX’ on the Ascend device, there is a constraint that batch_size * (seq_length + field_size) < 3500.

Parameters
  • vocab_size (int) – The size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • field_size (int) – The field size of the final outputs.

  • param_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must be in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must be obtained through nn.EmbeddingLookup. Default: nn.EmbeddingLookup.BATCH_SLICE.

  • feature_num_list (tuple) – The accompaniment array in field slice mode. This is currently unused. Default: None.

  • max_norm (Union[float, None]) – A maximum clipping value. The data type must be float16, float32 or None. Default: None

  • sparse (bool) – Using sparse mode. When ‘target’ is set to ‘CPU’, ‘sparse’ has to be true. Default: True.

  • operator (str) – The pooling method for the features in one field. Supports ‘SUM’, ‘MEAN’ and ‘MAX’. Default: ‘SUM’.

Inputs:
  • input_indices (Tensor) - The shape of the tensor is \((batch\_size, seq\_length)\). Specifies the indices of elements of the original Tensor. input_indices must be a 2-D tensor in this interface. Type is int32 or int64.

  • input_values (Tensor) - The shape of the tensor is \((batch\_size, seq\_length)\). Specifies the weights of elements of the input_indices. The looked-up vectors are multiplied by input_values. Type is float32.

  • field_ids (Tensor) - The shape of tensor is \((batch\_size, seq\_length)\). Specifies the field id of elements of the input_indices. Type is Int32.

Outputs:

Tensor, the shape of tensor is \((batch\_size, field\_size, embedding\_size)\). Type is Float32.

Supported Platforms:

Ascend GPU

Examples

>>> input_indices = Tensor([[2, 4, 6, 0, 0], [1, 3, 5, 0, 0]], mindspore.int32)
>>> input_values = Tensor([[1, 1, 1, 0, 0], [1, 1, 1, 0, 0]], mindspore.float32)
>>> field_ids = Tensor([[0, 1, 1, 0, 0], [0, 0, 1, 0, 0]], mindspore.int32)
>>> net = nn.MultiFieldEmbeddingLookup(10, 2, field_size=2, operator='SUM')
>>> out = net(input_indices, input_values, field_ids)
>>> print(out.shape)
(2, 2, 2)
class tinyms.layers.AvgPool2d(kernel_size=1, stride=1, pad_mode='valid', data_format='NCHW')[source]

2D average pooling operation for spatial data.

Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), AvgPool2d outputs regional average in the \((H_{in}, W_{in})\)-dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.

\[\text{output}(N_i, C_j, h, w) = \frac{1}{h_{ker} * w_{ker}} \sum_{m=0}^{h_{ker}-1} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the average value. The data type of kernel_size must be int and the value represents the height and width, or a tuple of two int numbers that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> pool = nn.AvgPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> output = pool(x)
>>> print(output.shape)
(1, 2, 2, 2)
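In “valid” mode the output spatial size follows the usual \(\left \lfloor (H_{in} - h_{ker}) / s_0 \right \rfloor + 1\) rule for unit dilation (assumed here, not restated in this entry), so the example gives (4 - 3) // 1 + 1 = 2 in both spatial dimensions.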
class tinyms.layers.MaxPool2d(kernel_size=1, stride=1, pad_mode='valid', data_format='NCHW')[source]

2D max pooling operation for spatial data.

Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), MaxPool2d outputs regional maximum in the \((H_{in}, W_{in})\)-dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.

\[\text{output}(N_i, C_j, h, w) = \max_{m=0, \ldots, h_{ker}-1} \max_{n=0, \ldots, w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the max value, is an int number that represents height and width are both kernel_size, or a tuple of two int numbers that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

  • data_format (str) – The optional value for data format, is ‘NHWC’ or ‘NCHW’. Default: ‘NCHW’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU CPU

Examples

>>> pool = nn.MaxPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> output = pool(x)
>>> print(output.shape)
(1, 2, 2, 2)
class tinyms.layers.AvgPool1d(kernel_size=1, stride=1, pad_mode='valid')[source]

1D average pooling for temporal data.

Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\), AvgPool1d outputs regional average in the \((L_{in})\)-dimension. Given kernel size \(ks = l_{ker}\) and stride \(s = s_0\), the operation is as follows.

\[\text{output}(N_i, C_j, l) = \frac{1}{l_{ker}} \sum_{n=0}^{l_{ker}-1} \text{input}(N_i, C_j, s_0 \times l + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (int) – The size of the kernel window used to take the average value. Default: 1.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The output length will be the same as the input. The total number of padding will be calculated in the horizontal direction and evenly distributed to the left and right if possible. Otherwise, the last extra padding will be done from the right side.

    • valid: Adopts the way of discarding. The possible largest length of the output will be returned without padding. Extra pixels will be discarded.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, L_{out})\).

Supported Platforms:

Ascend

Examples

>>> pool = nn.AvgPool1d(kernel_size=6, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 3, 6]), mindspore.float32)
>>> output = pool(x)
>>> result = output.shape
>>> print(result)
(1, 3, 1)
class tinyms.layers.MaxPool1d(kernel_size=1, stride=1, pad_mode='valid')[source]

1D max pooling operation for temporal data.

Applies a 1D max pooling over an input Tensor which can be regarded as a composition of 1D planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\), MaxPool1d outputs regional maximum in the \((L_{in})\)-dimension. Given kernel size \(ks = (l_{ker})\) and stride \(s = (s_0)\), the operation is as follows.

\[\text{output}(N_i, C_j, l) = \max_{n=0, \ldots, l_{ker}-1} \text{input}(N_i, C_j, s_0 \times l + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (int) – The size of the kernel used to take the max value. Default: 1.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The total number of padding will be calculated in the horizontal direction and evenly distributed to the left and right if possible. Otherwise, the last extra padding will be done from the right side.

    • valid: Adopts the way of discarding. The possible largest length of the output will be returned without padding. Extra pixels will be discarded.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C, L_{in})\).

Outputs:

Tensor of shape \((N, C, L_{out})\).

Supported Platforms:

Ascend

Examples

>>> max_pool = nn.MaxPool1d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4]), mindspore.float32)
>>> output = max_pool(x)
>>> result = output.shape
>>> print(result)
(1, 2, 2)
class tinyms.layers.ReduceLogSumExp(axis, keep_dims=False)[source]

Reduces a dimension of a tensor by calculating the exponential of all elements in the dimension, then calculating the logarithm of the sum.

The dtype of the tensor to be reduced is number.

\[ReduceLogSumExp(x) = \log(\sum(e^x))\]
Parameters
  • axis (Union[int, tuple(int), list(int)]) – The dimensions to reduce; () means reduce all dimensions. Only constant values are allowed.

  • keep_dims (bool) – If True, keep these reduced dimensions and the length is 1. If False, don’t keep these dimensions. Default: False.

Inputs:
  • x (Tensor) - The input tensor. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the x.

  • If axis is (), and keep_dims is False, the output is a 0-D tensor representing the sum of all elements in the input tensor.

  • If axis is int, set as 2, and keep_dims is False, the shape of output is \((x_1, x_3, ..., x_R)\).

  • If axis is tuple(int), set as (2, 3), and keep_dims is False, the shape of output is \((x_1, x_4, ..., x_R)\).

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.random.randn(3, 4, 5, 6).astype(np.float32))
>>> op = nn.ReduceLogSumExp(1, keep_dims=True)
>>> output = op(input_x)
>>> print(output.shape)
(3, 1, 5, 6)
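The result can be cross-checked against the definition above with NumPy (only the shape is printed here):

>>> expected = np.log(np.sum(np.exp(input_x.asnumpy()), axis=1, keepdims=True))
>>> print(expected.shape)
(3, 1, 5, 6)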
class tinyms.layers.Range(start, limit=None, delta=1)[source]

Creates a sequence of numbers in range [start, limit) with step size delta.

The size of the output is \(\left \lceil \frac{limit-start}{delta} \right \rceil\) and delta is the gap between two values in the tensor.

\[out_{i+1} = out_{i} +delta\]
Parameters
  • start (Union[int, float]) – If limit is None, the value acts as the limit of the range and the first entry defaults to 0. Otherwise, it acts as the first entry of the range.

  • limit (Union[int, float]) – Acts as the upper limit of the sequence. If None, it defaults to the value of start and the first entry of the range becomes 0. It cannot be equal to start.

  • delta (Union[int, float]) – Increment of the range. It cannot be equal to zero. Default: 1.

Outputs:

Tensor, the dtype is int if the dtypes of start, limit and delta are all int. Otherwise, the dtype is float.

Supported Platforms:

Ascend

Examples

>>> net = nn.Range(1, 8, 2)
>>> output = net()
>>> print(output)
[1 3 5 7]
class tinyms.layers.LGamma[source]

Calculates LGamma using Lanczos’ approximation referring to “A Precision Approximation of the Gamma Function”. The algorithm is:

\[\begin{split}\begin{array}{ll} \\ lgamma(z + 1) = \frac{(\log(2) + \log(pi))}{2} + (z + 1/2) * log(t(z)) - t(z) + A(z) \\ t(z) = z + kLanczosGamma + 1/2 \\ A(z) = kBaseLanczosCoeff + \sum_{k=1}^n \frac{kLanczosCoefficients[i]}{z + k} \end{array}\end{split}\]

However, if the input is less than 0.5 use Euler’s reflection formula:

\[lgamma(x) = \log(pi) - lgamma(1-x) - \log(abs(sin(pi * x)))\]

And please note that

\[lgamma(+/-inf) = +inf\]

Thus, the behaviour of LGamma follows:

  • when x > 0.5, return log(Gamma(x));

  • when x < 0.5 and x is not an integer, return the real part of Log(Gamma(x)), where Log is the complex logarithm;

  • when x is an integer less than or equal to 0, return +inf;

  • when x = +/- inf, return +inf.

Supported Platforms:

Ascend GPU

Inputs:
  • x (Tensor) - The input tensor. Only float16, float32 are supported.

Outputs:

Tensor, has the same shape and dtype as the x.

Examples

>>> input_x = Tensor(np.array([2, 3, 4]).astype(np.float32))
>>> op = nn.LGamma()
>>> output = op(input_x)
>>> print(output)
[3.5762787e-07 6.9314754e-01 1.7917603e+00]
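These values agree with Python's math.lgamma at the same points:

>>> import math
>>> print([round(math.lgamma(v), 5) for v in (2.0, 3.0, 4.0)])
[0.0, 0.69315, 1.79176]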
class tinyms.layers.DiGamma[source]

Calculates Digamma using Lanczos’ approximation referring to “A Precision Approximation of the Gamma Function”. The algorithm is:

\[\begin{split}\begin{array}{ll} \\ digamma(z + 1) = \log(t(z)) + A'(z) / A(z) - kLanczosGamma / t(z) \\ t(z) = z + kLanczosGamma + 1/2 \\ A(z) = kBaseLanczosCoeff + \sum_{k=1}^n \frac{kLanczosCoefficients[i]}{z + k} \\ A'(z) = \sum_{k=1}^n \frac{kLanczosCoefficients[i]}{(z + k)^2} \end{array}\end{split}\]

However, if the input is less than 0.5 use Euler’s reflection formula:

\[digamma(x) = digamma(1 - x) - pi * cot(pi * x)\]
Inputs:
  • x (Tensor[Number]) - The input tensor. Only float16, float32 are supported.

Outputs:

Tensor, has the same shape and dtype as the x.

Supported Platforms:

Ascend GPU

Examples

>>> input_x = Tensor(np.array([2, 3, 4]).astype(np.float32))
>>> op = nn.DiGamma()
>>> output = op(input_x)
>>> print(output)
[0.42278463  0.92278427 1.2561178]
class tinyms.layers.IGamma[source]

Calculates lower regularized incomplete Gamma function. The lower regularized incomplete Gamma function is defined as:

\[P(a, x) = gamma(a, x) / Gamma(a) = 1 - Q(a, x)\]

where

\[gamma(a, x) = \int_0^x t^{a-1} e^{-t} dt\]

is the lower incomplete Gamma function.

Above, \(Q(a, x)\) is the upper regularized incomplete Gamma function.

Supported Platforms:

Ascend GPU

Inputs:
  • a (Tensor) - The input tensor, with float32 data type. a should have the same dtype as x.

  • x (Tensor) - The input tensor, with float32 data type. x should have the same dtype as a.

Outputs:

Tensor, has the same dtype as a and x.

Examples

>>> input_a = Tensor(np.array([2.0, 4.0, 6.0, 8.0]).astype(np.float32))
>>> input_x = Tensor(np.array([2.0, 3.0, 4.0, 5.0]).astype(np.float32))
>>> igamma = nn.IGamma()
>>> output = igamma(input_a, input_x)
>>> print(output)
[0.593994  0.35276785  0.21486944  0.13337152]
class tinyms.layers.LBeta[source]

This is semantically equal to lgamma(x) + lgamma(y) - lgamma(x + y).

The method is more accurate for arguments above 8. The reason for accuracy loss in the naive computation is catastrophic cancellation between the lgammas. This method avoids the numeric cancellation by explicitly decomposing lgamma into the Stirling approximation and an explicit log_gamma_correction, and canceling the large terms from the Stirling approximation analytically.

Supported Platforms:

Ascend GPU

Inputs:
  • x (Tensor) - The input tensor, with float16 or float32 data type. x should have the same dtype as y.

  • y (Tensor) - The input tensor, with float16 or float32 data type. y should have the same dtype as x.

Outputs:

Tensor, has the same dtype as x and y.

Examples

>>> input_x = Tensor(np.array([2.0, 4.0, 6.0, 8.0]).astype(np.float32))
>>> input_y = Tensor(np.array([2.0, 3.0, 14.0, 15.0]).astype(np.float32))
>>> lbeta = nn.LBeta()
>>> output = lbeta(input_y, input_x)
>>> print(output)
[-1.7917596  -4.094345  -12.000229  -14.754799]
class tinyms.layers.MatMul(transpose_x1=False, transpose_x2=False)[source]

Multiplies matrix x1 by matrix x2.

  • If both x1 and x2 are 1-dimensional, the dot product is returned.

  • If neither x1 nor x2 has more than 2 dimensions, the matrix-matrix product will be returned. Note that if one of ‘x1’ and ‘x2’ is 1-dimensional, that argument will first be expanded to 2 dimensions. After the matrix multiplication, the expanded dimension will be removed.

  • If at least one of x1 and x2 is N-dimensional (N > 2), the non-matrix (batch) dimensions of the inputs will be broadcast and must be broadcastable. Note that if one of ‘x1’ and ‘x2’ is 1-dimensional, that argument will first be expanded to 2 dimensions and then the non-matrix dimensions will be broadcast. After the matrix multiplication, the expanded dimension will be removed. For example, if x1 is a \((j \times 1 \times n \times m)\) tensor and x2 is a \((k \times m \times p)\) tensor, the output will be a \((j \times k \times n \times p)\) tensor.

Parameters
  • transpose_x1 (bool) – If true, a is transposed before multiplication. Default: False.

  • transpose_x2 (bool) – If true, b is transposed before multiplication. Default: False.

Inputs:
  • input_x1 (Tensor) - The first tensor to be multiplied.

  • input_x2 (Tensor) - The second tensor to be multiplied.

Outputs:

Tensor, the shape of the output tensor depends on the dimension of input tensors.

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.MatMul()
>>> input_x1 = Tensor(np.ones(shape=[3, 2, 3]), mindspore.float32)
>>> input_x2 = Tensor(np.ones(shape=[3, 4]), mindspore.float32)
>>> output = net(input_x1, input_x2)
>>> print(output.shape)
(3, 2, 4)
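A hedged sketch of the batched broadcasting rule described above, \((j \times 1 \times n \times m)\) by \((k \times m \times p)\) giving \((j \times k \times n \times p)\):

>>> x1 = Tensor(np.ones(shape=[5, 1, 2, 3]), mindspore.float32)
>>> x2 = Tensor(np.ones(shape=[4, 3, 6]), mindspore.float32)
>>> print(nn.MatMul()(x1, x2).shape)
(5, 4, 2, 6)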
class tinyms.layers.Moments(axis=None, keep_dims=None)[source]

Calculates the mean and variance of x.

Parameters
  • axis (Union[int, tuple(int)]) – Calculates the mean and variance along the specified axis. Default: None, which is treated as () (all dimensions).

  • keep_dims (bool) – If true, the dimensions of mean and variance are identical to those of the input. If false, these dimensions are not kept. Default: None, which is treated as False.

Inputs:
  • input_x (Tensor) - The tensor to be calculated. Only float16 and float32 are supported.

Outputs:
  • mean (Tensor) - The mean of input x, with the same data type as input x.

  • variance (Tensor) - The variance of input x, with the same data type as input x.

Supported Platforms:

Ascend

Examples

>>> net = nn.Moments(axis=3, keep_dims=True)
>>> input_x = Tensor(np.array([[[[1, 2, 3, 4], [3, 4, 5, 6]]]]), mindspore.float32)
>>> output = net(input_x)
>>> print(output)
(Tensor(shape=[1, 1, 2, 1], dtype=Float32, value=
[[[[ 2.50000000e+00],
   [ 4.50000000e+00]]]]), Tensor(shape=[1, 1, 2, 1], dtype=Float32, value=
[[[[ 1.25000000e+00],
   [ 1.25000000e+00]]]]))
class tinyms.layers.MatInverse[source]

Calculates the inverse of a positive-definite Hermitian matrix using Cholesky decomposition.

Supported Platforms:

GPU

Inputs:
  • a (Tensor[Number]) - The input tensor. It must be a positive-definite matrix. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the a.

Examples

>>> input_a = Tensor(np.array([[4, 12, -16], [12, 37, -43], [-16, -43, 98]]).astype(np.float32))
>>> op = nn.MatInverse()
>>> output = op(input_a)
>>> print(output)
[[49.36112  -13.555558  2.1111116]
 [-13.555558  3.7777784  -0.5555557]
 [2.1111116  -0.5555557  0.11111111]]
class tinyms.layers.MatDet[source]

Calculates the determinant of a positive-definite Hermitian matrix using Cholesky decomposition.

Supported Platforms:

GPU

Inputs:
  • a (Tensor[Number]) - The input tensor. It must be a positive-definite matrix. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the a.

Examples

>>> input_a = Tensor(np.array([[4, 12, -16], [12, 37, -43], [-16, -43, 98]]).astype(np.float32))
>>> op = nn.MatDet()
>>> output = op(input_a)
>>> print(output)
35.999996
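The result can be cross-checked with NumPy; the example matrix is a classic Cholesky test case with determinant 36:

>>> print(round(float(np.linalg.det(input_a.asnumpy())), 3))
36.0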
class tinyms.layers.Conv2dBnAct(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', has_bn=False, momentum=0.997, eps=1e-05, activation=None, alpha=0.2, after_fake=True)[source]

A combination of convolution, batch normalization, and activation layers.

This part is a more detailed overview of the Conv2d operation.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means the value is for both height and width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. The value of stride must be greater than or equal to 1 and lower than any one of the height and width of the input. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and lower than any one of the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • has_bn (bool) – Specifies whether to use batchnorm. Default: False.

  • momentum (float) – Momentum for the moving average of batchnorm, must be in [0, 1]. Default: 0.997.

  • eps (float) – Term added to the denominator to improve numerical stability for batchnorm, should be greater than 0. Default: 1e-5.

  • activation (Union[str, Cell, Primitive]) – Specifies activation type. The optional values are as following: ‘softmax’, ‘logsoftmax’, ‘relu’, ‘relu6’, ‘tanh’, ‘gelu’, ‘sigmoid’, ‘prelu’, ‘leakyrelu’, ‘hswish’, ‘hsigmoid’. Default: None.

  • alpha (float) – Slope of the activation function at x < 0 for LeakyReLU. Default: 0.2.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after Conv2dBnAct. Default: True.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Supported Platforms:

Ascend GPU

Examples

>>> net = nn.Conv2dBnAct(120, 240, 4, has_bn=True, activation='relu')
>>> input = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> result = net(input)
>>> output = result.shape
>>> print(output)
(1, 240, 1024, 640)
class tinyms.layers.DenseBnAct(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, has_bn=False, momentum=0.9, eps=1e-05, activation=None, alpha=0.2, after_fake=True)[source]

A combination of Dense, batch normalization, and activation layers.

This part is a more detailed overview of the Dense operation.

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as input. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as input. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • has_bn (bool) – Specifies whether to use batchnorm. Default: False.

  • momentum (float) – Momentum for the moving average of batchnorm, must be in [0, 1]. Default: 0.9.

  • eps (float) – Term added to the denominator to improve numerical stability for batchnorm, should be greater than 0. Default: 1e-5.

  • activation (Union[str, Cell, Primitive]) – Specifies activation type. The optional values are as following: ‘softmax’, ‘logsoftmax’, ‘relu’, ‘relu6’, ‘tanh’, ‘gelu’, ‘sigmoid’, ‘prelu’, ‘leakyrelu’, ‘hswish’, ‘hsigmoid’. Default: None.

  • alpha (float) – Slope of the activation function at x < 0 for LeakyReLU. Default: 0.2.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after DenseBnAct. Default: True.

Inputs:
  • input (Tensor) - Tensor of shape \((N, in\_channels)\).

Outputs:

Tensor of shape \((N, out\_channels)\).

Supported Platforms:

Ascend

Examples

>>> net = nn.DenseBnAct(3, 4)
>>> input = Tensor(np.random.randint(0, 255, [2, 3]), mindspore.float32)
>>> result = net(input)
>>> output = result.shape
>>> print(output)
(2, 4)