miprometheus.models

Model

class miprometheus.models.Model(params, problem_default_values_={})[source]

Base class for all models.

Inherits from torch.nn.Module as all subclasses will represent a trainable model.

Hence, all subclasses should override the forward function.

Implements features & attributes used by all subclasses.

__init__(params, problem_default_values_={})[source]

Initializes a Model object.

Parameters:
  • params (miprometheus.utils.ParamInterface) – Parameters read from configuration file.
  • problem_default_values (dict) – dict of parameter values coming from the problem class. One example of such a parameter value is the size of the vocabulary set in a translation problem.

This constructor:

  • stores a pointer to params:

    >>> self.params = params
    
  • sets a default problem name:

    >>> self.name = 'Model'
    
  • initializes the logger.

    >>> self.logger = logging.Logger(self.name)
    
  • tries to parse the values coming from problem_default_values_:

    >>> try:
    >>>     for key in problem_default_values_.keys():
    >>>         self.params.add_custom_params({key: problem_default_values_[key]})
    >>> except BaseException:
    >>>     self.logger.info('No parameter value was parsed from problem_default_values_')
    
  • initializes the data definitions:

Note

This dict contains information about the expected inputs and produced outputs of the current model class.

This object will be used during handshaking between the model and the problem class to ensure that the model can accept the batches produced by the problem and that the problem can accept the predictions of the model to compute the loss and accuracy.

This dict should be defined using self.params.

This dict should at least contain the targets field:

>>>     self.data_definitions = {'inputs': {'size': [-1, -1], 'type': [torch.Tensor]},
>>>                              'targets': {'size': [-1, 1], 'type': [torch.Tensor]}
>>>                             }
  • sets the access to AppState: for dtype, visualization flag etc.

    >>> self.app_state = AppState()
    
  • initializes the best model loss (used to select which model to save) to np.inf:

    >>> self.best_loss = np.inf
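
For illustration, a minimal subclass might look as follows (a hypothetical model; assumes torch is imported and uses the attributes above):

>>> class MyModel(Model):
>>>     def __init__(self, params, problem_default_values_={}):
>>>         super(MyModel, self).__init__(params, problem_default_values_)
>>>         self.name = 'MyModel'
>>>         self.linear = torch.nn.Linear(10, 1)
>>>         self.data_definitions = {'inputs': {'size': [-1, 10], 'type': [torch.Tensor]},
>>>                                  'targets': {'size': [-1, 1], 'type': [torch.Tensor]}}
>>>
>>>     def forward(self, data_dict):
>>>         return self.linear(data_dict['inputs'])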
    
handshake_definitions(problem_data_definitions_)[source]

Proceeds to the handshake between what the Problem class provides (through a DataDict) and what the model expects as inputs.

Note

Handshaking is defined here as making sure that the Model and the Problem agree on the data that they exchange. More specifically, the Model has a definition of the inputs data that it expects (through its self.data_definitions attribute). The Problem has the same object describing what it generates.

This function proceeds with the handshaking as follows:

  • Verifying that all keys present in Model.data_definitions also exist in Problem.data_definitions. If a key is missing, an exception is thrown.

    This function does not verify the key targets as this will be done by problems.problem.Problem.handshake_definitions.

  • If all keys are present, then this function checks that for each (Model.data_definitions) key, the shape and type of the corresponding value matches what is indicated for the corresponding key in Problem.data_definitions. If not, an exception is thrown.

  • If both steps above passed, then the Model accepts what the Problem generates and can proceed to the forward pass.

To properly define the data_definitions dicts, here are some examples:

>>> data_definitions = {'img': {'size': [-1, 320, 480, 3], 'type': [np.ndarray]},
>>>                     'question': {'size': [-1, -1], 'type': [torch.Tensor]},
>>>                     'question_length': {'size': [-1], 'type': [list, int]},
>>>                     # ...
>>>                     }

Please indicate both the size and the type as lists:

  • Indicate all dimensions in the correct order for each key size field. If a dimension is unimportant or unknown (e.g. the batch size or variable-length sequences), then please indicate -1 at the correct location.
  • If an object is a composition of several Python objects (list, dict,…), then please include all objects type, matching the dimensions order: e.g. [list, dict].
Parameters:problem_data_definitions (dict) – Contains the definition of a sample generated by the Problem class.
Returns:True if the Model accepts what the Problem generates, otherwise throws an exception.
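
For illustration, a rough sketch of the verification logic (hypothetical code, not the actual implementation):

>>> for key, model_def in self.data_definitions.items():
>>>     if key == 'targets':
>>>         continue  # 'targets' is verified by Problem.handshake_definitions instead
>>>     problem_def = problem_data_definitions_[key]  # a missing key raises an exception
>>>     # -1 marks an unimportant/unknown dimension, so only fixed dims must match
>>>     for dim_model, dim_problem in zip(model_def['size'], problem_def['size']):
>>>         assert dim_model == -1 or dim_model == dim_problem
>>>     assert model_def['type'] == problem_def['type']
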
add_statistics(stat_col)[source]

Adds statistics to StatisticsCollector.

Note

Empty - To be redefined in inheriting classes.

Parameters:stat_col – StatisticsCollector.
collect_statistics(stat_col, data_dict, logits)[source]

Base statistics collection.

Note

Empty - To be redefined in inheriting classes. The user has to ensure that the corresponding entry in the StatisticsCollector has been created with self.add_statistics() beforehand.

Parameters:
  • stat_col – StatisticsCollector.
  • data_dict (DataDict) – DataDict containing inputs and targets.
  • logits – Predictions being output of the model.
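
A typical pair of overrides might look as follows (a sketch: the 'acc' statistic name, the formatting string and the calculate_accuracy helper are assumptions):

>>> def add_statistics(self, stat_col):
>>>     stat_col.add_statistic('acc', '{:12.10f}')
>>>
>>> def collect_statistics(self, stat_col, data_dict, logits):
>>>     stat_col['acc'] = self.calculate_accuracy(data_dict, logits)  # hypothetical helper
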
add_aggregators(stat_agg)[source]

Adds statistical aggregators to StatisticsAggregator.

Note

Empty - To be redefined in inheriting classes.

Parameters:stat_agg – StatisticsAggregator.
aggregate_statistics(stat_col, stat_agg)[source]

Aggregates the statistics collected by StatisticsCollector and adds the results to StatisticsAggregator.

Note

Empty - To be redefined in inheriting classes. The user has to ensure that the corresponding entry in the StatisticsAggregator has been created with self.add_aggregators() beforehand. Given that the StatisticsAggregator uses the statistics collected by the StatisticsCollector, the user should also ensure that these statistics are correctly collected (i.e. use of self.add_statistics and self.collect_statistics).

Parameters:
  • stat_col – StatisticsCollector.
  • stat_agg – StatisticsAggregator.
plot(data_dict, predictions, sample=0)[source]

Plots inputs, targets and predictions, along with model-dependent variables.

Note

Abstract - to be defined in derived classes.

Parameters:
  • data_dict (DataDict) – DataDict containing input and target batches.
  • predictions (torch.tensor) – Prediction.
  • sample (int) – Number of sample in batch (default: 0)
save(model_dir, training_status, training_stats, validation_stats)[source]

Generic method saving the model parameters to file. It can be overloaded if one needs more control.

Parameters:
  • model_dir – Directory where the model checkpoint will be saved.
  • training_status – Current status of the training.
  • training_stats – Statistics collected during training.
  • validation_stats – Statistics collected during validation.
Returns:

True if this is currently the best model (i.e. with the lowest loss up to the current episode).

load(checkpoint_file)[source]

Loads a model from the specified checkpoint file.

Parameters:checkpoint_file – File containing dictionary with model state and statistics.
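
A typical usage sketch (the directory and checkpoint filename are placeholders, and the three statistics arguments are assumed to come from the trainer):

>>> model.save('./checkpoints/', training_status, training_stats, validation_stats)
>>> model.load('./checkpoints/model_best.pt')  # hypothetical checkpoint file
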
summarize()[source]

Summarizes the model by showing the trainable/non-trainable parameters and weights per layer (nn.Module).

Uses recursive_summarize to iterate through the nested structure of the model (e.g. for RNNs).

Returns:Summary as a str.
recursive_summarize(module_, indent_, module_name_)[source]

Function that recursively inspects the (sub)modules and records their statistics (names, types, number of parameters etc.).

Parameters:
  • module_ (nn.Module or subclass) – Module to be inspected.
  • indent_ (int) – Current indentation level.
  • module_name_ (str) – Name of the module that will be displayed before its type.
Returns:

Str summarizing the module.
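
As a rough illustration of the recursion (a simplified sketch; the actual implementation also counts trainable/non-trainable parameters):

>>> def recursive_summarize(self, module_, indent_, module_name_):
>>>     # one line per (sub)module: indentation, display name and type
>>>     summary = '  ' * indent_ + '{} ({})\n'.format(module_name_, type(module_).__name__)
>>>     for name, child in module_.named_children():
>>>         summary += self.recursive_summarize(child, indent_ + 1, name)
>>>     return summary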

SequentialModel

class miprometheus.models.SequentialModel(params, problem_default_values_={})[source]

Base class for all sequential models.

Inherits from models.model.Model as most features are the same.

Should be derived by all sequential models.

__init__(params, problem_default_values_={})[source]

Mostly calls the base models.model.Model constructor.

Specifies a better structure for self.data_definitions.

Parameters:
  • params – Parameters read from configuration .yaml file.
  • problem_default_values (dict) – dict of parameters values coming from the problem class. One example of such parameter value is the size of the vocabulary set in a translation problem.
plot(data_dict, predictions, sample=0)[source]

Creates a default interactive visualization, with a slider enabling the user to move back and forth along the time axis (iteration over the sequence elements in a given episode). The default visualization contains the input, output and target sequences.

For a more model/problem - dependent visualization, please overwrite this method in the derived model class.

Parameters:
  • data_dict

    DataDict containing

    • input sequences: [BATCH_SIZE x SEQUENCE_LENGTH x INPUT_SIZE],
    • target sequences: [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_SIZE]
  • predictions (torch.tensor) – Predicted sequences [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_SIZE]
  • sample (int) – Number of sample in batch (default: 0)

ModelFactory

class miprometheus.models.ModelFactory[source]

Class instantiating the specified model class using the passed params.

static build(params, problem_default_values_={})[source]

Static method returning a particular model, depending on the name provided in the list of parameters.

Parameters:params (utils.param_interface.ParamInterface) – Parameters used to instantiate the model class.

Note

``params`` should contain the exact (case-sensitive) class name of the model to instantiate.
Parameters:problem_default_values (dict) – Default (hardcoded) values coming from a Problem class. Can be used to pass values such as a number of classes, an embedding dimension etc.
Returns:Instance of a given model.
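
A usage sketch (the model name and default values below are placeholders):

>>> # 'name' must be the exact (case-sensitive) class name of the model
>>> params = ParamInterface()
>>> params.add_custom_params({'name': 'MACNetwork'})
>>> model = ModelFactory.build(params, problem_default_values_={'nb_classes': 10})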

Visual Question Answering baselines

CNN + LSTM

class miprometheus.models.vqa_baselines.cnn_lstm.CNN_LSTM(params, problem_default_values_={})[source]

Implementation of a simple VQA baseline, globally following these steps:

  1. Image encoding, using a CNN model,
  2. Question encoding (if specified), using an LSTM,
  3. Concatenation of the two feature vectors, which are then passed through an MLP to produce the predictions.

Warning

The CNN model used in this implementation is the one from the Relational Network model (implementation in models.relational_net.conv_input_model.py), consisting of 4 convolutional layers (with batch normalization).

Although the paper cited above mentions GoogLeNet & VGG as other CNN models, they are not supported for now. It is planned in a future release to add support for torchvision models.

This implementation has only been tested on SortOfCLEVR for now.

__init__(params, problem_default_values_={})[source]

Constructor of the CNN_LSTM model.

Parses the parameters, instantiates the LSTM & CNN model, along with the MLP classifier.

Parameters:
  • params (utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict)[source]

Runs the CNN_LSTM model.

Parameters:data_dict (utils.DataDict) –

DataDict({‘images’, ‘questions’, …}) where:

  • images: [batch_size, num_channels, height, width],
  • questions: [batch_size, size_question_encoding]
Returns:Predictions: [batch_size, output_classes]
plot(data_dict, predictions, sample=0)[source]

Displays the image, the predicted & ground truth answers.

Parameters:
  • data_dict (utils.DataDict) –

    DataDict({‘images’, ‘questions’, ‘targets’}) where:

    • images: [batch_size, num_channels, height, width]
    • questions: [batch_size, size_question_encoding]
    • targets: [batch_size]
  • predictions (torch.tensor) – Prediction.
  • sample (int) – Index of sample in batch (DEFAULT: 0).

Stacked Attention Networks

class miprometheus.models.vqa_baselines.stacked_attention_networks.StackedAttentionNetwork(params, problem_default_values_)[source]

Implementation of a Stacked Attention Network (SAN).

The three major components of SAN are:

  • the image model (CNN model, possibly pretrained),
  • the question model (LSTM based),
  • the stacked attention model.

Warning

This implementation has only been tested on SortOfCLEVR so far.

__init__(params, problem_default_values_)[source]

Constructor of the StackedAttentionNetwork model.

  • Parses the parameters,
  • Instantiates the CNN model: a simple 4-layer one, or a pretrained one,
  • Instantiates an LSTM for the questions encoding,
  • Instantiates a 3-layer MLP as the classifier.
Parameters:
  • params (utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict)[source]

Runs the StackedAttentionNetwork model.

Parameters:data_dict (utils.DataDict) –

DataDict({‘images’, ‘questions’, …}) where:

  • images: [batch_size, num_channels, height, width],
  • questions: [batch_size, size_question_encoding]
Returns:Predictions: [batch_size, output_classes]
plot(data_dict, predictions, sample=0)[source]

Displays the image, the predicted & ground truth answers.

Parameters:
  • data_dict (utils.DataDict) –

    DataDict({‘images’, ‘questions’, ‘targets’}) where:

    • images: [batch_size, num_channels, height, width],
    • questions: [batch_size, size_question_encoding]
    • targets: [batch_size]
  • predictions (torch.tensor) – Prediction.
  • sample (int) – Index of sample in batch (DEFAULT: 0).
class miprometheus.models.vqa_baselines.stacked_attention_networks.StackedAttentionLayer(question_image_encoding_size, key_query_size, num_att_layers=2)[source]

Stacks several layers of Attention to enable multi-step reasoning.

__init__(question_image_encoding_size, key_query_size, num_att_layers=2)[source]

Constructor of the StackedAttentionLayer class.

Parameters:
  • question_image_encoding_size (int) – Size of the images & questions encoding.
  • key_query_size (int) – Size of the Key & Query, considered the same for both in this implementation.
  • num_att_layers (int) – Number of AttentionLayer to use.
forward(encoded_image, encoded_question)[source]

Apply stacked attention.

Parameters:
  • encoded_image (torch.tensor) – output of the image encoding (CNN + FC layer), should be of shape [batch_size, width * height, num_channels_encoded_image]
  • encoded_question (torch.tensor) – Last hidden layer of the LSTM, of shape [batch_size, question_encoding_size]
Returns:

u: attention [batch_size, num_channels_encoded_image]

class miprometheus.models.vqa_baselines.stacked_attention_networks.AttentionLayer(question_image_encoding_size, key_query_size=512)[source]

Implements one layer of the Stacked Attention mechanism.

Reference: Section 3.3 of the paper cited above.

__init__(question_image_encoding_size, key_query_size=512)[source]

Constructor of the AttentionLayer class.

Parameters:
  • question_image_encoding_size (int) – Size of the images & questions encoding.
  • key_query_size (int) – Size of the Key & Query, considered the same for both in this implementation.
forward(encoded_image, encoded_question)[source]

Applies one layer of stacked attention over the image & question.

Parameters:
  • encoded_image (torch.tensor) – output of the image encoding (CNN + FC layer), should be of shape [batch_size, width * height, num_channels_encoded_image]
  • encoded_question (torch.tensor) – Last hidden layer of the LSTM, of shape [batch_size, question_encoding_size]
Returns:

  • “Refined query vector” (weighted sum of the image vectors, combined with the question vector), of shape [batch_size, num_channels_encoded_image]
  • Attention weights (shape: TODO).
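
For illustration, a minimal sketch of one attention step following Section 3.3 of the SAN paper; W_i, W_q and W_p are hypothetical linear layers, and the question is assumed to be embedded to the same dimension as the image regions:

>>> h = torch.tanh(W_i(encoded_image) + W_q(encoded_question).unsqueeze(1))
>>> p = torch.softmax(W_p(h).squeeze(-1), dim=1)             # attention over the regions
>>> weighted = (p.unsqueeze(-1) * encoded_image).sum(dim=1)  # weighted sum of image vectors
>>> u = weighted + encoded_question                          # "refined query vector"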

class miprometheus.models.vqa_baselines.stacked_attention_networks.PretrainedImageEncoding(cnn_model='resnet18', num_layers=2)[source]

Wrapper class over a torchvision model to produce feature maps for the SAN model.

__init__(cnn_model='resnet18', num_layers=2)[source]

Constructor of the PretrainedImageEncoding class.

Parameters:
  • cnn_model (str) – select which pretrained model to load.
  • num_layers (int) – Number of layers to select from the cnn_model.

Warning

This class has only been tested with the resnet18 model.
get_output_nb_filters()[source]
Returns:The number of filters of the last conv layer.
forward(img)[source]

Forward pass of a pretrained cnn model.

Parameters:img (torch.tensor) – input image [batch_size, num_channels, height, width]
Returns:feature maps, [batch_size, output_channels, new_height, new_width]
class miprometheus.models.vqa_baselines.stacked_attention_networks.MultiHopsStackedAttentionNetwork(params, problem_default_values_)[source]

Implementation of a Stacked Attention Network (SAN), with several attention hops over the question words.

The implementation details are very similar to StackedAttentionNetwork, with the difference that it uses an LSTMCell instead of an LSTM.

Warning

This implementation has only been tested on ShapeColorQuery so far.

__init__(params, problem_default_values_)[source]

Constructor of the MultiHopsStackedAttentionNetwork model.

  • Parses the parameters,
  • Instantiates the CNN model: a simple 4-layer one, or a pretrained one,
  • Instantiates an LSTMCell for the questions encoding,
  • Instantiates a 3-layer MLP as the classifier.
Parameters:
  • params (utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
init_hidden_states(batch_size)[source]

Initialize the hidden and cell states of the LSTM to 0.

Parameters:batch_size (int) – Size of the batch.
Returns:hx, cx: hidden and cell states initialized to 0.
forward(data_dict)[source]

Runs the MultiHopsStackedAttentionNetwork model.

Parameters:data_dict (utils.DataDict) –

DataDict({‘images’, ‘questions’, …}) where:

  • images: [batch_size, num_channels, height, width],
  • questions: [batch_size, size_question_encoding]
Returns:Predictions: [batch_size, output_classes]
plot(data_dict, predictions, sample=0)[source]

Displays the image, the predicted & ground truth answers.

Parameters:
  • data_dict (utils.DataDict) –

    DataDict({‘images’, ‘questions’, ‘targets’}) where:

    • images: [batch_size, num_channels, height, width],
    • questions: [batch_size, size_question_encoding]
    • targets: [batch_size]
  • predictions (torch.tensor) – Prediction.
  • sample (int) – Index of sample in batch (DEFAULT: 0).

MAC

class miprometheus.models.mac.ControlUnit(dim, max_step)[source]

Implementation of the ControlUnit of the MAC network.

__init__(dim, max_step)[source]

Constructor for the control unit.

Parameters:
  • dim (int) – global ‘d’ hidden dimension
  • max_step (int) – maximum number of steps -> number of MAC cells in the network.
forward(step, contextual_words, question_encoding, ctrl_state)[source]

Forward pass of the ControlUnit.

Parameters:
  • step (int) – index of the current MAC cell.
  • contextual_words (torch.tensor) – tensor of shape [batch_size x maxQuestionLength x dim] containing the words encodings (‘representation of each word in the context of the question’).
  • question_encoding (torch.tensor) – question representation, of shape [batch_size x 2*dim].
  • ctrl_state (torch.tensor) – previous control state, of shape [batch_size x dim]
Returns:

new control state, [batch_size x dim]

class miprometheus.models.mac.ImageProcessing(dim)[source]

Image encoding using a 2-layer CNN, assuming the images have already been preprocessed by ResNet101.

__init__(dim)[source]

Constructor for the 2-layer CNN.

Parameters:dim (int) – global ‘d’ hidden dimension
forward(feature_maps)[source]

Apply the constructed CNN model on the feature maps (coming from ResNet101).

Parameters:feature_maps (torch.tensor) – [batch_size x nb_kernels x feat_H x feat_W] coming from ResNet101. Should have [nb_kernels x feat_H x feat_W] = [1024 x 14 x 14].
Returns:feature maps, shape [batch_size, dim, new_height, new_width]
class miprometheus.models.mac.InputUnit(dim, embedded_dim)[source]

Implementation of the InputUnit of the MAC network.

__init__(dim, embedded_dim)[source]

Constructor for the InputUnit.

Parameters:
  • dim (int) – global ‘d’ hidden dimension
  • embedded_dim (int) – dimension of the word embeddings.
forward(questions, questions_len, feature_maps)[source]

Forward pass of the InputUnit.

Parameters:
  • questions (torch.tensor) – tensor of the questions words, shape [batch_size x maxQuestionLength x embedded_dim].
  • questions_len (list) – Unpadded questions length.
  • feature_maps (torch.tensor) – [batch_size x nb_kernels x feat_H x feat_W] coming from ResNet101.
Returns:

  • question encodings: [batch_size x 2*dim] (torch.tensor),
  • word encodings: [batch_size x maxQuestionLength x dim] (torch.tensor),
  • images_encodings: [batch_size x nb_kernels x (H*W)] (torch.tensor).

class miprometheus.models.mac.MACUnit(dim, max_step=12, self_attention=False, memory_gate=False, dropout=0.15)[source]

Implementation of the MACUnit (iteration over the MAC cell) of the MAC network.

__init__(dim, max_step=12, self_attention=False, memory_gate=False, dropout=0.15)[source]

Constructor for the MACUnit, which represents the recurrence over the MACCell.

Parameters:
  • dim (int) – global ‘d’ hidden dimension.
  • max_step (int) – maximal number of MAC cells. Default: 12
  • self_attention (bool) – whether or not to use self-attention in the WriteUnit. Default: False.
  • memory_gate (bool) – whether or not to use memory gating in the WriteUnit. Default: False.
  • dropout (float) – dropout probability for the variational dropout mask. Default: 0.15
get_dropout_mask(x, dropout)[source]

Create a dropout mask to be applied on x.

Parameters:
  • x (torch.tensor) – tensor of arbitrary shape to apply the mask on.
  • dropout (float) – dropout rate.
Returns:

mask.
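
A minimal sketch of creating such a mask (the inverted-dropout rescaling is an assumption):

>>> # sample a Bernoulli keep-mask once and rescale (inverted dropout);
>>> # the same mask can then be reused across the reasoning steps
>>> mask = torch.empty_like(x).bernoulli_(1 - dropout) / (1 - dropout)
>>> x_masked = x * mask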

forward(context, question, knowledge, kb_proj)[source]

Forward pass of the MACUnit, which represents the recurrence over the MACCell.

Parameters:
  • context (torch.tensor) – contextual words, shape [batch_size x maxQuestionLength x dim]
  • question (torch.tensor) – questions encodings, shape [batch_size x 2*dim]
  • knowledge (torch.tensor) – knowledge_base (feature maps extracted by a CNN), shape [batch_size x nb_kernels x (feat_H * feat_W)].
Returns:

list of the memory states.

class miprometheus.models.mac.MACNetwork(params, problem_default_values_={})[source]

Implementation of the entire MAC network.

__init__(params, problem_default_values_={})[source]

Constructor for the MAC network.

Parameters:
  • params (utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict, dropout=0.15)[source]

Forward pass of the MAC network. First calls the InputUnit, then the recurrent MAC cells and finally the OutputUnit.

Parameters:
  • data_dict (utils.DataDict) – DataDict containing the images and questions.
  • dropout (float) – dropout probability. Default: 0.15.
Returns:

Predictions of the model.

static generate_figure_layout()[source]

Generate a figure layout for the attention visualization (done in MACNetwork.plot())

Returns:figure layout.
plot(data_dict, logits, sample=0)[source]

Visualize the attention weights (ControlUnit & ReadUnit) on the question & feature maps. Dynamic visualization throughout the reasoning steps is possible.

Parameters:
  • data_dict (utils.DataDict) – DataDict({‘images’,’questions’, ‘questions_length’, ‘questions_string’, ‘questions_type’, ‘targets’, ‘targets_string’, ‘index’,’imgfiles’, ‘prediction_string’})
  • logits (torch.tensor) – Prediction of the model.
  • sample (int) – Index of sample in batch (Default: 0)
Returns:

True when the user closes the window, False if we do not need to visualize.

class miprometheus.models.mac.OutputUnit(dim, nb_classes)[source]

Implementation of the OutputUnit of the MAC network.

__init__(dim, nb_classes)[source]

Constructor for the OutputUnit.

Parameters:
  • dim (int) – global ‘d’ dimension.
  • nb_classes (int) – number of classes to consider (classification problem).
forward(mem_state, question_encodings)[source]

Forward pass of the OutputUnit.

Parameters:
  • mem_state (torch.tensor) – final memory state, shape [batch_size x dim]
  • question_encodings (torch.tensor) – questions encodings, shape [batch_size x (2*dim)]
Returns:

probability distribution over the classes, [batch_size x nb_classes]

class miprometheus.models.mac.ReadUnit(dim)[source]

Implementation of the ReadUnit of the MAC network.

__init__(dim)[source]

Constructor for the ReadUnit.

Parameters:dim (int) – global ‘d’ hidden dimension
forward(memory_states, knowledge_base, ctrl_states, kb_proj)[source]

Forward pass of the ReadUnit. Assuming 1 scalar attention weight per knowledge base element.

Parameters:
  • memory_states (list) – list of all previous memory states, each of shape [batch_size x mem_dim]
  • knowledge_base (torch.tensor) – image representation (output of CNN), shape [batch_size x nb_kernels x (feat_H * feat_W)]
  • ctrl_states (list) – all previous control states, each of shape [batch_size x ctrl_dim].
  • kb_proj (torch.tensor) – linear projection of the knowledge base, shape [batch_size x dim x (feat_H * feat_W)].
Returns:

current read vector, shape [batch_size x read_dim]

miprometheus.models.mac.linear(input_dim, output_dim, bias=True)[source]

Defines a Linear layer. Specifies Xavier as the initialization type of the weights, to respect the original implementation: https://github.com/stanfordnlp/mac-network/blob/master/ops.py#L20

Parameters:
  • input_dim (int) – input dimension.
  • output_dim (int) – output dimension.
  • bias (bool) – whether to use a bias. Default: True.
Returns:

Initialized Linear layer

class miprometheus.models.mac.WriteUnit(dim, self_attention=False, memory_gate=False)[source]

Implementation of the WriteUnit of the MAC network.

__init__(dim, self_attention=False, memory_gate=False)[source]

Constructor for the WriteUnit.

Parameters:
  • dim (int) – global ‘d’ hidden dimension
  • self_attention (bool) – whether or not to use self-attention on the previous control states
  • memory_gate (bool) – whether or not to use memory gating.
forward(memory_states, read_vector, ctrl_states)[source]

Forward pass of the WriteUnit.

Parameters:
  • memory_states (list) – All previous memory states, each of shape [batch_size x dim].
  • read_vector (torch.tensor) – current read vector (output of the read unit), shape [batch_size x dim].
  • ctrl_states (list) – All previous control states, each of shape [batch_size x dim].
Returns:

current memory state, shape [batch_size x mem_dim]

Simplified MAC

class miprometheus.models.s_mac.ControlUnit(dim, max_step)[source]

Implementation of the ControlUnit for the S-MAC model.

Note

This implementation is part of a simplified version of the MAC network, where modifications regarding the different units have been done to reduce the number of linear layers (and thus number of parameters).

This is part of a submission to the ViGIL workshop for NIPS 2018. Feel free to use this model and refer to it with the following BibTex:

@article{marois2018transfer,
        title={On transfer learning using a MAC model variant},
        author={Marois, Vincent and Jayram, TS and Albouy, Vincent and Kornuta, Tomasz and Bouhadjar, Younes and Ozcan, Ahmet S},
        journal={arXiv preprint arXiv:1811.06529},
        year={2018}
}
__init__(dim, max_step)[source]

Constructor for the ControlUnit.

Parameters:
  • dim (int) – global ‘d’ hidden dimension.
  • max_step (int) – maximum number of steps -> number of MAC cells in the network.
forward(step, contextual_words, question_encoding, ctrl_state)[source]

Forward pass of the ControlUnit for the S-MAC network.

Parameters:
  • step (int) – index of the current MAC cell.
  • contextual_words (torch.Tensor) – tensor of shape [batch_size x maxQuestionLength x dim] containing the words encodings (“representation of each word in the context of the question”).
  • question_encoding (torch.Tensor) – question representation, of shape [batch_size x 2*dim].
  • ctrl_state (torch.Tensor) – previous control state, of shape [batch_size x dim]
Returns:

new control state, [batch_size x dim] (torch.Tensor)

class miprometheus.models.s_mac.MACUnit(dim, max_step=12, dropout=0.15)[source]

Implementation of the MACUnit (iteration over the MAC cell) of the S-MAC network.

Note

This implementation is part of a simplified version of the MAC network, where modifications regarding the different units have been done to reduce the number of linear layers (and thus number of parameters).

The implementation being simplified, we are not using the optional self-attention & memory-gating in the WriteUnit.

This is part of a submission to the ViGIL workshop for NIPS 2018. Feel free to use this model and refer to it with the following BibTex:

@article{marois2018transfer,
        title={On transfer learning using a MAC model variant},
        author={Marois, Vincent and Jayram, TS and Albouy, Vincent and Kornuta, Tomasz and Bouhadjar, Younes and Ozcan, Ahmet S},
        journal={arXiv preprint arXiv:1811.06529},
        year={2018}
}
__init__(dim, max_step=12, dropout=0.15)[source]

Constructor for the MACUnit, which represents the recurrence over the MACCell for the S-MAC network.

Parameters:
  • dim (int) – global ‘d’ hidden dimension.
  • max_step (int) – maximal number of MAC cells. Default: 12.
  • dropout (float) – dropout probability for the variational dropout mask. Default: 0.15.
static get_dropout_mask(x, dropout)[source]

Create a dropout mask to be applied on x.

Parameters:
  • x (torch.Tensor) – tensor of arbitrary shape to apply the mask on.
  • dropout (float) – dropout rate.
Returns:

mask (torch.Tensor)

forward(context, question, kb_proj)[source]

Forward pass of the MACUnit, which represents the recurrence over the MACCell for the S-MAC network.

Parameters:
  • context (torch.Tensor) – contextual words, shape [batch_size x maxQuestionLength x dim]
  • question (torch.Tensor) – questions encodings, shape [batch_size x 2*dim]
  • kb_proj (torch.Tensor) – Linear projection of the knowledge_base (feature maps extracted by a CNN), shape [batch_size x dim x (feat_H * feat_W)].
Returns:

Last memory state (torch.Tensor)

class miprometheus.models.s_mac.sMacNetwork(params, problem_default_values_={})[source]

Implementation of the entire S-MAC model.

Note

This implementation is a simplified version of the MAC network, where modifications regarding the different units have been done to reduce the number of linear layers (and thus number of parameters).

This is part of a submission to the ViGIL workshop for NIPS 2018. Feel free to use this model and refer to it with the following BibTex:

@article{marois2018transfer,
        title={On transfer learning using a MAC model variant},
        author={Marois, Vincent and Jayram, TS and Albouy, Vincent and Kornuta, Tomasz and Bouhadjar, Younes and Ozcan, Ahmet S},
        journal={arXiv preprint arXiv:1811.06529},
        year={2018}
}
__init__(params, problem_default_values_={})[source]

Constructor for the S-MAC network.

Parameters:
  • params (miprometheus.utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict, dropout=0.15)[source]

Forward pass of the S-MAC network.

First calls the InputUnit, then the recurrent S-MAC cells and finally the OutputUnit.

Parameters:
  • data_dict (miprometheus.utils.DataDict) – DataDict containing the images and questions.
  • dropout (float) – dropout probability. Default: 0.15.
Returns:

Predictions of the model.

static generate_figure_layout()[source]

Generate a figure layout for the attention visualization (done in sMacNetwork.plot())

Returns:matplotlib.figure.Figure layout.
plot(data_dict, logits, sample=0)[source]

Visualize the attention weights (ControlUnit & ReadUnit) on the question & feature maps.

Dynamic visualization throughout the reasoning steps is possible.

Parameters:
  • data_dict (miprometheus.utils.DataDict) – DataDict({‘questions_string’, ‘questions_type’, ‘targets_string’,’imgfiles’, ‘prediction_string’, ‘clevr_dir’, **})
  • logits (torch.Tensor) – Prediction of the model.
  • sample (int) – Index of sample in batch (Default: 0)
Returns:

True when the user closes the window, False if we do not need to visualize.

class miprometheus.models.s_mac.ReadUnit(dim)[source]

Implementation of the ReadUnit for the S-MAC model.

Note

This implementation is part of a simplified version of the MAC network, where modifications regarding the different units have been done to reduce the number of linear layers (and thus number of parameters).

This is part of a submission to the ViGIL workshop for NIPS 2018. Feel free to use this model and refer to it with the following BibTex:

@article{marois2018transfer,
        title={On transfer learning using a MAC model variant},
        author={Marois, Vincent and Jayram, TS and Albouy, Vincent and Kornuta, Tomasz and Bouhadjar, Younes and Ozcan, Ahmet S},
        journal={arXiv preprint arXiv:1811.06529},
        year={2018}
}
__init__(dim)[source]

Constructor for the ReadUnit of the S-MAC model.

Parameters:dim (int) – global ‘d’ hidden dimension.
forward(memory_state, ctrl_state, kb_proj)[source]

Forward pass of the ReadUnit. Assuming 1 scalar attention weight per knowledge base element.

Parameters:
  • memory_state (torch.Tensor) – Memory state, shape [batch_size x mem_dim].
  • ctrl_state (torch.Tensor) – Control state, shape [batch_size x ctrl_dim].
  • kb_proj (torch.Tensor) – Linear projection of the image representation (output of CNN), shape [batch_size x dim x (feat_H * feat_W)].
Returns:

current read vector, shape [batch_size x read_dim] (torch.Tensor)

class miprometheus.models.s_mac.WriteUnit(dim)[source]

Implementation of the WriteUnit for the S-MAC model.

Note

This implementation is part of a simplified version of the MAC network, where modifications regarding the different units have been done to reduce the number of linear layers (and thus number of parameters).

This is part of a submission to the ViGIL workshop for NIPS 2018. Feel free to use this model and refer to it with the following BibTex:

@article{marois2018transfer,
        title={On transfer learning using a MAC model variant},
        author={Marois, Vincent and Jayram, TS and Albouy, Vincent and Kornuta, Tomasz and Bouhadjar, Younes and Ozcan, Ahmet S},
        journal={arXiv preprint arXiv:1811.06529},
        year={2018}
}
__init__(dim)[source]

Constructor for the WriteUnit of the S-MAC model.

Parameters:dim (int) – global ‘d’ hidden dimension.
forward(read_vector)[source]

Forward pass of the WriteUnit for the S-MAC model.

Parameters:read_vector (torch.Tensor) – current read vector (output of the ReadUnit), shape [batch_size x dim].
Returns:current memory state, shape [batch_size x mem_dim] (torch.Tensor).

Relational Networks

class miprometheus.models.relational_net.ConvInputModel[source]

Simple 4-layer CNN for image encoding in the RelationalNetwork model.

__init__()[source]

Constructor.

Defines the 4 convolutional layers and batch normalization layers.

This implementation is inspired from the description in the section ‘Supplementary Material - CLEVR from pixels’ in the reference paper (https://arxiv.org/pdf/1706.01427.pdf).

get_output_nb_filters()[source]
Returns:The number of filters of the last conv layer.
get_output_shape(height, width)[source]

Getter method which computes the output height & width of the features maps.

Parameters:
  • height (int) – Input image height.
  • width (int) – Input image width.
Returns:

height, width of the produced feature maps.

forward(img)[source]

Forward pass of the CNN.

Parameters:img (torch.tensor) – images to pass through the CNN layers. Should be of size [N, 3, 128, 128].

Returns:output of the CNN. Should be of size [N, 24, 8, 8].
class miprometheus.models.relational_net.PairwiseRelationNetwork(input_size)[source]

Implementation of the g_theta MLP used in the Relational Network model.

For recall, the role of g_theta is to infer the ways in which 2 regions of the CNN feature maps are related, or if they are even related at all.

__init__(input_size)[source]

Constructor for the g_theta MLP.

Instantiates 4 linear layers, with 256 nodes per layer.

Parameters:input_size (int) – input size.
forward(inputs)[source]

Forward pass of the g_theta MLP.

Parameters:inputs – tensor of shape [batch_size, -1, input_size], should represent the pairs of regions (in the CNN feature maps) concatenated with the question encoding.
Returns:tensor of shape [batch_size, -1, 256].
class miprometheus.models.relational_net.SumOfPairsAnalysisNetwork(output_size)[source]

Implementation of the f_phi MLP used in the Relational Network model.

For recall, the role of f_phi is to produce the probability distribution over all possible answers.

__init__(output_size)[source]

Constructor for the f_phi MLP.

Instantiates 3 linear layers, with 256 nodes per layer.

Parameters:output_size (int) – number of classes for the last layer.
forward(inputs)[source]

Forward pass of the f_phi MLP.

Parameters:inputs – tensor of shape [batch_size, -1, 256], should represent the element-wise sum of the outputs of g_theta.
Returns:Predictions over the available classes, tensor of shape [batch_size, -1, output_size]
class miprometheus.models.relational_net.RelationalNetwork(params, problem_default_values_={})[source]

Implementation of the Relational Network (RN) model.

Questions are processed with an LSTM to produce a question embedding, and images are processed with a CNN to produce a set of objects for the RN. ‘Objects’ are constructed using feature-map vectors from the convolved image. The RN considers relations across all pairs of objects, conditioned on the question embedding, and integrates all these relations to answer the question.

Reference paper: https://arxiv.org/abs/1706.01427.

The CNN model used for the image encoding is located in conv_input_model.py.

The MLPs (g_theta & f_phi) are in functions.py.

__init__(params, problem_default_values_={})[source]

Constructor.

Instantiates the CNN model (4 layers) and the 2 multi-layer perceptrons (g_theta & f_phi).

Parameters:
  • params – dictionary of parameters (read from the .yaml configuration file.)
  • problem_default_values (dict.) – default values coming from the Problem class.
build_coord_tensor(batch_size, d)[source]

Create the tensor containing the spatial relative coordinate of each region (1 pixel) in the feature maps of the ConvInputModel. These spatial relative coordinates are used to ‘tag’ the regions.

Parameters:
  • batch_size (int) – batch size
  • d (int) – size of 1 feature map
Returns:

tensor of shape [batch_size x d x d x 2]
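
For illustration, a sketch of building such a coordinate tensor (the normalization to [-1, 1] is an assumption):

>>> coords = torch.linspace(-1., 1., d)
>>> x = coords.unsqueeze(0).expand(d, d)              # x-coordinate of each region
>>> y = coords.unsqueeze(1).expand(d, d)              # y-coordinate of each region
>>> ct = torch.stack((x, y), dim=2)                   # [d x d x 2]
>>> ct = ct.unsqueeze(0).expand(batch_size, d, d, 2)  # [batch_size x d x d x 2]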

forward(data_dict)[source]

Runs the RelationalNetwork model.

Parameters:data_dict (utils.DataDict) –

DataDict({‘images’, ‘questions’, …}) containing:

  • images [batch_size, num_channels, height, width],
  • questions [batch_size, question_size]
Returns:Predictions of the model [batch_size, nb_classes]

Image Classification models

class miprometheus.models.vision.AlexnetWrapper(params, problem_default_values_={})[source]

Wrapper class for the AlexNet model from TorchVision.

__init__(params, problem_default_values_={})[source]

Constructor for the AlexNet wrapper. Simply instantiates the AlexNet model from torchvision.models.

Note

The model expects input images normalized as follows: mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].

Parameters:
  • params – dictionary of parameters (read from the .yaml configuration file.)
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict)[source]

Main forward pass of the Alexnet wrapper.

Parameters:data_dict

DataDict({‘images’,**}), where:

  • images: [batch_size, num_channels, width, height],
Returns:Predictions [batch_size, num_classes]
plot(data_dict, predictions, sample_number=0)[source]

Simple plot - shows the Problem’s images with the target & actual predicted class.

Parameters:
  • data_dict (utils.DataDict) – DataDict({‘images’,’targets’, ‘targets_label’})
  • predictions (torch.tensor) – Predictions of the AlexnetWrapper.
  • sample_number (int) – Index of the sample in batch (DEFAULT: 0).
class miprometheus.models.vision.LeNet5(params_, problem_default_values_)[source]

A classical LeNet-5 model for MNIST digits classification.

__init__(params_, problem_default_values_)[source]

Initializes the LeNet5 model, creates the required layers.

Parameters:
  • params (miprometheus.utils.ParamInterface) – Parameters read from configuration file.
  • problem_default_values (dict) – dict of parameters values coming from the problem class.
forward(data_dict)[source]

Main forward pass of the LeNet5 model.

Parameters:data_dict (miprometheus.utils.DataDict) –

DataDict({‘images’,**}), where:

  • images: [batch_size, num_channels, width, height]
Returns:Predictions [batch_size, num_classes]
class miprometheus.models.vision.SimpleConvNet(params, problem_default_values_={})[source]

A simple 2-layer CNN designed specifically to solve the MNIST & CIFAR10 datasets. The parameters here are not hardcoded, so the user can adjust them for their application and see their impact on the model’s behavior.

__init__(params, problem_default_values_={})[source]

Constructor of the SimpleConvNet. The overall structure of this CNN is as follows:

Conv1 -> MaxPool1 -> ReLu -> Conv2 -> MaxPool2 -> ReLu (-> flatten) -> Linear1 -> Linear2 -> Linear3

The parameters that the user can change are:

  • For Conv1 & Conv2: number of output channels, kernel size, stride and padding.
  • For MaxPool1 & MaxPool2: Kernel size
  • For Linear3: The number of classes is read from problem_default_values_. The number of output nodes for Linear1 is set to 120, and Linear2 is fixed to 120 -> 84 for now. Linear3 is 84 -> nb_classes.

Note

We are using the default values of dilation, groups & bias for nn.Conv2D.

Similarly for the stride, padding, dilation, return_indices & ceil_mode of nn.MaxPool2D.

The size of the images (width, height, number of channels) is read from problem_default_values_. Also, it is possible that the images are padded (with 0s) by the Problem class. The padding values (e.g. [2,2,2,2]) should be indicated in problem_default_values_, so that we can adjust the width & height.

Note

The images will be upscaled to [224, 224] (which is the input size of AlexNet, so this would allow for comparison) if problem_default_values_['up_scaling'] is True.

Parameters:
  • params (utils.ParamInterface) – dict of parameters (read from configuration .yaml file).
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict)[source]

Forward pass of the SimpleConvNet model.

Parameters:data_dict

DataDict({‘images’,’targets’, ‘targets_label’}), where:

  • images: [batch_size, num_channels, width, height],
  • targets [batch_size]
Returns:Predictions [batch_size, num_classes]
plot(data_dict, predictions, sample_number=0)[source]

Simple plot - shows the Problem’s images with the target & actual predicted class.

Parameters:
  • data_dict (utils.DataDict) – DataDict({‘images’,’targets’, ‘targets_label’})
  • predictions (torch.tensor) – Predictions of the SimpleConvNet.
  • sample_number (int) – Index of the sample in batch (DEFAULT: 0).

Controllers for MANNs models

class miprometheus.models.controllers.ControllerFactory[source]

Class returning a concrete controller, depending on the name provided in the list of parameters.

static build(params)[source]

Static method returning a particular controller, depending on the name provided in the list of parameters.

Parameters:params (utils.param_interface.ParamInterface) – Parameters used to instantiate the controller.

Note

``params`` should contain the exact (case-sensitive) class name of the controller to instantiate.
Returns:Instance of a given controller.
class miprometheus.models.controllers.FeedforwardController(params)[source]

A wrapper class for a feedforward controller.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple - in this case an empty tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - empty ().
forward(inputs_BxI, prev_state_tuple)[source]

Controller forward function.

Parameters:
  • inputs_BxI – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE]
  • prev_state_tuple – unused - empty tuple ()
Returns:

outputs a Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and an empty tuple.

class miprometheus.models.controllers.FFGRUStateTuple[source]

Tuple used by GRU cells for storing current/past state information.

class miprometheus.models.controllers.FFGRUController(params)[source]

A wrapper class for a feedforward controller with a GRU cell.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of GRUStateTuple class.
forward(x, prev_state_tuple)[source]

Controller forward function.

Parameters:
  • x – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE] (generally the read data and input word concatenated)
  • prev_state_tuple – Tuple of the previous hidden and cell state
Returns:

outputs a Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and a GRU state tuple.

class miprometheus.models.controllers.GRUStateTuple[source]

Tuple used by GRU Cells for storing current/past state information.

class miprometheus.models.controllers.GRUController(params)[source]

A wrapper class for a GRU cell-based controller.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of GRUStateTuple class.
forward(x, prev_state_tuple)[source]

Controller forward function.

Parameters:
  • x – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE] (generally the read data and input word concatenated)
  • prev_state_tuple – Tuple of the previous hidden and cell state
Returns:

outputs a Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and a GRU state tuple.

class miprometheus.models.controllers.LSTMStateTuple[source]

Tuple used by LSTM Cells for storing current/past state information.

class miprometheus.models.controllers.LSTMController(params)[source]

A wrapper class for a LSTM-based controller.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of LSTMStateTuple class.
forward(x, prev_state_tuple)[source]

Controller forward function.

Parameters:
  • x – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE] (generally the read data and input word concatenated)
  • prev_state_tuple – Tuple of the previous hidden and cell state
Returns:

outputs a Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and an LSTM state tuple.

class miprometheus.models.controllers.RNNStateTuple[source]

Tuple used by RNN cells for storing current/past state information.

class miprometheus.models.controllers.RNNController(params)[source]

A wrapper class for an RNN cell-based controller.

__init__(params)[source]

Constructor for a RNN.

Parameters:params – Dictionary of parameters.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of RNNStateTuple class.
forward(inputs, prev_hidden_state_tuple)[source]

Controller forward function.

Parameters:
  • inputs – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE] (generally the read data and input word concatenated)
  • prev_hidden_state_tuple – Tuple of the previous hidden state
Returns:

outputs a Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and an RNN state tuple.

Memory-Augmented Neural Network (MANN) models

DWM

class miprometheus.models.dwm.Controller(in_dim, output_units, state_units, read_size, update_size)[source]

Implementation of the DWM controller.

__init__(in_dim, output_units, state_units, read_size, update_size)[source]

Constructor for the Controller.

Parameters:
  • in_dim – input size.
  • output_units – output size.
  • state_units – state size.
  • read_size – size of data read from memory
  • update_size – total number of parameters for updating attention and memory
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of LSTMStateTuple class.
forward(input, tuple_state_prev, read_data)[source]

Forward pass of the DWM controller, calculates the output, the hidden state and the interface parameters.

Parameters:
  • input – current input (from time t) [batch_size, in_dim]
  • tuple_state_prev – contains previous hidden state (from time t-1) [batch_size, state_units]
  • read_data – read data from memory (from time t) [batch_size, read_size]
Returns:

  • output: logits representing the prediction [batch_size, output_units]
  • tuple_state: contains new_hidden_state
  • update_data: interface parameters [batch_size, update_size]

class miprometheus.models.dwm.DWMCellStateTuple[source]

Tuple used by DWM Cells for storing current/past state information:

controller state, interface state, memory state.

class miprometheus.models.dwm.DWMCell(in_dim, output_units, state_units, num_heads, is_cam, num_shift, M)[source]

Applies the DWM cell to an element in the input sequence.

__init__(in_dim, output_units, state_units, num_heads, is_cam, num_shift, M)[source]

Builds the DWM cell.

Parameters:
  • in_dim – input size.
  • output_units – output size.
  • state_units – state size.
  • num_heads – number of heads.
  • is_cam – is it content-addressable.
  • num_shift – number of shifts of heads.
  • M – Number of slots per address in the memory bank.
forward(input, tuple_cell_state_prev)[source]

Forward pass of the DWM_Cell.

Parameters:
  • input – current input (from time t) [batch_size, inputs_size]
  • tuple_cell_state_prev – contains (tuple_ctrl_state_prev, tuple_interface_prev, mem_prev), object of class DWMCellStateTuple
Returns:

  • output: logits [batch_size, output_size]
  • tuple_cell_state: contains (tuple_ctrl_state, tuple_interface, mem)

\[\begin{aligned}
&\text{Step 1: read memory} \\
&\qquad r_t = M_t \otimes w_t \\
&\text{Step 2: controller} \\
&\qquad h_t = \sigma(W_h[x_t, h_{t-1}, r_{t-1}]) \\
&\qquad y_t = W_y[x_t, h_{t-1}, r_{t-1}] \\
&\qquad P_t = W_P[x_t, h_{t-1}, r_{t-1}] \\
&\text{Step 3: memory update} \\
&\qquad M_t = M_{t-1} \circ (E - w_t \otimes e_t) + w_t \otimes a_t \\
&\text{(to be completed ...)}
\end{aligned}\]
class miprometheus.models.dwm.DWM(params, problem_default_values_={})[source]

Differentiable Working Memory (DWM), is a memory augmented neural network which emulates the human working memory.

The DWM shows the same functional characteristics of working memory and robustly learns psychology-inspired tasks, converging faster than comparable state-of-the-art models.

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
forward(data_dict)[source]

The forward function requires that data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:output: logits which represent the prediction of DWM [batch, sequence_length, output_size]

Example:

>>> dwm = DWM(params)
>>> inputs = torch.randn(5, 3, 10)
>>> # forward() expects a DataDict containing at least the "sequences" key
>>> data_dict = DataDict({'sequences': inputs})
>>> output = dwm(data_dict)
static generate_figure_layout()[source]

Generates a figure layout for the visualization (done in DWM.plot()).

Returns:figure layout.

plot(data_dict, predictions, sample_number=0)[source]

Interactive visualization, with a slider enabling the user to move back and forth along the time axis (iteration in a given episode).

Parameters:
  • data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE] - “targets”: a tensor of targets of size [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • predictions – Prediction sequence [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • sample_number – Number of sample in batch (DEFAULT: 0)
class miprometheus.models.dwm.InterfaceStateTuple[source]

Tuple used by interface for storing current/past interface information:

head_weight and snapshot_weight.

class miprometheus.models.dwm.Interface(num_heads, is_cam, num_shift, M)[source]

Implementation of the interface of the DWM.

__init__(num_heads, is_cam, num_shift, M)[source]

Initialize Interface.

Parameters:
  • num_heads – number of heads
  • is_cam (boolean) – are the heads allowed to use content addressing
  • num_shift – number of shifts of heads.
  • M – Number of slots per address in the memory bank.
init_state(memory_addresses_size, batch_size)[source]

Returns ‘zero’ (initial) state of Interface tuple.

Parameters:
  • batch_size – Size of the batch in given iteration/epoch.
  • memory_addresses_size – size of the memory
Returns:

Initial state tuple - object of InterfaceStateTuple class: (head_weight_init, snapshot_weight_init)

read_size

Returns the size of the data read by all heads.

Returns:(num_head*content_size)
update_size

Returns the total number of parameters output by the controller.

Returns:(num_heads*parameters_per_head)
read(wt, mem)[source]

Returns the data read from memory.

Parameters:
  • wt – head’s weights [batch_size, num_heads, memory_addresses_size]
  • mem – the memory content [batch_size, memory_content_size, memory_addresses_size]
Returns:

the read data [batch_size, num_heads, memory_content_size]

update(update_data, tuple_interface_prev, mem)[source]

Erases from memory, writes to memory, updates the weights using various attention mechanisms.

Parameters:
  • update_data – the parameters from the controllers
  • tuple_interface_prev – contains (head_weight, snapshot_weight)
  • tuple_interface_prev.head_weight – head attention [batch_size, num_heads, memory_size]
  • tuple_interface_prev.snapshot_weight – snapshot(bookmark) attention [batch_size, num_heads, memory_size]
  • mem – the memory [batch_size, content_size, memory_size]
Returns:

  • InterfaceTuple containing [head_weight, snapshot_weight]: the updated weights of head and snapshot
  • mem: the new memory content

class miprometheus.models.dwm.Memory(mem_t)[source]

Implementation of the memory of the DWM.

__init__(mem_t)[source]

Initializes the memory.

Parameters:mem_t – the memory at time t [batch_size, memory_content_size, memory_addresses_size]
attention_read(wt)[source]

Returns the data read from memory.

Parameters:wt – head’s weights [batch_size, num_heads, memory_addresses_size]
Returns:the read data [batch_size, num_heads, memory_content_size]
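
With the shapes documented above, this read reduces to a batch matrix product; a sketch:

>>> # wt:  [batch_size, num_heads, memory_addresses_size]
>>> # mem: [batch_size, memory_content_size, memory_addresses_size]
>>> read = torch.matmul(wt, mem.transpose(1, 2))  # [batch_size, num_heads, memory_content_size]
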
add_weighted(add, wt)[source]

Writes data to memory.

Parameters:
  • wt – head’s weights [batch_size, num_heads, memory_addresses_size]
  • add – the data to be added to memory [batch_size, num_heads, memory_content_size]

Returns:the updated memory [batch_size, memory_addresses_size, memory_content_size]

erase_weighted(erase, wt)[source]

Erases elements from memory.

Parameters:
  • wt – head’s weights [batch_size, num_heads, memory_addresses_size]
  • erase – data to be erased from memory [batch_size, num_heads, memory_content_size]

Returns:the updated memory [batch_size, memory_addresses_size, memory_content_size]

content_similarity(k)[source]

Calculates the dot product for content-aware addressing.

Parameters:k – the keys emitted by the controller [batch_size, num_heads, memory_content_size]
Returns:the dot product between the keys and query [batch_size, num_heads, memory_addresses_size]
size

Returns the size of the memory.

Returns:Int size of the memory
content

Returns the entire memory.

Returns:the memory []
miprometheus.models.dwm.normalize(x)[source]

Normalizes the input torch tensor along the last dimension using the max of the one norm. The normalization is “fuzzy” to prevent divergences.

Parameters:x – input of shape [batch_size, A, A1, …, An]; if the input is the weight vector, x’s shape is [batch_size, num_heads, memory_size]
Returns:normalized x of shape [batch_size, A, A1 ..An]
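
A minimal sketch of such a fuzzy normalization (assumed semantics; the epsilon value is illustrative):

>>> def normalize(x, eps=1e-12):
>>>     # divide by the L1 norm along the last dimension, floored at eps to avoid division by zero
>>>     return x / torch.clamp(x.abs().sum(dim=-1, keepdim=True), min=eps)
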
miprometheus.models.dwm.sim(query, data, l2_normalize=False, aligned=True)[source]

Batch dot-product similarity computed using matrix multiplication; the hidden shapes must be broadcastable (numpy style).

Parameters:
  • query – the input data to be compared [batch_size, h, p] p = memory_size if aligned is True and p = content_size if aligned is False
  • data – Input state [batch_size, content_size, memory_size]
  • l2_normalize – boolean, determines whether to normalize the query and the data before the dot product
  • aligned – boolean, determines whether to transpose data along the last two dimensions
Returns:

out[…,i,j] = sum_k q[…,i,k] * data_gen[…,j,k] for the default options
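
A rough sketch of the default behavior (aligned=True, no normalization; a real implementation also handles l2_normalize):

>>> def sim(query, data, aligned=True):
>>>     # out[..., i, j] = sum_k query[..., i, k] * data[..., j, k] when aligned
>>>     return torch.matmul(query, data.transpose(-1, -2) if aligned else data)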

miprometheus.models.dwm.outer_prod(x, y)[source]

Batch outer product of two vectors (along the last two dimensions); the hidden shapes must be broadcastable (numpy style).

Parameters:
  • x – (for the dwm model) input one [batch_size, num_heads, memory_content_size]
  • y – (for the dwm model) Input two [batch_size, num_heads, memory_addresses_size]
Returns:

Outer product [batch_size, num_heads, memory_content_size, memory_addresses_size]
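
Equivalently (a sketch, assuming broadcastable hidden shapes), the batch outer product can be written with two unsqueezes:

>>> def outer_prod(x, y):
>>>     # [..., C] outer [..., A] -> [..., C, A]
>>>     return x.unsqueeze(-1) * y.unsqueeze(-2)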

miprometheus.models.dwm.circular_conv(x, f)[source]

Batch 1D circular convolution with matching hidden shapes.

Parameters:
  • x – input [batch_size, num_head, num_addresses]
  • f – shift array [batch_size, num_heads, shift_size]
Returns:

Circular convolution [batch_size, num_head, num_addresses]
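
A naive sketch of a batch circular convolution (assumed semantics; the shift kernel is taken to be centered around a zero shift):

>>> def circular_conv(x, f):
>>>     # x: [..., num_addresses], f: [..., shift_size]
>>>     S = f.shape[-1]
>>>     out = torch.zeros_like(x)
>>>     for j in range(S):
>>>         out = out + f[..., j:j+1] * torch.roll(x, shifts=j - S // 2, dims=-1)
>>>     return out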

DNC

class miprometheus.models.dnc.ControlParams(output_size, read_size, params)[source]
__init__(output_size, read_size, params)[source]

Initialize a Controller.

Parameters:
  • output_size – output size.
  • read_size – size of data_gen read from memory
  • params – dictionary of input parameters
init_state(batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of LSTMStateTuple class.
forward(inputs, prev_ctrl_state_tuple, read_data)[source]

Calculates the output, the hidden state and the controller parameters.

Parameters:
  • inputs – Current input (from time t) [BATCH_SIZE x INPUT_SIZE]
  • read_data – data read from memory (from time t-1) [BATCH_SIZE x num_data_bits]
  • prev_ctrl_state_tuple – Tuple of states of controller (from time t-1)
Returns:

Tuple [output, hidden_state, update_data] (update_data contains all of the controller parameters)

class miprometheus.models.dnc.NTMCellStateTuple[source]

Tuple used by NTM Cells for storing current/past state information.

class miprometheus.models.dnc.DNCCell(output_size, params)[source]

Class representing a single cell of the DNC.

__init__(output_size, params)[source]

Initialize a DNC cell.

Parameters:
  • output_size – output size.
  • state_units – state size.
  • num_heads – number of heads.
init_state(memory_address_size, batch_size)[source]

Returns ‘zero’ (initial) state:

  • memory is reset to random values.
  • read & write weights (and read vector) are set to 1e-6.
Parameters:
  • batch_size – Size of the batch in given iteration/epoch.
  • memory_address_size – Number of memory addresses.
forward(input_BxI, cell_state_prev)[source]

Builds the DNC cell.

Parameters:
  • input – Current input (from time t) [BATCH_SIZE x INPUT_SIZE]
  • state – Previous hidden state (from time t-1) [BATCH_SIZE x STATE_UNITS]
Returns:

Tuple [output, hidden_state]

class miprometheus.models.dnc.DNC(params, problem_default_values_={})[source]

Implementation of Differentiable Neural Computer (DNC)

Graves, Alex, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature 538.7626 (2016): 471. doi:10.1038/nature20101

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].
plot_memory_attention(data_dict, predictions, sample_number=0)[source]

Plots memory and attention (TODO: fix).

Parameters:
  • data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE] - “targets”: a tensor of targets of size [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • predictions – Prediction sequence [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • sample_number – Number of sample in batch (DEFAULT: 0)
generate_figure_layout()[source]

Generates the figure layout used in plot().

Returns:figure layout.

plot(data_dict, predictions, sample_number=0)[source]

Interactive visualization, with a slider enabling to move forth and back along the time axis (iteration in a given episode).

Parameters:
  • data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE] - “targets”: a tensor of targets of size [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • predictions – Prediction sequence [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • sample_number – Number of sample in batch (DEFAULT: 0)
class miprometheus.models.dnc.InterfaceStateTuple[source]

Tuple used by interface for storing current/past state information.

class miprometheus.models.dnc.Interface(params)[source]
__init__(params)[source]

Initialize Interface.

Parameters:params – dictionary of input parameters
read_size

Returns the size of the data read by all heads.

Returns:(num_heads*content_size)
read(prev_interface_tuple, mem)[source]

Returns the data read from memory.

Parameters:
  • prev_interface_tuple – Tuple [previous read, previous write, prev usage, prev links]
  • mem – the memory [batch_size, content_size, memory_size]
Returns:

the read data [batch_size, content_size]

edit_memory(interface_tuple, update_data, mem)[source]

Edits the external memory and then returns it.

Parameters:
  • update_data – the parameters from the controllers [dictionary]
  • interface_tuple – Tuple [previous read, previous write, prev usage, prev links]
  • mem – the memory [batch_size, content_size, memory_size]
Returns:

edited memory [batch_size, content_size, memory_size]

init_state(memory_address_size, batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:
  • memory_address_size – The number of memory addresses
  • batch_size – Size of the batch in given iteration/epoch.
Returns:

Initial state tuple - object of InterfaceStateTuple class.

update_weight(prev_attention, memory, strength, gate, key, shift, sharp)[source]

Update the attention with NTM’s mix of content addressing and linear shifting.

Parameters:
  • prev_attention – tensor of shape [batch_size, num_writes, memory_size] giving the attention at the previous time step.
  • memory – the memory of the previous step (class)
  • strength – The strengthening parameter for the content addressing [batch, num_heads, 1]
  • gate – The interpolation gate between the content addressing and the previous weight [batch, num_heads, 1]
  • key – The comparison key for the content addressing [batch, num_heads, num_memory_bits]
  • shift – The shift vector that defines the circular convolution of the outputs [batch, num_heads, num_shifts]
  • sharp – sharpening parameter for the attention [batch, num_heads, 1]
update_write_weight(usage, memory, allocation_gate, write_gate, key, strength)[source]

Update write attention with DNC’s combination of content addressing and usage based allocation.

Parameters:
  • usage – A tensor of shape [batch_size, memory_size] representing current memory usage.
  • memory – the memory of the previous step (class)
  • strength – The strengthening parameter for the content addressing [batch, num_writes, 1]
  • key – The comparison key for the content addressing [batch, num_writes, num_memory_bits]
  • allocation_gate – Interpolation between writing to unallocated memory and content-based lookup, for each write head [batch, num_writes, 1]
  • write_gate – Overall gating of write amount for each write head. [batch, num_writes, 1]
update_read_weight(link, memory, prev_read_weights, read_mode, key, strength)[source]

Update the read attention with the DNC’s combination of content addressing and temporal link propagation to go forwards or backwards in time.

Parameters:
  • link – A tensor of shape [batch_size, num_writes, memory_size, memory_size] representing the previous link graphs for each write head.
  • memory – the memory of the previous step (class)
  • prev_read_weights – tensor of shape [batch_size, num_reads, memory_size] containing the previous read weights w_{t-1}^r.
  • read_mode – Mixing between “backwards” and “forwards” positions (for each write head) and content-based lookup, for each read head [batch, num_reads, 1+2*num_writes]
  • strength – The strengthening parameter for the content addressing [batch, num_reads, 1]
  • key – The comparison key for the content addressing [batch, num_reads, num_memory_bits]
update_read(update_data, prev_interface_tuple, mem)[source]

Updates the read attention switching between the NTM and DNC mechanisms.

Parameters:
  • update_data – the parameters from the controllers [dictionary]
  • prev_interface_tuple – Tuple [previous read, previous write, prev usage, prev links]
  • prev_memory_BxMxA – the memory of the previous step (class)
Returns:

The new interface tuple with an updated usage and write attention

update_write(update_data, prev_interface_tuple, mem)[source]

Updates the write attention switching between the NTM and DNC mechanisms.

Parameters:
  • update_data – the parameters from the controllers [dictionary]
  • prev_interface_tuple – Tuple [previous read, previous write, prev usage, prev links]
  • prev_memory_BxMxA – the memory of the previous step (class)
Returns:

The new interface tuple with an updated usage and write attention

update_and_edit(update_data, prev_interface_tuple, prev_memory_BxMxA)[source]

Erases from memory, writes to memory, updates the weights using various attention mechanisms.

Parameters:
  • update_data – the parameters from the controllers [update_size]
  • prev_interface_tuple – the read weight [BATCH_SIZE, MEMORY_SIZE]
  • prev_memory_BxMxA – the memory of the previous step (class)
Returns:

the new read vector, the updated memory, the new interface tuple

class miprometheus.models.dnc.Memory(mem_t)[source]
__init__(mem_t)[source]

Initializes the memory.

Parameters:mem_t – the memory at time t [batch_size, memory_content_size, memory_addresses_size]
attention_read(wt)[source]

Returns the data read from memory.

Parameters:wt – head’s weights [batch_size, num_heads, memory_addresses_size]
Returns:the read data [batch_size, num_heads, memory_content_size]

add_weighted(add, wt)[source]

Writes data to memory.

Parameters:
  • wt – head’s weights [batch_size, num_heads, memory_addresses_size]
  • add – the data to be added to memory [batch_size, num_heads, memory_content_size]

Returns:the updated memory [batch_size, memory_addresses_size, memory_content_size]

erase_weighted(erase, wt)[source]

Erases elements from memory.

Parameters:
  • wt – head’s weights [batch_size, num_heads, memory_addresses_size]
  • erase – data to be erased from memory [batch_size, num_heads, memory_content_size]

Returns:the updated memory [batch_size, memory_addresses_size, memory_content_size]

content_similarity(k)[source]

Calculates the dot product for content-aware addressing.

Parameters:k – the keys emitted by the controller [batch_size, num_heads, memory_content_size]
Returns:the dot product between the keys and query of shape (batch_size, num_heads, memory_addresses_size)
size

Returns the size of the memory.

Returns:Int size of the memory
content

Returns the entire memory.

Returns:the memory content
class miprometheus.models.dnc.MemoryUsage(name='MemoryUsage')[source]

Memory usage that is increased by writing and decreased by reading.

This module’s state is a tensor with values in the range [0, 1], indicating the usage of each of memory_size memory slots.

The usage is:

  • Increased by writing, where usage is increased towards 1 at the write addresses.
  • Decreased by reading, where usage is decreased after reading from a location when free_gate is close to 1.

The function write_allocation_weights can be invoked to get free locations to write to for a number of write heads.

__init__(name='MemoryUsage')[source]

Creates a MemoryUsage module.

Parameters:name – Name of the module.
init_state(memory_address_size, batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of InterfaceStateTuple class.
calculate_usage(write_weights, free_gate, read_weights, prev_usage)[source]

Calculates the new memory usage u_t.

Memory that was written to in the previous time step will have its usage increased; memory that was read from and the controller says can be “freed” will have its usage decreased.

Parameters:
  • write_weights – tensor of shape [batch_size, num_writes, memory_size] giving write weights at previous time step.
  • free_gate – tensor of shape [batch_size, num_reads] which indicates which read heads read memory that can now be freed.
  • read_weights – tensor of shape [batch_size, num_reads, memory_size] giving read weights at previous time step.
  • prev_usage – tensor of shape [batch_size, memory_size] giving usage u_{t - 1} at the previous time step, with entries in range [0, 1].
Returns:

tensor of shape [batch_size, memory_size] representing updated memory usage.
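
For reference, a sketch of the usage update from the DNC paper, which this method is assumed to follow (names are illustrative):

>>> # memory retention: psi = prod_i (1 - free_gate_i * read_weights_i)
>>> psi = torch.prod(1 - free_gate.unsqueeze(-1) * read_weights, dim=1)
>>> # combined effect of (possibly several) write heads
>>> w = 1 - torch.prod(1 - write_weights, dim=1)
>>> usage = (prev_usage + w - prev_usage * w) * psi  # [batch_size, memory_size]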

write_allocation_weights(usage, write_gates, num_writes)[source]

Calculates freeness-based locations for writing to.

This finds unused memory by ranking the memory locations by usage, for each write head. (For more than one write head, we use a “simulated new usage” which takes into account the fact that the previous write head will increase the usage in that area of the memory.)

Parameters:
  • usage – A tensor of shape [batch_size, memory_size] representing current memory usage.
  • write_gates – A tensor of shape [batch_size, num_writes] with values in the range [0, 1] indicating how much each write head does writing based on the address returned here (and hence how much usage increases).
  • num_writes – The number of write heads to calculate write weights for.
Returns:

tensor of shape [batch_size, num_writes, memory_size] containing the freeness-based write locations. Note that this isn’t scaled by write_gate; this scaling must be applied externally.
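
A sketch of the freeness-based allocation for a single write head, following the DNC paper and using the exclusive_cumprod_temp helper documented below (assumed to match this implementation):

>>> sorted_usage, idx = torch.sort(usage, dim=-1)  # ascending usage
>>> alloc_sorted = (1 - sorted_usage) * exclusive_cumprod_temp(sorted_usage)
>>> allocation = torch.zeros_like(usage).scatter_(-1, idx, alloc_sorted)  # undo the sort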

exclusive_cumprod_temp(sorted_usage, dim=1)[source]

Applies the exclusive cumulative product (at the moment it assumes the shape of the input).

Parameters:sorted_usage – tensor of shape [batch_size, memory_size] indicating current memory usage sorted in ascending order.
Returns:Tensor of shape [batch_size, memory_size] that is the exclusive product of the sorted usage, i.e. [1, u1, u1*u2, u1*u2*u3, …]
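
A minimal sketch of an exclusive cumulative product for the 2D case described above:

>>> def exclusive_cumprod(x, dim=1):
>>>     cp = torch.cumprod(x, dim=dim)
>>>     ones = torch.ones_like(cp.narrow(dim, 0, 1))
>>>     # shift right by one: [1, u1, u1*u2, ...]
>>>     return torch.cat([ones, cp.narrow(dim, 0, x.size(dim) - 1)], dim=dim)
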
state_size

Returns the shape of the state tensor.

class miprometheus.models.dnc.Param_Generator(param_in_dim, word_size=20, num_reads=1, num_writes=1, shift_size=3)[source]
__init__(param_in_dim, word_size=20, num_reads=1, num_writes=1, shift_size=3)[source]

Initialize all the parameters of the interface.

Parameters:
  • param_in_dim – input size. (typically the size of the hidden state)
  • word_size – size of the word in memory
  • num_reads – number of read heads
  • num_writes – number of write heads
  • shift_size – size of the shift vector (3 means it can go forward, backward and remain in place)
forward(vals)[source]

Calculates the controller parameters.

Parameters:vals – data from the controller (from time t). Typically, the hidden state. [BATCH_SIZE x INPUT_SIZE]
Returns:update_data – dictionary containing all of the controller parameters.
class miprometheus.models.dnc.TemporalLinkageState[source]

Tuple used by interface for storing current/past state information.

class miprometheus.models.dnc.TemporalLinkage(num_writes, name='temporal_linkage')[source]

Keeps track of write order for forward and backward addressing. This is a pseudo-RNNCore module whose state is a pair (link, precedence_weights): link is a (collection of) graphs for (possibly multiple) write heads, represented by a tensor with values in the range [0, 1], and precedence_weights records the “previous write locations” used to build the link graphs. The function directional_read_weights computes addresses following the forward and backward directions in the link graphs.

__init__(num_writes, name='temporal_linkage')[source]

Construct a TemporalLinkage module.

Parameters:
  • memory_size – The number of memory slots.
  • num_writes – The number of write heads.
  • name – Name of the module.
init_state(memory_address_size, batch_size)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple - object of InterfaceStateTuple class.

Calculate the updated linkage state given the write weights.

Parameters:
  • write_weights – A tensor of shape [batch_size, num_writes, memory_size] containing the memory addresses of the different write heads.
  • prev_state – TemporalLinkageState tuple containing a tensor link of shape [batch_size, num_writes, memory_size, memory_size], and a tensor precedence_weights of shape [batch_size, num_writes, memory_size] containing the aggregated history of recent writes.
Returns:A TemporalLinkageState tuple next_state, which contains the updated link and precedence weights.
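
For reference, a sketch of the linkage update equations from the DNC paper, which this method is assumed to implement (the paper additionally zeroes the diagonal of the new link matrix):

>>> w = write_weights                  # [batch_size, num_writes, N]
>>> p = prev_state.precedence_weights  # [batch_size, num_writes, N]
>>> L = prev_state.link                # [batch_size, num_writes, N, N]
>>> wi, wj = w.unsqueeze(-1), w.unsqueeze(-2)
>>> new_link = (1 - wi - wj) * L + wi * p.unsqueeze(-2)
>>> new_precedence = (1 - w.sum(dim=-1, keepdim=True)) * p + w
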
directional_read_weights(link, prev_read_weights, forward)[source]

Calculates the forward or the backward read weights.

For each read head (at a given address), there are num_writes link graphs to follow. Thus this function computes a read address for each of the num_reads * num_writes pairs of read and write heads.

Parameters:
  • link – tensor of shape [batch_size, num_writes, memory_size, memory_size] representing the link graphs L_t.
  • prev_read_weights – tensor of shape [batch_size, num_reads, memory_size] containing the previous read weights w_{t-1}^r.
  • forward – Boolean indicating whether to follow the “future” direction in the link graph (True) or the “past” direction (False).
Returns:tensor of shape [batch_size, num_reads, num_writes, memory_size]
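
A sketch of one way to compute these weights, following the published DNC implementation (assumed to match):

>>> # tile the read weights over the write heads: [batch_size, num_writes, num_reads, N]
>>> expanded = prev_read_weights.unsqueeze(1).expand(-1, link.size(1), -1, -1)
>>> out = torch.matmul(expanded, link.transpose(-1, -2) if forward else link)
>>> directional_weights = out.permute(0, 2, 1, 3)  # [batch_size, num_reads, num_writes, N]
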
miprometheus.models.dnc.normalize(x)[source]

Normalizes the input torch tensor along the last dimension using the max of the one norm. The normalization is “fuzzy” to prevent divergences.

Parameters:x – input of shape (batch_size, A, A1, …, An); if the input is the weight vector, x’s shape is (batch_size, num_heads, memory_size)
Returns:normalized x of shape (batch_size, A, A1 ..An)
miprometheus.models.dnc.sim(query, data, l2_normalize=False, aligned=True)[source]

Batch dot-product similarity computed using matrix multiplication; the hidden shapes must be broadcastable (numpy style).

Parameters:
  • query – the input data to be compared (batch_size, h, p) p = memory_size if aligned is True and p = content_size if aligned is False
  • data – Input state (batch_size, content_size, memory_size)
  • l2_normalize – boolean, determines whether to normalize the query and the data before the dot product
  • aligned – boolean, determines whether to transpose data along the last two dimensions
Returns:

out[…,i,j] = sum_k q[…,i,k] * data_gen[…,j,k] for the default options

miprometheus.models.dnc.outer_prod(x, y)[source]

Batch outer product of two vectors (along the last two dimensions); the hidden shapes must be broadcastable (numpy style).

Parameters:
  • x – (the dwm model) input one (batch_size, num_heads, memory_content_size)
  • y – (the dwm model) Input two (batch_size, num_heads, memory_addresses_size)
Returns:

Outer product (batch_size, num_heads, memory_content_size, memory_addresses_size)

miprometheus.models.dnc.circular_conv(x, f)[source]

Batch 1D circular convolution with matching hidden shapes.

Parameters:
  • x – input of shape (batch_size, num_head, num_addresses)
  • f – shift array (batch_size, num_heads, shift_size)
Returns:

Circular convolution (batch_size, num_head, num_addresses)

NTM

class miprometheus.models.ntm.NTMCellStateTuple[source]

Tuple used by NTM Cells for storing current/past state information.

class miprometheus.models.ntm.NTMCell(params)[source]

Class representing a single NTM cell.

__init__(params)[source]

Cell constructor. Cell creates controller and interface. It also initializes memory “block” that will be passed between states.

Parameters:params – Dictionary of parameters.
init_state(init_memory_BxAxC)[source]

Returns ‘zero’ (initial) state. “Recursively” calls controller and interface initialization.

Parameters:init_memory_BxAxC – Initial memory.
Returns:Initial state tuple - object of NTMCellStateTuple class.
forward(inputs_BxI, prev_cell_state)[source]

Forward function of NTM cell.

Parameters:
  • inputs_BxI – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE]
  • prev_cell_state – a NTMCellStateTuple tuple, containing previous state of the cell.
Returns:

an output Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and NTMCellStateTuple tuple containing current cell state.
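
A hypothetical usage sketch, unrolling the cell over a sequence (all names are assumptions):

>>> state = cell.init_state(init_memory_BxAxC)
>>> outputs = []
>>> for t in range(inputs.size(1)):  # iterate over the sequence dimension
>>>     output_BxO, state = cell(inputs[:, t, :], state)
>>>     outputs.append(output_BxO)
>>> logits = torch.stack(outputs, dim=1)  # [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE]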

class miprometheus.models.ntm.HeadStateTuple[source]

Tuple used by interface for storing current/past state information.

class miprometheus.models.ntm.InterfaceStateTuple[source]

Tuple used by interface for storing current/past state information.

class miprometheus.models.ntm.NTMInterface(params)[source]

Class realizing interface between controller and memory.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size, num_memory_addresses)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:
  • batch_size – Size of the batch in given iteration/epoch.
  • num_memory_addresses – Number of memory addresses.
Returns:

Initial state tuple - object of InterfaceStateTuple class.

forward(ctrl_hidden_state_BxH, prev_memory_BxAxC, prev_interface_state_tuple)[source]

Controller forward function.

Parameters:
  • ctrl_hidden_state_BxH – a Tensor with controller hidden state of size [BATCH_SIZE x HIDDEN_SIZE]
  • prev_memory_BxAxC – Previous state of the memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_interface_state_tuple – Tuple containing previous read and write attention vectors.
Returns:

List of read vectors [BATCH_SIZE x CONTENT_SIZE], updated memory and state tuple (object of LSTMStateTuple class).

calculate_param_locations(param_sizes_dict, head_name)[source]

Calculates locations of parameters that will subsequently be used during parameter splitting.

Parameters:
  • param_sizes_dict – Dictionary containing parameters along with their sizes (in bits/units).
  • head_name – Name of head.
Returns:

“Locations” of parameters.

split_params(params, locations)[source]

Splits parameters into a list on the basis of locations.

update_attention(query_vector_BxC, beta_Bx1, gate_Bx1, shift_BxS, gamma_Bx1, prev_memory_BxAxC, prev_attention_BxAx1)[source]

Updates the attention weights.

Parameters:
  • query_vector_BxC – Query used for similarity calculation in content-based addressing [BATCH_SIZE x CONTENT_BITS]
  • beta_Bx1 – Strength parameter used in content-based addressing.
  • gate_Bx1 – Interpolation gate between content-based and previous attention [BATCH_SIZE x 1]
  • shift_BxS – Soft shift mask (convolution kernel) [BATCH_SIZE x SHIFT_SIZE]
  • gamma_Bx1 – Sharpening factor [BATCH_SIZE x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_attention_BxAx1 – previous attention vector [BATCH_SIZE x MEMORY_ADDRESSES x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

content_based_addressing(query_vector_Bx1xC, beta_Bx1x1, prev_memory_BxAxC)[source]

Computes content-based addressing. Uses query vectors for calculation of similarity.

Parameters:
  • query_vector_Bx1xC – NTM “key” [BATCH_SIZE x 1 x CONTENT_BITS]
  • beta_Bx1x1 – key strength [BATCH_SIZE x 1 x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

attention of size [BATCH_SIZE x ADDRESS_SIZE x 1]
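
A minimal sketch of content-based addressing as described in the NTM paper (assumed to match this implementation):

>>> import torch.nn.functional as F
>>> # cosine similarity between the key and every memory row, scaled by beta, softmaxed over addresses
>>> similarity_BxA = F.cosine_similarity(prev_memory_BxAxC, query_vector_Bx1xC, dim=-1)
>>> attention_BxAx1 = torch.softmax(beta_Bx1x1.squeeze(-1) * similarity_BxA, dim=-1).unsqueeze(-1)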

location_based_addressing(attention_BxAx1, shift_BxSx1, gamma_Bx1x1)[source]

Computes location-based addressing, i.e. shifts the head and sharpens.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

circular_convolution(attention_BxAx1, shift_BxSx1)[source]

Performs circular convolution, i.e. shifts the attention according to given shift vector (convolution mask).

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

sharpening(attention_BxAx1, gamma_Bx1x1)[source]

Performs attention sharpening.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]
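
A sketch of the sharpening step from the NTM paper (assumed semantics; the epsilon is illustrative):

>>> powed = attention_BxAx1 ** gamma_Bx1x1
>>> attention_BxAx1 = powed / (powed.sum(dim=1, keepdim=True) + 1e-12)  # renormalize over addresses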

read_from_memory(attention_BxAx1, memory_BxAxC)[source]

Returns 2D tensor of size [BATCH_SIZE x CONTENT_BITS] storing vector read from memory given the attention.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • memory_BxAxC – tensor containing memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

vector read from the memory [BATCH_SIZE x CONTENT_BITS]
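
Equivalently (a sketch with the names above), the read is an attention-weighted sum over memory addresses:

>>> read_BxC = torch.sum(attention_BxAx1 * memory_BxAxC, dim=1)  # [BATCH_SIZE x CONTENT_BITS]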

update_memory(write_attention_BxAx1, erase_vector_Bx1xC, add_vector_Bx1xC, prev_memory_BxAxC)[source]

Returns 3D tensor of size [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS] storing new content of the memory.

Parameters:
  • write_attention_BxAx1 – Current write attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • erase_vector_Bx1xC – Erase vector [BATCH_SIZE x 1 x CONTENT_BITS]
  • add_vector_Bx1xC – Add vector [BATCH_SIZE x 1 x CONTENT_BITS]
  • prev_memory_BxAxC – tensor containing previous state of the memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

the updated memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
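
A sketch of the NTM erase-then-add update, which this method is assumed to follow:

>>> # M_t = M_{t-1} * (1 - w e^T) + w a^T
>>> erased = prev_memory_BxAxC * (1 - torch.bmm(write_attention_BxAx1, erase_vector_Bx1xC))
>>> memory_BxAxC = erased + torch.bmm(write_attention_BxAx1, add_vector_Bx1xC)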

class miprometheus.models.ntm.NTM(params, problem_default_values_={})[source]

Class representing the Neural Turing Machine module.

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].
generate_memory_attention_figure_layout()[source]

Creates a figure template for showing basic NTM attributes (read & write attentions), memory and sequence (inputs, predictions and targets).

Returns:Matplot figure object.
plot_memory_attention_sequence(data_dict, predictions, sample_number=0)[source]

Creates list of figures used in interactive visualization, with a slider enabling to move forth and back along the time axis (iteration in a given episode). The visualization presents input, output and target sequences passed as input parameters. Additionally, it utilizes state tuples collected during the experiment for displaying the memory state, read and write attentions.

Parameters:
  • data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE] - “targets”: a tensor of targets of size [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • predictions – Prediction sequence [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • sample_number – Number of sample in batch (DEFAULT: 0)
generate_memory_all_model_params_figure_layout()[source]

Creates a figure template for showing all NTM attributes (read & write attentions, gates, convolution masks), along with memory and sequence (inputs, predictions and targets).

Returns:Matplot figure object.
plot_memory_all_model_params_sequence(data_dict, predictions, sample_number=0)[source]

Creates list of figures used in interactive visualization, with a slider enabling to move forth and back along the time axis (iteration in a given episode). The visualization presents input, output and target sequences passed as input parameters. Additionally, it utilizes state tuples collected during the experiment for displaying the memory state, read and write attentions; and gating params.

Parameters:
  • data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE] - “targets”: a tensor of targets of size [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • predictions – Prediction sequence [BATCH_SIZE x SEQUENCE_LENGTH x OUTPUT_DATA_SIZE]
  • sample_number – Number of sample in batch (DEFAULT: 0)

Encoder-Solver models

class miprometheus.models.encoder_solver.EncoderSolverLSTM(params, problem_default_values_={})[source]

Class representing the Encoder-Solver architecture using LSTM cells as both encoder and solver modules.

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
init_state(batch_size)[source]

Returns ‘zero’ (initial) state.

Parameters:batch_size – Size of the batch in given iteration/epoch.
Returns:Initial state tuple (hidden, memory cell).
forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].
class miprometheus.models.encoder_solver.EncoderSolverNTM(params, problem_default_values_={})[source]

Class implementing the Encoder-Solver NTM model. The model has two NTM cells, that are used in two distinctive modes.

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].
class miprometheus.models.encoder_solver.MAECellStateTuple[source]

Tuple used by MAE Cells for storing current/past state information.

class miprometheus.models.encoder_solver.MAECell(params)[source]

Class representing a single Memory-Augmented Encoder cell.

__init__(params)[source]

Cell constructor. Cell creates controller and interface. It also initializes memory “block” that will be passed between states.

Parameters:params – Dictionary of parameters.
save(model_dir, stat_obj, is_best_model, save_intermediate)[source]

Method saves the model and encoder to file.

Parameters:
  • model_dir – Directory where the model will be saved.
  • stat_obj – Statistics object (collector or aggregator) that contain current loss and episode number (and other statistics).
  • is_best_model – Flag indicating whether it is the best model or not.
  • save_intermediate – Flag indicating whether intermediate models should be saved or not.

freeze()[source]

Freezes the trainable weights.

init_state(init_memory_BxAxC)[source]

Initializes state of MAE cell. Recursively initializes the controller and interface.

Parameters:init_memory_BxAxC – Initial memory state [BATCH_SIZE x MEMORY_ADDRESSES x MEMORY_CONTENT].
Returns:Initial state tuple - object of NTMCellStateTuple class.
forward(inputs_BxI, prev_cell_state)[source]

Forward function of MAE cell.

Parameters:
  • inputs_BxI – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE]
  • prev_cell_state – a MAECellStateTuple tuple, containing previous state of the cell.
Returns:

MAECellStateTuple tuple containing current cell state.

class miprometheus.models.encoder_solver.MAEInterfaceStateTuple[source]

Tuple used by interface for storing current/past MAE interface state information.

class miprometheus.models.encoder_solver.MAEInterface(params)[source]

Class realizing interface between controller and memory in Memory Augmented Encoder cell.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
freeze()[source]

Freezes the trainable weights.

init_state(batch_size, num_memory_addresses)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:
  • batch_size – Size of the batch in given iteration/epoch.
  • num_memory_addresses – Number of memory addresses.
Returns:

Initial state tuple - object of InterfaceStateTuple class.

forward(ctrl_hidden_state_BxH, prev_memory_BxAxC, prev_interface_state_tuple)[source]

Controller forward function.

Parameters:
  • ctrl_hidden_state_BxH – a Tensor with controller hidden state of size [BATCH_SIZE x HIDDEN_SIZE]
  • prev_memory_BxAxC – Previous state of the memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_interface_state_tuple – Tuple containing previous interface tuple.
Returns:

updated memory and state tuple (object of MAEInterfaceStateTuple class).

calculate_param_locations(param_sizes_dict, head_name)[source]

Calculates locations of parameters that will subsequently be used during parameter splitting.

Parameters:
  • param_sizes_dict – Dictionary containing parameters along with their sizes (in bits/units).
  • head_name – Name of head.
Returns:

“Locations” of parameters.

split_params(params, locations)[source]

Splits parameters into a list on the basis of locations.

update_attention(shift_BxS, gamma_Bx1, prev_memory_BxAxC, prev_attention_BxAx1)[source]

Updates the attention weights.

Parameters:
  • shift_BxS – Convolution shift
  • gamma_Bx1 – Sharpening factor
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_attention_BxAx1 – previous attention vector [BATCH_SIZE x MEMORY_ADDRESSES x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

location_based_addressing(attention_BxAx1, shift_BxSx1, gamma_Bx1x1, prev_memory_BxAxC)[source]

Computes location-based addressing, i.e. shifts the head and sharpens.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

circular_convolution(attention_BxAx1, shift_BxSx1, prev_memory_BxAxC)[source]

Performs circular convolution, i.e. shifts the attention according to given shift vector (convolution mask).

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

sharpening(attention_BxAx1, gamma_Bx1x1)[source]

Performs attention sharpening.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

update_memory(write_attention_BxAx1, erase_vector_Bx1xC, add_vector_Bx1xC, prev_memory_BxAxC)[source]

Returns 3D tensor of size [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS] storing new content of the memory.

Parameters:
  • write_attention_BxAx1 – Current write attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • erase_vector_Bx1xC – Erase vector [BATCH_SIZE x 1 x CONTENT_BITS]
  • add_vector_Bx1xC – Add vector [BATCH_SIZE x 1 x CONTENT_BITS]
  • prev_memory_BxAxC – tensor containing previous state of the memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

the updated memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]

class miprometheus.models.encoder_solver.MAES(params, problem_default_values_={})[source]

Class implementing the Memory Augmented Encoder-Solver (MAES) model.

Warning

Class assumes that the whole batch has the same length, i.e. the subsequences being input to the encoder all have the same length (end at the same item). The same holds for the subsequences being input to the decoder.
__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
save(model_dir, training_status, training_stats, validation_stats)[source]

Generic method saving the model parameters to file. It can be overloaded if one needs more control.

Parameters:
  • model_dir (str) – Directory where the model will be saved.
  • training_status (str) – String representing the current status of training.
  • training_stats (miprometheus.utils.StatisticsCollector or miprometheus.utils.StatisticsAggregator) – Training statistics that will be saved to checkpoint along with the model.
  • validation_stats (miprometheus.utils.StatisticsCollector or miprometheus.utils.StatisticsAggregator) – Validation statistics that will be saved to checkpoint along with the model.
Returns:

True if this is currently the best model (until the current episode, considering the loss).

forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].
class miprometheus.models.encoder_solver.MASCellStateTuple[source]

Tuple used by MAS Cells for storing current/past state information.

class miprometheus.models.encoder_solver.MASCell(params)[source]

Class representing a single Memory-Augmented Decoder cell.

__init__(params)[source]

Cell constructor. Cell creates controller and interface. Assumes that memory will be initialized by the encoder.

Parameters:params – Dictionary of parameters.
init_state(final_enc_memory_BxAxC, final_enc_attention_BxAx1)[source]

Initializes the solver cell state depending on the last memory state. Recursively initializes the controller and interface.

Parameters:
  • final_enc_memory_BxAxC – Final memory state of the MAE cell [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS].
  • final_enc_attention_BxAx1 – Final attention of the encoder [BATCH_SIZE x MEMORY_ADDRESSES x 1].
Returns:Initial state tuple - object of MASCellStateTuple class.
init_state_with_encoder_state(final_enc_cell_state)[source]

Creates ‘zero’ (initial) state on the basis of the previous cell state. “Recursively” calls controller and interface initialization.

Parameters:final_enc_cell_state – Last state of MAE cell.
Returns:Initial state tuple - object of MASCellStateTuple class.
forward(inputs_BxI, prev_cell_state)[source]

Forward function of MAS cell.

Parameters:
  • inputs_BxI – a Tensor of input data of size [BATCH_SIZE x INPUT_SIZE]
  • prev_cell_state – a MASCellStateTuple tuple, containing previous state of the cell.
Returns:

an output Tensor of size [BATCH_SIZE x OUTPUT_SIZE] and MASCellStateTuple tuple containing current cell state.

class miprometheus.models.encoder_solver.MASInterfaceStateTuple[source]

Tuple used by interface for storing current/past Memory Augmented Solver interface state information.

class miprometheus.models.encoder_solver.MASInterface(params)[source]

Class realizing interface between MAS controller and memory.

__init__(params)[source]

Constructor.

Parameters:params – Dictionary of parameters.
init_state(batch_size, num_memory_addresses, final_encoder_attention_BxAx1)[source]

Returns ‘zero’ (initial) state tuple.

Parameters:
  • batch_size – Size of the batch in given iteration/epoch.
  • num_memory_addresses – Number of memory addresses.
  • final_encoder_attention_BxAx1 – final attention of the encoder [BATCH_SIZE x MEMORY_ADDRESSES x 1]
Returns:

Initial state tuple - object of InterfaceStateTuple class.

forward(ctrl_hidden_state_BxH, prev_memory_BxAxC, prev_interface_state_tuple)[source]

Controller forward function.

Parameters:
  • ctrl_hidden_state_BxH – a Tensor with controller hidden state of size [BATCH_SIZE x HIDDEN_SIZE]
  • prev_memory_BxAxC – Previous state of the memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_interface_state_tuple – Tuple containing previous read and write attention vectors.
Returns:

List of read vectors [BATCH_SIZE x CONTENT_SIZE], updated memory and state tuple (object of LSTMStateTuple class).

calculate_param_locations(param_sizes_dict, head_name)[source]

Calculates locations of parameters that will subsequently be used during parameter splitting.

Parameters:
  • param_sizes_dict – Dictionary containing parameters along with their sizes (in bits/units).
  • head_name – Name of head.
Returns:

“Locations” of parameters.

split_params(params, locations)[source]

Splits parameters into a list on the basis of locations.

update_attention(gate_Bx3, shift_BxS, gamma_Bx1, prev_memory_BxAxC, prev_attention_BxAx1)[source]

Updates the attention weights.

Parameters:
  • gate_Bx3
  • shift_BxS – Soft shift mask (convolution kernel) [BATCH_SIZE x SHIFT_SIZE]
  • gamma_Bx1 – Sharpening factor [BATCH_SIZE x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
  • prev_attention_BxAx1 – previous attention vector [BATCH_SIZE x MEMORY_ADDRESSES x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

location_based_addressing(attention_BxAx1, shift_BxSx1, gamma_Bx1x1, prev_memory_BxAxC)[source]

Computes location-based addressing, i.e. shifts the head and sharpens.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

circular_convolution(attention_BxAx1, shift_BxSx1, prev_memory_BxAxC)[source]

Performs circular convolution, i.e. shifts the attention according to given shift vector (convolution mask).

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • shift_BxSx1 – soft shift mask (convolutional kernel) [BATCH_SIZE x SHIFT_SIZE x 1]
  • prev_memory_BxAxC – tensor containing memory before update [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

sharpening(attention_BxAx1, gamma_Bx1x1)[source]

Performs attention sharpening.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • gamma_Bx1x1 – sharpening factor [BATCH_SIZE x 1 x 1]
Returns:

attention vector of size [BATCH_SIZE x ADDRESS_SIZE x 1]

read_from_memory(attention_BxAx1, memory_BxAxC)[source]

Returns 2D tensor of size [BATCH_SIZE x CONTENT_BITS] storing vector read from memory given the attention.

Parameters:
  • attention_BxAx1 – Current attention [BATCH_SIZE x ADDRESS_SIZE x 1]
  • memory_BxAxC – tensor containing memory [BATCH_SIZE x MEMORY_ADDRESSES x CONTENT_BITS]
Returns:

vector read from the memory [BATCH_SIZE x CONTENT_BITS]

Other Models

LSTM

class miprometheus.models.lstm.LSTM(params, problem_default_values_={})[source]

Class implementing the Long Short-Term Memory model.

__init__(params, problem_default_values_={})[source]

Constructor. Initializes parameters on the basis of dictionary passed as argument.

Parameters:
  • params – Local view to the Parameter Registry ‘model’ section.
  • problem_default_values – Dictionary containing key-values received from problem.
forward(data_dict)[source]

Forward function. Requires that the data_dict contains at least “sequences”.

Parameters:data_dict – DataDict containing at least: - “sequences”: a tensor of input data of size [BATCH_SIZE x LENGTH_SIZE x INPUT_SIZE]
Returns:Predictions (logits) being a tensor of size [BATCH_SIZE x LENGTH_SIZE x OUTPUT_SIZE].

ThalNet

class miprometheus.models.thalnet.ThalNetCell(input_size: int, output_size: int, context_input_size: int, center_size_per_module: int, num_modules: int)[source]

Implementation of the ThalNetCell, iterating over one sequence element at a time.

It is constituted of several ThalNetModule.

__init__(input_size: int, output_size: int, context_input_size: int, center_size_per_module: int, num_modules: int)[source]

Constructor of the ThalNetCell class.

Parameters:
  • input_size (int) – size of the input sequences
  • output_size (int) – size of the produced output sequences
  • context_input_size (int) – context input size
  • center_size_per_module (int) – Size of the center slot allocated to each module.
  • num_modules (int) – number of modules to constitute the cell.
init_state(batch_size)[source]

Initialize the state of ThalNet.

Parameters:batch_size (int) – batch size
Returns:Initialized states of the ThalNet cell.
forward(inputs, prev_state)[source]

Forward run of the ThalNetCell.

Parameters:
  • inputs (torch.tensor) – inputs at time t, [batch_size, input_size]
  • prev_state (torch.tensor) – previous state [batch_size, state_size]
Returns:

  • states [batch_size, state_size]
  • prediction [batch_size, output_size]

class miprometheus.models.thalnet.ThalNetModel(params, problem_default_values_={})[source]

ThalNet is a deep learning model inspired by neocortical communication via the thalamus. This model consists of recurrent neural modules that send features through a routing center, endowing the modules with the flexibility to share features over multiple time steps.

See the reference paper here: https://arxiv.org/pdf/1706.05744.pdf.

__init__(params, problem_default_values_={})[source]

Constructor of the ThalNetModel. Instantiates the ThalNetCell.

Parameters:
  • params – dictionary of parameters (read from the .yaml configuration file.)
  • problem_default_values (dict) – default values coming from the Problem class.
forward(data_dict)[source]

Forward run of the ThalNetModel model.

Parameters:data_dict (utils.DataDict) – DataDict({‘sequences’, …}) where ‘sequences’ is of shape [batch_size, sequence_length, input_size]
Returns:Predictions [batch_size, sequence_length, output_size]
generate_figure_layout()[source]

Generate a figure layout which will be used in self.plot().

Returns:figure layout.
plot(data_dict, logits, sample=0)[source]

Plots specific information on the model’s behavior.

Parameters:
  • data_dict (utils.DataDict) – DataDict({‘sequences’, …})
  • logits (torch.tensor) – Predictions of the model
  • sample (int) – Index of the sample to visualize. Default to 0.
Returns:

True if the user pressed stop, else False.

class miprometheus.models.thalnet.ThalnetModule(center_size, context_size, center_size_per_module, input_size, output_size)[source]

Implements a ThalNet module.

__init__(center_size, context_size, center_size_per_module, input_size, output_size)[source]

Constructor of the ThalnetModule.

Parameters:
  • input_size (int) – size of the input sequences
  • output_size (int) – size of the produced output sequences
  • center_size (int) – Size of the center of the model.
  • context_size (int) – Size of the context input.
  • center_size_per_module (int) – Size of the center slot allocated to each module.
init_state(batch_size)[source]

Initialize the state of a ThalNet module.

Parameters:batch_size (int) – batch size
Returns:center_state_per_module, tuple_controller_states
forward(inputs, prev_center_state, prev_tuple_controller_state)[source]

Forward pass of a ThalnetModule.

Parameters:
  • inputs (torch.tensor) – input sequences.
  • prev_center_state (torch.tensor) – previous center state
  • prev_tuple_controller_state (tuple) – previous tuple controller state
Returns:

output, center_feature_output, tuple_ctrl_state