miprometheus.utils¶

AppState¶

class miprometheus.utils.AppState[source]¶

Represents the application state. Knows if computations should be moved to GPU, if visualization should be activated etc.

__init__()[source]¶

Constructor:

Disable visualization by default,

Use non-cuda types by default.

set_dtype(flag)[source]¶

Sets a global floating point type to be used in the models.

Parameters:	flag (str) – Flag indicating a floating point type.

set_itype(flag)[source]¶

Sets a global integer type to be used in the models.

Parameters:	flag (str) – Flag indicating an integer type.

convert_non_cuda_types()[source]¶: Sets all tensor types to non-cuda data types.

convert_cuda_types()[source]¶: Sets all tensor types to cuda data types.

DataDict¶

class miprometheus.utils.DataDict(*args, **kwargs)[source]¶

Mapping: A container object that supports arbitrary key lookups and implements the methods __getitem__, __iter__ and __len__.
Mutable objects can change their value while keeping their id(), which makes it easy to modify the values of existing keys.

DataDict: Dict used for storing batches of data by problems.

This is the main object class used to share data between a problem class and a model class through a worker.

__init__(*args, **kwargs)[source]¶

DataDict constructor. Can be initialized in different ways:

>>> data_dict = DataDict()
>>> data_dict = DataDict({'inputs': torch.zeros(2, 3), 'targets': numpy.zeros(2)})
>>> # etc.

Parameters:

args – Used to pass a non-keyworded, variable-length argument list.
kwargs – Used to pass a keyworded, variable-length argument list.

__setitem__(key, value, addkey=False)[source]¶

key:value setter function.

Parameters:

key – Dict Key.
value – Associated value.
addkey (bool) – Indicate whether or not it is authorized to add a new key on-the-fly. Default: `False`.

Warning

addkey is set to False by default, so assigning to a key that does not already exist is not authorized. Set it to True in the cases where adding a key on-the-fly to a DataDict is useful (e.g. for plotting pre-processing).

__getitem__(key)[source]¶

Value getter function.

Parameters:	key – Dict Key.
Returns:	Associated Value.

__delitem__(key, override=False)[source]¶

Delete a key:value pair.

Warning

By default, it is not authorized to delete an existing key. Set override to True to ignore this restriction.

Parameters:

key – Dict Key.
override (bool) – Indicate whether or not to lift the ban of non-deletion of any key.
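
The two guards described above (no new key without addkey, no deletion without override) can be illustrated with a small stand-in. This is a minimal sketch, not the DataDict implementation: the class name `GuardedDict` is hypothetical, and raising `KeyError` on a refused operation is an assumption.

```python
class GuardedDict(dict):
    """Hypothetical stand-in for DataDict: existing keys can be updated,
    but adding or deleting a key needs an explicit opt-in."""

    def __setitem__(self, key, value, addkey=False):
        # Refuse to create a new key unless the caller opts in.
        # (Raising KeyError here is an assumption of this sketch.)
        if not addkey and key not in self:
            raise KeyError(f"Cannot add new key {key!r} without addkey=True")
        super().__setitem__(key, value)

    def __delitem__(self, key, override=False):
        # Refuse to delete a key unless the caller opts in.
        if not override:
            raise KeyError(f"Cannot delete key {key!r} without override=True")
        super().__delitem__(key)


# The constructor does not go through __setitem__, so initial keys are accepted.
batch = GuardedDict({"inputs": [1, 2, 3], "targets": [0, 1, 0]})
batch["inputs"] = [4, 5, 6]                                # existing key: allowed
batch.__setitem__("predictions", [1, 1, 0], addkey=True)   # new key: explicit opt-in
batch.__delitem__("predictions", override=True)            # deletion: explicit opt-in
```

Since the `obj[key] = value` and `del obj[key]` syntaxes cannot carry an extra keyword argument, opting in requires calling the special method explicitly, as in the last two lines.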

__str__()[source]¶

Returns:	A simple Dict representation of `DataDict`.

__repr__()[source]¶

Returns:	Echoes class, id, & reproducible representation in the Read–Eval–Print Loop.

numpy()[source]¶

Converts the DataDict to numpy objects.

Note

The torch.Tensor objects contained in self are converted using torch.Tensor.numpy(): each tensor and the returned ndarray share the same underlying storage, so changes to the tensor will be reflected in the ndarray and vice versa.

If an element of self is not a torch.Tensor, it is returned as is.

Returns:	Converted DataDict.

cpu()[source]¶

Moves the DataDict to memory accessible to the CPU.

Note

The torch.Tensor objects contained in self are moved using torch.Tensor.cpu(). If an element of self is not a torch.Tensor, it is returned as is, i.e. only the tensors contained in self are moved.

Returns:	Converted DataDict.

cuda(device=None, non_blocking=False)[source]¶

Returns a copy of this object in CUDA memory.

Note

Wraps call to torch.Tensor.cuda(): if this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. If an element of self is not a torch.Tensor, it is returned as is, i.e. only the tensors contained in self are moved.

Parameters:

device (torch.device) – The destination GPU device. Defaults to the current CUDA device.
non_blocking (bool) – If True and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: `False`.

detach()[source]¶: Returns a new DataDict, detached from the current graph. The result will never require gradient.

Note

Wraps call to torch.Tensor.detach(): the tensors in the returned DataDict share the same underlying data as the original ones. In-place modifications on either of them will be seen by the other, and may trigger errors in correctness checks.

ParamInterface¶

class miprometheus.utils.ParamInterface(*keys)[source]¶

Interface to the ParamRegistry singleton.

Inherits collections.Mapping, and therefore exposes functionality close to a dict.

Offers a read (through collections.Mapping interface) and write (through add_default_params() and add_config_params() methods) view of the ParamRegistry.

Warning

This class is the only interface to ParamRegistry, and thus the only way to interact with it.

__init__(*keys)[source]¶

Constructor:

Calls the base constructor (Mapping),

Initializes the ParamRegistry,

Initializes empty keys_path list

Parameters:	keys (sequence / collection: dict, list etc.) – Sequence of keys to the subtree of the registry. The subtree hierarchy will be created if it does not exist. If empty, shows the whole registry.

Note

Calling to_dict() after initializing a ParamInterface with keys will throw a KeyError.

Adding default & config params should be done through add_default_params() and add_config_params().

keys is mainly intended for the recursion of ParamInterface.

to_dict()[source]¶

Returns:	dict containing a snapshot of the current `ParamInterface` tree.

__getitem__(key)[source]¶

Get parameter value under key.

The parameter dict is derived from the default parameters updated with the config parameters.

Parameters:	key (str) – key to value in the `ParamInterface` tree.
Returns:	`ParamInterface` `[key]` or value if leaf of the `ParamRegistry` tree.

__len__()[source]¶

Returns:	Length of the `ParamInterface`.

__iter__()[source]¶

Returns:	Iterator over the `ParamInterface`.

leafs()[source]¶: Yields the leafs of the current ParamInterface.

set_leaf(leaf_key, leaf_value)[source]¶

Update the value of the specified leaf_key of the current ParamInterface with the specified leaf_value.

Parameters:

leaf_key (str) – leaf key to update.
leaf_value – New value to set.
Returns:	`True` if the leaf value has been changed, `False` if `leaf_key` is not in `ParamInterface.leafs()`.
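
The behavior of leafs() and set_leaf() can be pictured on a plain nested dict. The sketch below is an illustration only, not the ParamInterface code: it assumes that leaves are identified by their key alone and that the first matching leaf is the one updated.

```python
def leafs(tree):
    """Yield the keys whose values are not themselves dicts (depth-first)."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from leafs(value)
        else:
            yield key


def set_leaf(tree, leaf_key, leaf_value):
    """Set the first leaf named leaf_key; return True if one was found."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Descend into the subtree and stop as soon as a leaf was updated.
            if set_leaf(value, leaf_key, leaf_value):
                return True
        elif key == leaf_key:
            tree[key] = leaf_value
            return True
    return False


# Made-up parameter tree, for illustration only.
params = {"model": {"hidden_size": 64}, "training": {"batch_size": 32}}
found = set_leaf(params, "batch_size", 128)    # existing leaf
missing = set_leaf(params, "dropout", 0.5)     # not a leaf of the tree
```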

add_default_params(default_params: dict)[source]¶

Appends default_params to the default parameter dict of the ParamRegistry.

Note

This method should be used by the objects necessitating default values (problems, models, workers etc.).

Parameters:	default_params (dict) – Dictionary containing default values.

The dictionary will be inserted into the subtree keys path indicated at the initialization of the current ParamInterface.

add_config_params(config_params: dict)[source]¶

Appends config_params to the config parameter dict of the ParamRegistry.

Note

This is intended for users to dynamically (re)configure their experiments.

Parameters:	config_params (dict) – Dictionary containing config values.

The dictionary will be inserted into the subtree keys path indicated at the initialization of the current ParamInterface.

del_default_params(key)[source]¶

Removes the entry from the default params living under key.

The entry can either be a subtree or a leaf of the default params tree.

Parameters:	key (str) – key to subtree / leaf in the default params tree.

del_config_params(key)[source]¶

Removes the entry from the config params living under key.

The entry can either be a subtree or a leaf of the config params tree.

Parameters:	key (str) – key to subtree / leaf in the config params tree.

add_config_params_from_yaml(yaml_path: str)[source]¶

Helper function adding config params by loading the file at yaml_path.

Wraps call to add_config_params().

Parameters:	yaml_path (str) – Path to a `.yaml` file containing config parameters.

ParamRegistry¶

class miprometheus.utils.MetaSingletonABC[source]¶: Metaclass that inherits from both SingletonMetaClass and ABCMeta (the metaclass of collections.Mapping).

class miprometheus.utils.SingletonMetaClass[source]¶
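
A combined metaclass is needed because a class deriving from Mapping already has ABCMeta as its metaclass, so a singleton metaclass must be merged with it to avoid a metaclass conflict. The following is a generic sketch of that pattern under stated assumptions. The names `SingletonMeta`, `SingletonABCMeta` and `Registry` are hypothetical and this is not the miprometheus source.

```python
from abc import ABCMeta
from collections.abc import Mapping


class SingletonMeta(type):
    """Hypothetical singleton metaclass: at most one instance per class."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Build the instance on first use, then always return the same one.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class SingletonABCMeta(SingletonMeta, ABCMeta):
    """Merges both metaclasses so that a Mapping subclass can be a singleton."""


class Registry(Mapping, metaclass=SingletonABCMeta):
    """Toy read-only mapping used to exercise the metaclass."""

    def __init__(self):
        self._params = {}

    def __getitem__(self, key):
        return self._params[key]

    def __iter__(self):
        return iter(self._params)

    def __len__(self):
        return len(self._params)


first = Registry()
second = Registry()   # returns the instance created above
```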

class miprometheus.utils.ParamRegistry[source]¶

Registry singleton for the parameters.

Registers default values (coming from workers, models, problems, etc) as well as config values loaded by the user for a particular experiment.

Parameters can be read from the registry by indexing. The returned parameters are the default ones superseded by all the config ones.

The merging of default and config parameters is computed every time the registry is changed.

Can contain nested parameters sections (acts as a dict).

Warning

This class should not be used except through ParamInterface.

__init__()[source]¶

Constructor:

Calls the base constructor (Mapping),

Initializes empty parameters dicts for:

Default parameters,

Config parameters,

Resulting tree.

add_default_params(default_params: dict)[source]¶

Appends default_params to the default parameter dict of the current ParamRegistry, and updates the resulting parameters dict.

Note

This method should be used by the objects necessitating default values (problems, models, workers etc.).

Parameters:	default_params (dict) – Dictionary containing default values.

add_config_params(config_params: dict)[source]¶

Appends config_params to the config parameter dict of the current ParamRegistry, and updates the resulting parameters dict.

Note

This is intended for users to dynamically (re)configure their experiments.

Parameters:	config_params (dict) – Dictionary containing config values.

del_default_params(keypath: list)[source]¶

Removes an entry from the default parameter dict of the current ParamRegistry, and updates the resulting parameters dict.

The entry can either be a subtree or a leaf of the default parameter dict.

Parameters:	keypath (list) – list of keys to subtree / leaf in the default parameter dict.

del_config_params(keypath: list)[source]¶

Removes an entry from the config parameter dict of the current ParamRegistry, and updates the resulting parameters dict.

The entry can either be a subtree or a leaf of the config parameter dict.

Parameters:	keypath (list) – list of keys to subtree / leaf in the config parameter dict.

__getitem__(key)[source]¶

Get parameter value under key.

The parameter dict is derived from the default parameters updated with the config parameters.

Parameters:	key (str) – key to value in the `ParamRegistry`.
Returns:	Parameter value

__iter__()[source]¶

Returns:	Iterator over the `ParamRegistry`.

__len__()[source]¶

Returns:	Length of the `ParamRegistry`.

update_dict_recursively(current_node, update_node)[source]¶

Recursively update the current_node of the ParamRegistry with the values of the update_node.

Starts from the root of the current_node.

Parameters:

current_node (`ParamRegistry` (inheriting from `Mapping`)) – Current (default or config) node.
update_node (`ParamRegistry` (inheriting from `Mapping`)) – Values to be added/updated to the `current_node`.
Returns:	Updated current node.
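
The idea of a recursive update, and of config values superseding default ones, can be sketched on plain dicts. This is an illustrative approximation rather than the registry's code: the function name `update_recursively` and the sample parameters are made up, and it assumes that nested dicts are merged while any other value is overwritten.

```python
import copy


def update_recursively(current, update):
    """Merge `update` into `current` in place, descending into nested dicts."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(current.get(key), dict):
            # Both sides are subtrees: merge them instead of replacing.
            update_recursively(current[key], value)
        else:
            # Leaf, or a new entry: the update value wins.
            current[key] = value
    return current


# Made-up default and config trees, for illustration only.
defaults = {"model": {"name": "lstm", "hidden_size": 64}, "seed": 0}
config = {"model": {"hidden_size": 256}}

# Start from a copy of the defaults, then let the config values supersede them.
merged = update_recursively(copy.deepcopy(defaults), config)
```

Working on a deep copy keeps the default tree untouched, so the merge can be recomputed whenever either tree changes.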

static delete_subtree(current_dict, keypath: list)[source]¶

Delete the subtree indexed by the keypath from the current_dict.

Parameters:

current_dict (dict) – dictionary to act on.
keypath (list) – list of keys to subtree in `current_dict` to delete.
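
Deleting by key path amounts to walking down to the parent of the last key and removing that entry. A minimal sketch, assuming a plain nested dict and a non-empty key path (the helper name `remove_keypath` and the sample tree are hypothetical):

```python
def remove_keypath(current_dict, keypath):
    """Delete the entry reached by following keypath (subtree or leaf)."""
    if not keypath:
        raise KeyError("keypath must contain at least one key")
    node = current_dict
    # Walk down to the parent of the entry to delete.
    for key in keypath[:-1]:
        node = node[key]
    del node[keypath[-1]]


# Made-up parameter tree, for illustration only.
tree = {"training": {"optimizer": {"name": "adam", "lr": 0.001}, "epochs": 10}}
remove_keypath(tree, ["training", "optimizer"])
```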

SamplerFactory¶

class miprometheus.utils.SamplerFactory[source]¶

Class returning a sampler depending on the name provided in the list of parameters.

static build(problem, params)[source]¶

Static method returning a particular sampler, depending on the name provided in the list of parameters & the specified problem class.

Parameters:

problem (`problems.Problem`) – Instance of an object derived from the Problem class.
params (`utils.param_interface.ParamInterface`) – Parameters used to instantiate the sampler.

Note

`params` should contain the exact (case-sensitive) class name of the sampler to instantiate.

Warning

torch.utils.data.sampler.WeightedRandomSampler, torch.utils.data.sampler.BatchSampler, torch.utils.data.sampler.DistributedSampler are not yet supported.

Note

torch.utils.data.sampler.SubsetRandomSampler expects indices to index a subset of the dataset. Currently, the user can specify these indices using one of the following options:

Option 1: range.

>>> indices = range(20)

Option 2: range as str.

>>> range_str = '0, 20'

Option 3: list of indices.

>>> yaml_list = yaml.safe_load('[0, 2, 5, 10]')

Option 4: name of the file containing indices.

>>> filename = "~/data/mnist/training_indices.txt"

Returns:	Instance of a given sampler, or `None` if the section is not present or the sampler could not be built.

Split Indices¶

split_indices.py:

Contains the definition of a split_indices function.

miprometheus.utils.split_indices.split_indices(length, split, logger, random_sampling=True)[source]¶

Splits the indices of an array of a given length into two parts, using the split as the divider.

Random sampling is used by default, but can be turned off.

Parameters:

length (int) – Length (size) of the dataset.
split (int) – Index used as the divider; determines how many indices will belong to subset a and how many to subset b.
logger (logging.Logger) – Logging utility.
random_sampling (bool) – Use random sampling (DEFAULT: `True`). If set to `False`, will return two ranges instead of lists with indices.
Returns:	Two lists with indices (when random_sampling is `True`), or two lists with two elements - ranges (when `False`).
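
The two return shapes can be illustrated with a stdlib-only sketch. It is an approximation, not the function's source: it assumes the first subset receives the first `split` indices, omits the logger, and adds a hypothetical `seed` argument for reproducibility.

```python
import random


def split_indices_sketch(length, split, random_sampling=True, seed=None):
    """Split range(length) into two parts at position `split`."""
    if random_sampling:
        # Shuffle all indices, then cut the shuffled list at `split`.
        indices = list(range(length))
        random.Random(seed).shuffle(indices)
        return indices[:split], indices[split:]
    # No shuffling: describe each subset by its [start, stop) bounds.
    return [0, split], [split, length]


subset_a, subset_b = split_indices_sketch(10, 7, seed=0)
range_a, range_b = split_indices_sketch(10, 7, random_sampling=False)
```

With random sampling the two lists together contain every index exactly once. Without it, each returned pair only holds the bounds of a contiguous range.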

StatisticsCollector¶

class miprometheus.utils.StatisticsCollector[source]¶

Specialized class used for the collection and export of statistics during training, validation and testing.

Inherits collections.Mapping, therefore it offers functionality close to a dict.

__init__()[source]¶: Initialization - creates dictionaries for statistics and formatting.

add_statistic(key, formatting)[source]¶

Add a statistic to the collector. The value associated with the key is of type list.

Parameters:

key (str) – Key of the statistic.
formatting – Formatting that will be used when logging and exporting to CSV.

__getitem__(key)[source]¶

Get statistics value for given key.

Parameters:	key (str) – Key to value in parameters.
Returns:	Statistics value list associated with given key.

__setitem__(key, value)[source]¶

Add value to the list of the statistic associated with a given key.

Parameters:

key – Key to value in parameters.
value – Statistics value to append to the list associated with given key.

__delitem__(key)[source]¶

Delete the specified key.

Parameters:	key – Key to be deleted.

__len__()[source]¶: Returns “length” of self.statistics (i.e. number of tracked values).

__iter__()[source]¶: Iterator.

empty()[source]¶: Empties the lists associated with the keys of the current statistics collector.
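
The append-on-assignment behavior described for __setitem__ is unusual for a mapping, so a small stand-in may help. This sketch is not the StatisticsCollector source: the class name `CollectorSketch` is hypothetical, and the format string passed as `formatting` is only an assumed example.

```python
from collections.abc import Mapping


class CollectorSketch(Mapping):
    """Hypothetical collector: each key maps to a growing list of values."""

    def __init__(self):
        self.statistics = {}
        self.formatting = {}

    def add_statistic(self, key, formatting):
        # Register the statistic with an empty list of values.
        self.formatting[key] = formatting
        self.statistics[key] = []

    def __getitem__(self, key):
        return self.statistics[key]

    def __setitem__(self, key, value):
        # Assignment appends to the list instead of overwriting it.
        self.statistics[key].append(value)

    def __iter__(self):
        return iter(self.statistics)

    def __len__(self):
        return len(self.statistics)

    def empty(self):
        # Clear every list but keep the registered keys.
        for key in self.statistics:
            del self.statistics[key][:]


stats = CollectorSketch()
stats.add_statistic("loss", "{:.4f}")   # assumed formatting string
stats["loss"] = 0.9
stats["loss"] = 0.7
collected = list(stats["loss"])         # snapshot before clearing
stats.empty()
```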

initialize_csv_file(log_dir, filename)[source]¶

Method creates a new csv file and initializes it with a header based on the statistics names.

Parameters:

log_dir (str) – Path to file.
filename (str) – Filename to be created.
Returns:	File stream opened for writing.

export_to_csv(csv_file=None)[source]¶

Method writes the current statistics to csv using the associated formatting.

Parameters:	csv_file – File stream opened for writing, optional

export_to_checkpoint()[source]¶: This method exports the collected data into a dictionary using the associated formatting.

export_to_string(additional_tag='')[source]¶

Method returns the current statistics in the form of a string using the associated formatting.

Parameters:	additional_tag (str) – An additional tag to append at the end of the created string.
Returns:	String being the concatenation of the statistics names & values.

initialize_tensorboard(tb_writer)[source]¶: Memorizes the writer that will be used with this collector.

export_to_tensorboard(tb_writer=None)[source]¶

Method exports current statistics to tensorboard.

Parameters:	tb_writer (`tensorboardX.SummaryWriter`) – TensorBoard writer, optional.

StatisticsAggregator¶

class miprometheus.utils.StatisticsAggregator[source]¶

Specialized class used for the computation of several statistical aggregators.

Inherits from miprometheus.utils.StatisticsCollector as it extends its capabilities: it relies on miprometheus.utils.StatisticsCollector to collect the statistics over an epoch (training set) or a validation (over the validation set).

Once the statistics have been collected, the miprometheus.utils.StatisticsAggregator makes it possible to compute several statistical aggregators to summarize the last epoch or validation phase.

E.g. with the list of loss values from the last epoch, we can compute the average loss, the min & max, and the standard deviation.
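
As a concrete illustration of this example, the aggregators for one epoch can be computed with the standard library. The loss values and the aggregator names below are made up, and the use of the sample standard deviation is an assumption of this sketch.

```python
import statistics

losses = [0.92, 0.71, 0.55, 0.48]  # made-up loss values for one epoch

aggregators = {
    "loss_mean": statistics.mean(losses),
    "loss_min": min(losses),
    "loss_max": max(losses),
    "loss_std": statistics.stdev(losses),  # sample standard deviation
}
```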

__init__()[source]¶

Constructor for the miprometheus.utils.StatisticsAggregator. Defines empty aggregators dict.

Other statistical aggregators can be added via StatisticsAggregator.add_aggregator().

add_aggregator(key, formatting)[source]¶

Add a statistical aggregator. The value associated with the specified key is initialized to -1.

Parameters:

key (str) – Statistical aggregator to add. Such an aggregator (e.g. min, max, mean, std…) should be based on an existing statistic collected by the miprometheus.utils.StatisticsCollector (e.g. added by StatisticsCollector.add_statistic() and collected by miprometheus.models.Model.collect_statistics() or miprometheus.models.Problem.collect_statistics()).
formatting (str) – Formatting that will be used when logging and exporting to CSV.

__getitem__(key)[source]¶

Get the values list of the specified statistical aggregator.

Parameters:	key (str) – Name of the statistical aggregator to get the values list of.
Returns:	Values list associated with the specified statistical aggregator.

__setitem__(key, value)[source]¶

Set the value of the specified statistical aggregator, thus overwriting the existing one.

Parameters:

key (str) – Name of the statistical aggregator to set the value of.
value (int, float) – Value to set for the given key.

__delitem__(key)[source]¶

Delete the specified statistical aggregator.

Parameters:	key (str) – Key to be deleted.

__len__()[source]¶: Returns the number of tracked statistical aggregators.

__iter__()[source]¶: Return an iterator on the currently tracked statistical aggregators.

initialize_csv_file(log_dir, filename)[source]¶

This method creates a new csv file and initializes it with a header based on the names of the statistical aggregators.

Parameters:

log_dir (str) – Path to file.
filename (str) – Filename to be created.
Returns:	File stream opened for writing.

export_to_csv(csv_file=None)[source]¶

This method writes the current statistical aggregators values to the csv_file using the associated formatting.

Parameters:	csv_file – File stream opened for writing, optional.

export_to_checkpoint()[source]¶: This method exports the aggregated data into a dictionary using the associated formatting.

export_to_string(additional_tag='')[source]¶

This method returns the current statistical aggregators values in the form of a string using the associated formatting.

Parameters:	additional_tag (str) – An additional tag to append at the end of the created string.
Returns:	String being the concatenation of the statistical aggregators names & values.

export_to_tensorboard(tb_writer=None)[source]¶

Method exports current statistical aggregators values to TensorBoard.

Parameters:	tb_writer (`tensorboardX.SummaryWriter`) – TensorBoard writer, optional

TimePlot¶

miprometheus.utils.TimePlot¶

Losses¶

Masked BCEWithLogitsLoss¶

class miprometheus.utils.MaskedBCEWithLogitsLoss(weight=None)[source]¶

Calculates the binary cross entropy for batches with different numbers of outputs for the samples.

__init__(weight=None)[source]¶

Constructor for the MaskedBCEWithLogitsLoss.

Defines the inner loss as BCEWithLogitsLoss.

Parameters:	weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C

forward(logits, targets, mask)[source]¶

Calculates the loss, accounting for different numbers of outputs per sample.

Parameters:

logits (torch.tensor) – Logits being output by the model. [batch, classes, sequence].
targets (torch.LongTensor) – Targets [batch, sequence].
mask (torch.ByteTensor) – Mask [batch, sequence].
Returns:	loss value.

masked_accuracy(logits, targets, mask)[source]¶

Calculates accuracy equal to mean number of correct predictions in a given batch.

Warning

Applies mask to both logits and targets.

Parameters:

logits (torch.tensor) – Logits being output by the model. [batch, classes, sequence].
targets (torch.LongTensor) – Targets [batch, sequence].
mask (torch.ByteTensor) – Mask [batch, sequence].
Returns:	accuracy value.

Masked CrossEntropyLoss¶

class miprometheus.utils.MaskedCrossEntropyLoss(weight=None, ignore_index=-100)[source]¶

Calculates the cross entropy for batches with different numbers of outputs per sample.

__init__(weight=None, ignore_index=-100)[source]¶

Constructor for the MaskedCrossEntropyLoss.

Defines the inner loss as CrossEntropyLoss.

Parameters:

weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C.
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient.

forward(logits, targets, mask)[source]¶

Calculates the loss, accounting for different numbers of outputs per sample.

Parameters:

logits (torch.tensor) – Logits being output by the model. [batch, classes, sequence].
targets (torch.LongTensor) – Targets [batch, sequence].
mask (torch.ByteTensor) – Mask [batch, sequence].
Returns:	loss value.

masked_accuracy(logits, targets, mask)[source]¶

Calculates accuracy equal to mean number of correct predictions in a given batch.

Parameters:

logits (torch.tensor) – Logits being output by the model. [batch, classes, sequence].
targets (torch.LongTensor) – Targets [batch, sequence].
mask (torch.ByteTensor) – Mask [batch, sequence].
Returns:	accuracy value.
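
The effect of the mask on the accuracy can be shown with a framework-free sketch that uses nested lists in place of tensors, with the same [batch, classes, sequence] and [batch, sequence] layouts. It is an assumption-laden illustration, not the class's implementation: the prediction is taken as the arg-max over classes, and returning 0.0 when nothing is unmasked is a choice made here.

```python
def masked_accuracy_sketch(logits, targets, mask):
    """logits: [batch][classes][sequence]; targets, mask: [batch][sequence]."""
    correct = 0
    counted = 0
    for sample_logits, sample_targets, sample_mask in zip(logits, targets, mask):
        num_classes = len(sample_logits)
        for t, (target, keep) in enumerate(zip(sample_targets, sample_mask)):
            if not keep:
                continue  # masked-out position: ignored entirely
            # Predicted class = arg-max over the class dimension at position t.
            predicted = max(range(num_classes), key=lambda c: sample_logits[c][t])
            correct += int(predicted == target)
            counted += 1
    return correct / counted if counted else 0.0


# Two samples, two classes, three sequence positions (made-up numbers).
logits = [
    [[2.0, 0.1, 0.3], [0.5, 1.5, 0.9]],
    [[0.2, 0.8, 0.0], [1.0, 0.1, 0.0]],
]
targets = [[0, 1, 0], [1, 1, 0]]
mask = [[1, 1, 1], [1, 0, 0]]   # second sample has a single valid output

accuracy = masked_accuracy_sketch(logits, targets, mask)  # 3 correct out of 4 -> 0.75
```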

Problems Utils¶

GenerateFeatureMaps¶

class miprometheus.utils.GenerateFeatureMaps(image_dir, cnn_model, num_blocks, filename_template, set='train', transform=transforms.ToTensor)[source]¶

Class handling the generation of feature maps using a pretrained CNN for specified images.

__init__(image_dir, cnn_model, num_blocks, filename_template, set='train', transform=transforms.ToTensor)[source]¶

Creates the pretrained CNN model & moves it to CUDA if available.

Parameters:

image_dir (str) – Directory path to the images to extract features from.
cnn_model (str) – Name of the pretrained CNN model to use. Must be in torchvision.models.
num_blocks – number of layers to use from the cnn_model. This is dependent on the specified cnn_model, please check this value beforehand.
filename_template –
The template followed by the filenames in image_dir. It should indicate with brackets where the index is located, e.g.
```
>>> filename_template = 'CLEVR_train_{}.png'
```

The index will be padded to 6 characters.

set (str, optional) – The dataset split to use, e.g. `train`, `val` etc.
transform (transforms, optional) – `torchvision.transforms` transform to apply on the images before passing them to the CNN model. Default: `transforms.ToTensor`.
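
How a filename could be derived from filename_template and a sample index is sketched below. Zero-padding is an assumption about how the index is filled to 6 characters, and the helper name `image_filename` is hypothetical.

```python
filename_template = "CLEVR_train_{}.png"


def image_filename(index, template=filename_template, width=6):
    # Zero-pad the index to `width` characters before substituting it.
    return template.format(str(index).zfill(width))


name = image_filename(42)
```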

__getitem__(index)[source]¶

Gets an image from the image_dir and applies a transform on it if specified.

Parameters:	index – index of the sample to get.
Returns:	transformed image as a tensor (shape should be [224, 224, 3])

__len__()[source]¶

Returns:	length of dataset.

Language¶

class miprometheus.utils.Language(name)[source]¶

Class that loads pretrained embeddings from Torchtext.

__init__(name)[source]¶

Constructor.

Parameters:	name – string to name the language (at the moment it doesn’t do anything)

embed_sentence(sentence)[source]¶

Embed an entire sentence using a pretrained embedding.

Parameters:	sentence – A string containing the words to embed
Returns:	FloatTensor of embedded vectors [max_sentence_length, embedding size]

embed_word(word)[source]¶

Embed a single word.

Parameters:	word – A string containing a single word to embed
Returns:	FloatTensor with a single embedded vector in it [embedding size]

return_index_from_word(word)[source]¶

Returns the index of a word in the vocab.

Parameters:	word – String of word in dictionary

return_word_from_index(index)[source]¶

Returns a word in the vocab from its index.

Parameters:	index – integer index of the word in the dictionary

build_pretrained_vocab(data_set, **kwargs)[source]¶

Construct the torchtext Vocab object from a list of sentences. This allows us to load only vectors we actually need.

Parameters:

data_set – A list containing strings (either sentences or just single-word strings work).
**kwargs – The keyword arguments for the vectors class from torchtext. The most important kwarg is vectors, which is a string containing the embedding type to be loaded.