miprometheus.utils¶
AppState¶
-
class
miprometheus.utils.
AppState
[source]¶ Represents the application state. Tracks, for example, whether computations should be moved to the GPU and whether visualization should be activated.
-
set_dtype
(flag)[source]¶ Sets a global floating point type to be used in the models.
Parameters: flag (str) – Flag indicating a floating point type.
-
DataDict¶
-
class
miprometheus.utils.
DataDict
(*args, **kwargs)[source]¶
- Mapping: A container object that supports arbitrary key lookups and implements the methods __getitem__, __iter__ and __len__.
- Mutable objects can change their value but keep their id(), which makes it easier to modify the values of existing keys.
DataDict: Dict used for storing batches of data by problems.
This is the main object class used to share data between a problem class and a model class through a worker.
-
__init__
(*args, **kwargs)[source]¶ DataDict constructor. Can be initialized in different ways:
>>> data_dict = DataDict()
>>> data_dict = DataDict({'inputs': torch.tensor(), 'targets': numpy.ndarray()})
>>> # etc.
Parameters: - args – Used to pass a non-keyworded, variable-length argument list.
- kwargs – Used to pass a keyworded, variable-length argument list.
-
__setitem__
(key, value, addkey=False)[source]¶ key:value setter function.
Parameters: - key – Dict Key.
- value – Associated value.
- addkey (bool) – Indicates whether adding a new key on-the-fly is authorized. Default: False.
Warning
addkey defaults to False, so new keys cannot be added on-the-fly unless explicitly requested. Set it to True in the cases where adding a key on-the-fly to a DataDict is useful (e.g. for plotting pre-processing).
-
__getitem__
(key)[source]¶ Value getter function.
Parameters: key – Dict Key. Returns: Associated Value.
-
__delitem__
(key, override=False)[source]¶ Delete a key:value pair.
Warning
By default, deleting an existing key is not authorized. Set override to True to ignore this restriction.
Parameters: - key – Dict Key.
- override (bool) – Indicates whether to lift the ban on deleting keys.
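The guarded setter and deleter described above can be illustrated with a minimal stdlib sketch. `GuardedDict` is a hypothetical stand-in, not the library's implementation; in particular, raising `KeyError` on a refused operation is an assumption. Note that the extra `addkey` and `override` arguments cannot be passed through the `d[key] = value` or `del d[key]` syntax, so the dunder methods are called explicitly.

```python
from collections.abc import MutableMapping


class GuardedDict(MutableMapping):
    """Hypothetical stand-in for DataDict: keys are fixed unless explicitly unlocked."""

    def __init__(self, *args, **kwargs):
        self._store = dict(*args, **kwargs)

    def __setitem__(self, key, value, addkey=False):
        # Refuse unknown keys unless the caller opts in with addkey=True.
        # (KeyError is an assumption; the real class may signal this differently.)
        if not addkey and key not in self._store:
            raise KeyError(f"Cannot add new key {key!r}; pass addkey=True to allow it.")
        self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]

    def __delitem__(self, key, override=False):
        # Deletion is refused unless the caller opts in with override=True.
        if not override:
            raise KeyError(f"Cannot delete key {key!r}; pass override=True to allow it.")
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


data = GuardedDict({"inputs": [1, 2, 3], "targets": [0, 1, 0]})
data["inputs"] = [4, 5, 6]                                # existing key: allowed
data.__setitem__("predictions", [1, 1, 0], addkey=True)   # new key: explicit opt-in
data.__delitem__("targets", override=True)                # deletion: explicit opt-in
```

With the default arguments, `data["extra"] = 1` and `del data["inputs"]` are both refused in this sketch.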
-
__repr__
()[source]¶ Returns: Echoes class, id, & reproducible representation in the Read–Eval–Print Loop.
-
numpy
()[source]¶ Converts the DataDict to numpy objects.
Note
The torch.tensor(s) contained in self are converted using torch.Tensor.numpy(): the tensor and the returned ndarray share the same underlying storage. Changes to the self tensor will be reflected in the ndarray and vice versa.
If an element of self is not a torch.tensor, it is returned as is.
Returns: Converted DataDict.
-
cpu
()[source]¶ Moves the DataDict to memory accessible to the CPU.
Note
The torch.tensor(s) contained in self are converted using torch.Tensor.cpu(). If an element of self is not a torch.tensor, it is returned as is, i.e. only the torch.tensor(s) contained in self are moved.
Returns: Converted DataDict.
-
cuda
(device=None, non_blocking=False)[source]¶ Returns a copy of this object in CUDA memory.
Note
Wraps call to torch.Tensor.cuda(): if this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. If an element of self is not a torch.tensor, it is returned as is, i.e. only the torch.tensor(s) contained in self are moved.
Parameters: - device (torch.device) – The destination GPU device. Defaults to the current CUDA device.
- non_blocking (bool) – If True and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: False.
-
detach
()[source]¶ Returns a new DataDict, detached from the current graph. The result will never require gradient.
Note
Wraps call to torch.Tensor.detach(): the torch.tensor(s) in the returned DataDict use the same data tensor(s) as the original one(s). In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
ParamInterface¶
-
class
miprometheus.utils.
ParamInterface
(*keys)[source]¶ Interface to the ParamRegistry singleton.
Inherits collections.Mapping, and therefore exposes functionality close to a dict.
Offers a read (through the collections.Mapping interface) and write (through the add_default_params() and add_config_params() methods) view of the ParamRegistry.
Warning
This class is the only interface to ParamRegistry, and thus the only way to interact with it.
-
__init__
(*keys)[source]¶ Constructor:
- Calls the base constructor (Mapping),
- Initializes the ParamRegistry,
- Initializes an empty keys_path list.
Parameters: keys (sequence / collection: dict, list etc.) – Sequence of keys to the subtree of the registry. The subtree hierarchy will be created if it does not exist. If empty, shows the whole registry.
Note
Calling to_dict() after initializing a ParamInterface with keys will throw a KeyError.
Adding default & config params should be done through add_default_params() and add_config_params().
keys is mainly intended for the recursion of ParamInterface.
-
to_dict
()[source]¶ Returns: dict containing a snapshot of the current ParamInterface
tree.
-
__getitem__
(key)[source]¶ Get parameter value under key.
The parameter dict is derived from the default parameters updated with the config parameters.
Parameters: key (str) – key to value in the ParamInterface tree.
Returns: ParamInterface[key], or the value if it is a leaf of the ParamRegistry tree.
-
__len__
()[source]¶ Returns: Length of the ParamInterface
.
-
__iter__
()[source]¶ Returns: Iterator over the ParamInterface
.
-
leafs
()[source]¶ Yields the leafs of the current
ParamInterface
.
-
set_leaf
(leaf_key, leaf_value)[source]¶ Update the value of the specified leaf_key of the current ParamInterface with the specified leaf_value.
Parameters: - leaf_key (str) – leaf key to update.
- leaf_value – New value to set.
Returns: True if the leaf value has been changed, False if leaf_key is not in ParamInterface.leafs().
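As a rough illustration of what iterating over leaves and updating one of them could look like on a nested parameter tree, here is a sketch on plain dicts. The free functions `leafs` and `set_leaf` below are hypothetical; how the real methods traverse the tree, and what exactly `leafs()` yields, is not stated in this page.

```python
def leafs(tree, prefix=()):
    """Yield (key_path, value) for every leaf of a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Descend into subtrees, remembering the path taken so far.
            yield from leafs(value, prefix + (key,))
        else:
            yield prefix + (key,), value


def set_leaf(tree, leaf_key, leaf_value):
    """Update the first leaf named leaf_key; return True if one was found."""
    for key, value in tree.items():
        if isinstance(value, dict):
            if set_leaf(value, leaf_key, leaf_value):
                return True
        elif key == leaf_key:
            tree[key] = leaf_value
            return True
    return False


params = {"training": {"optimizer": {"lr": 0.01}, "batch_size": 64}, "seed": 1}
leaf_paths = [path for path, _ in leafs(params)]
changed = set_leaf(params, "lr", 0.001)       # True: the leaf exists
missing = set_leaf(params, "momentum", 0.9)   # False: no such leaf
```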
-
add_default_params
(default_params: dict)[source]¶ Appends default_params to the default parameter dict of the ParamRegistry.
Note
This method should be used by the objects necessitating default values (problems, models, workers etc.).
Parameters: default_params (dict) – Dictionary containing default values. The dictionary will be inserted into the subtree keys path indicated at the initialization of the current ParamInterface.
-
add_config_params
(config_params: dict)[source]¶ Appends config_params to the config parameter dict of the ParamRegistry.
Note
This is intended for users to dynamically (re)configure their experiments.
Parameters: config_params (dict) – Dictionary containing config values. The dictionary will be inserted into the subtree keys path indicated at the initialization of the current ParamInterface.
-
del_default_params
(key)[source]¶ Removes the entry from the default params living under key.
The entry can either be a subtree or a leaf of the default params tree.
Parameters: key (str) – key to subtree / leaf in the default params tree.
-
ParamRegistry¶
-
class
miprometheus.utils.
MetaSingletonABC
[source]¶ Metaclass that inherits from both SingletonMetaClass and ABCMeta (the metaclass of collections.Mapping).
-
class
miprometheus.utils.
ParamRegistry
[source]¶ Registry singleton for the parameters.
Registers default values (coming from workers, models, problems, etc) as well as config values loaded by the user for a particular experiment.
Parameters can be read from the registry by indexing. The returned parameters are the default ones superseded by all the config ones.
The merging of default and config parameters is computed every time the registry is changed.
Can contain nested parameters sections (acts as a dict).
Warning
This class should not be used except through ParamInterface.
-
__init__
()[source]¶ Constructor:
Calls the base constructor (Mapping),
Initializes empty parameters dicts for:
- Default parameters,
- Config parameters,
- Resulting tree.
-
add_default_params
(default_params: dict)[source]¶ Appends default_params to the default parameter dict of the current ParamRegistry, and updates the resulting parameters dict.
Note
This method should be used by the objects necessitating default values (problems, models, workers etc.).
Parameters: default_params (dict) – Dictionary containing default values.
-
add_config_params
(config_params: dict)[source]¶ Appends config_params to the config parameter dict of the current ParamRegistry, and updates the resulting parameters dict.
Note
This is intended for users to dynamically (re)configure their experiments.
Parameters: config_params (dict) – Dictionary containing config values.
-
del_default_params
(keypath: list)[source]¶ Removes an entry from the default parameter dict of the current ParamRegistry, and updates the resulting parameters dict.
The entry can either be a subtree or a leaf of the default parameter dict.
Parameters: keypath (list) – list of keys to subtree / leaf in the default parameter dict.
-
del_config_params
(keypath: list)[source]¶ Removes an entry from the config parameter dict of the current ParamRegistry, and updates the resulting parameters dict.
The entry can either be a subtree or a leaf of the config parameter dict.
Parameters: keypath (list) – list of keys to subtree / leaf in the config parameter dict.
-
__getitem__
(key)[source]¶ Get parameter value under key.
The parameter dict is derived from the default parameters updated with the config parameters.
Parameters: key (str) – key to value in the ParamRegistry.
Returns: Parameter value.
-
__iter__
()[source]¶ Returns: Iterator over the ParamRegistry
.
-
__len__
()[source]¶ Returns: Length of the ParamRegistry
.
-
update_dict_recursively
(current_node, update_node)[source]¶ Recursively update the current_node of the ParamRegistry with the values of the update_node.
Starts from the root of the current_node.
Parameters: - current_node (ParamRegistry (inheriting from Mapping)) – Current (default or config) node.
- update_node (ParamRegistry (inheriting from Mapping)) – Values to be added/updated to the current_node.
Returns: Updated current node.
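The registry's behaviour of "defaults superseded by config values" can be sketched as a recursive dict update. The function below reuses the documented name for readability but is only an illustration on plain dicts, written under the assumption that nested sections are merged key by key while leaves are overwritten; the library's actual code may differ.

```python
import copy
from collections.abc import Mapping


def update_dict_recursively(current_node, update_node):
    """Merge update_node into current_node in place and return current_node."""
    for key, value in update_node.items():
        if isinstance(value, Mapping) and isinstance(current_node.get(key), Mapping):
            # Both sides hold a section: merge them instead of replacing.
            update_dict_recursively(current_node[key], value)
        else:
            # Leaf (or new section): the update value wins.
            current_node[key] = copy.deepcopy(value)
    return current_node


defaults = {"model": {"hidden_size": 128, "dropout": 0.1}, "seed": 0}
config = {"model": {"hidden_size": 256}}

# Resulting tree = defaults, then config applied on top.
merged = update_dict_recursively(copy.deepcopy(defaults), config)
```

Working on a deep copy of the defaults keeps the default and config dicts intact, so the merge can be recomputed whenever either one changes.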
-
SamplerFactory¶
-
class
miprometheus.utils.
SamplerFactory
[source]¶ Class returning sampler depending on the name provided in the list of parameters.
-
static
build
(problem, params)[source]¶ Static method returning a particular sampler, depending on the name provided in the list of parameters & the specified problem class.
Parameters: - problem (problems.Problem) – Instance of an object derived from the Problem class.
- params (utils.param_interface.ParamInterface) – Parameters used to instantiate the sampler.
Note
params should contain the exact (case-sensitive) class name of the sampler to instantiate.
Warning
torch.utils.data.sampler.WeightedRandomSampler, torch.utils.data.sampler.BatchSampler and torch.utils.data.sampler.DistributedSampler are not yet supported.
Note
torch.utils.data.sampler.SubsetRandomSampler expects indices to index a subset of the dataset. Currently, the user can specify these indices using one of the following options:
- Option 1: range.
>>> indices = range(20)
- Option 2: range as str.
>>> range_str = '0, 20'
- Option 3: list of indices.
>>> yaml_list = yaml.safe_load('[0, 2, 5, 10]')
- Option 4: name of the file containing indices.
>>> filename = "~/data/mnist/training_indices.txt"
Returns: Instance of a given sampler, or None if the section is not present or the sampler could not be built.
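To make the first three index options concrete, the sketch below normalizes each of them to a plain list of indices. `resolve_indices` is a hypothetical helper, not part of the library, and the interpretation of the `'0, 20'` string as a half-open `start, stop` range is an assumption. The file-based option is left out.

```python
def resolve_indices(spec):
    """Hypothetical helper: turn a user-supplied spec into a list of indices."""
    if isinstance(spec, range):
        # Option 1: a range object.
        return list(spec)
    if isinstance(spec, str):
        # Option 2: 'start, stop' given as a string (assumed half-open).
        start, stop = (int(part) for part in spec.split(","))
        return list(range(start, stop))
    # Option 3: any other iterable of indices, e.g. a list parsed from YAML.
    return [int(index) for index in spec]


from_range = resolve_indices(range(20))
from_str = resolve_indices("0, 20")
from_list = resolve_indices([0, 2, 5, 10])
```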
Split Indices¶
split_indices.py:
- Contains the definition of a split_indices function.
-
miprometheus.utils.split_indices.
split_indices
(length, split, logger, random_sampling=True)[source]¶ Splits the indices of an array of a given length into two parts, using split as the divider.
Random sampling is used by default, but can be turned off.
Parameters: - length (int) – Length (size) of the dataset.
- split (int) – Determines how many indices will belong to subset a and subset b.
- logger (logging.Logger) – Logging utility.
- random_sampling (bool) – Use random sampling (default: True). If set to False, will return two ranges instead of lists with indices.
Returns: Two lists with indices (when random_sampling is True), or two lists with two elements - ranges (when False).
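A minimal sketch of such a split, assuming that the first `split` indices go to subset a and the remainder to subset b. The `logger` argument is omitted, and the function name and the `seed` parameter are hypothetical additions for reproducibility. The non-random branch returns `range` objects, which is one possible reading of the return description above.

```python
import random


def split_indices_sketch(length, split, random_sampling=True, seed=None):
    """Split range(length) into two parts of sizes split and length - split."""
    if not 0 <= split <= length:
        raise ValueError("split must lie between 0 and length")
    if random_sampling:
        # Shuffle all indices, then cut the shuffled list at `split`.
        indices = list(range(length))
        random.Random(seed).shuffle(indices)
        return indices[:split], indices[split:]
    # Deterministic variant: two contiguous ranges.
    return range(0, split), range(split, length)


subset_a, subset_b = split_indices_sketch(10, 3, seed=0)
range_a, range_b = split_indices_sketch(10, 3, random_sampling=False)
```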
StatisticsCollector¶
-
class
miprometheus.utils.
StatisticsCollector
[source]¶ Specialized class used for the collection and export of statistics during training, validation and testing.
Inherits collections.Mapping, therefore it offers functionality close to a dict.
-
add_statistic
(key, formatting)[source]¶ Add a statistic to the collector. The value associated with the key is of type list.
Parameters: - key (str) – Key of the statistic.
- formatting – Formatting that will be used when logging and exporting to CSV.
-
__getitem__
(key)[source]¶ Get statistics value for given key.
Parameters: key (str) – Key to value in parameters. Returns: Statistics value list associated with given key.
-
__setitem__
(key, value)[source]¶ Add value to the list of the statistic associated with a given key.
Parameters: - key – Key to value in parameters.
- value – Statistics value to append to the list associated with given key.
-
initialize_csv_file
(log_dir, filename)[source]¶ Creates a new csv file and initializes it with a header based on the statistics names.
Parameters: Returns: File stream opened for writing.
-
export_to_csv
(csv_file=None)[source]¶ Writes the current statistics to csv using the associated formatting.
Parameters: csv_file – File stream opened for writing, optional
-
export_to_checkpoint
()[source]¶ This method exports the collected data into a dictionary using the associated formatting.
-
export_to_string
(additional_tag='')[source]¶ Returns the current statistics in the form of a string using the associated formatting.
Parameters: additional_tag (str) – An additional tag to append at the end of the created string. Returns: String being the concatenation of the statistics names & values.
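The collector's contract (each statistic is a list, `__setitem__` appends rather than overwrites, and exports use a per-key formatting) can be sketched as follows. `StatsSketch` is a hypothetical stand-in; the `str.format`-style formatting strings and the exact layout of the exported string are assumptions.

```python
from collections.abc import Mapping


class StatsSketch(Mapping):
    """Hypothetical stand-in for a statistics collector."""

    def __init__(self):
        self._values = {}
        self._formatting = {}

    def add_statistic(self, key, formatting):
        # Each statistic starts as an empty list of collected values.
        self._formatting[key] = formatting
        self._values[key] = []

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Assignment appends to the list instead of replacing it.
        self._values[key].append(value)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def export_to_string(self, additional_tag=""):
        # Use the most recent value of each non-empty statistic.
        parts = [
            f"{key} {self._formatting[key].format(values[-1])}"
            for key, values in self._values.items()
            if values
        ]
        return "; ".join(parts) + additional_tag


stats = StatsSketch()
stats.add_statistic("episode", "{:06d}")
stats.add_statistic("loss", "{:.4f}")
stats["episode"] = 1
stats["loss"] = 0.5
stats["loss"] = 0.42
summary = stats.export_to_string(" [Training]")
```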
-
StatisticsAggregator¶
-
class
miprometheus.utils.
StatisticsAggregator
[source]¶ Specialized class used for the computation of several statistical aggregators.
Inherits from miprometheus.utils.StatisticsCollector as it extends its capabilities: it relies on miprometheus.utils.StatisticsCollector to collect the statistics over an epoch (training set) or a validation phase (validation set).
Once the statistics have been collected, the miprometheus.utils.StatisticsAggregator computes several statistical aggregators to summarize the last epoch or validation phase.
E.g. with the list of loss values from the last epoch, we can compute the average loss, the min & max, and the standard deviation.
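As a small worked example of such aggregators, the snippet below summarizes a list of loss values with the standard library. The aggregator names are hypothetical, and whether the library uses the sample or the population standard deviation is not stated here; the sample version is used purely for illustration.

```python
import statistics

# Loss values collected over one (made-up) epoch.
losses = [0.9, 0.7, 0.6, 0.4]

aggregators = {
    "loss_mean": statistics.mean(losses),
    "loss_min": min(losses),
    "loss_max": max(losses),
    # Sample standard deviation; statistics.pstdev would give the population one.
    "loss_std": statistics.stdev(losses),
}
```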
-
__init__
()[source]¶ Constructor for the miprometheus.utils.StatisticsAggregator. Defines an empty aggregators dict.
Other statistical aggregators can be added via StatisticsAggregator.add_aggregator().
-
add_aggregator
(key, formatting)[source]¶ Add a statistical aggregator. The value associated with the specified key is initialized to -1.
Parameters: - key (str) – Statistical aggregator to add. Such an aggregator (e.g. min, max, mean, std…) should be based on an existing statistic collected by the miprometheus.utils.StatisticsCollector (e.g. added by StatisticsCollector.add_statistic() and collected by miprometheus.models.Model.collect_statistics() or miprometheus.models.Problem.collect_statistics()).
- formatting (str) – Formatting that will be used when logging and exporting to CSV.
-
__getitem__
(key)[source]¶ Get the values list of the specified statistical aggregator.
Parameters: key (str) – Name of the statistical aggregator to get the values list of. Returns: Values list associated with the specified statistical aggregator.
-
__setitem__
(key, value)[source]¶ Set the value of the specified statistical aggregator, thus overwriting the existing one.
Parameters:
-
__delitem__
(key)[source]¶ Delete the specified statistical aggregator.
Parameters: key (str) – Key to be deleted.
-
initialize_csv_file
(log_dir, filename)[source]¶ This method creates a new csv file and initializes it with a header based on the statistical aggregators names.
Parameters: Returns: File stream opened for writing.
-
export_to_csv
(csv_file=None)[source]¶ This method writes the current statistical aggregators values to the csv_file using the associated formatting.
Parameters: csv_file – File stream opened for writing, optional.
-
export_to_checkpoint
()[source]¶ This method exports the aggregated data into a dictionary using the associated formatting.
-
export_to_string
(additional_tag='')[source]¶ This method returns the current statistical aggregators values in the form of a string using the associated formatting.
Parameters: additional_tag (str) – An additional tag to append at the end of the created string. Returns: String being the concatenation of the statistical aggregators names & values.
-
Losses¶
Masked BCEWithLogitsLoss¶
-
class
miprometheus.utils.
MaskedBCEWithLogitsLoss
(weight=None)[source]¶ Calculates the binary cross entropy for batches with different numbers of outputs for the samples.
-
__init__
(weight=None)[source]¶ Constructor for the MaskedBCEWithLogitsLoss.
Defines the inner loss as BCEWithLogitsLoss.
Parameters: weight (Tensor, optional) – A manual rescaling weight given to each class. If given, has to be a Tensor of size C.
-
forward
(logits, targets, mask)[source]¶ Calculates loss accounting for different numbers of output per sample.
Parameters: - logits (torch.tensor.) – Logits being output by the model. [batch, classes, sequence].
- targets (torch.LongTensor) – Targets [batch, sequence].
- mask (torch.ByteTensor) – Mask [batch, sequence].
Returns: loss value.
-
masked_accuracy
(logits, targets, mask)[source]¶ Calculates accuracy equal to mean number of correct predictions in a given batch.
Warning
Applies mask to both logits and targets.
Parameters: - logits (torch.tensor) – Logits being output by the model. [batch, classes, sequence].
- targets (torch.LongTensor) – Targets [batch, sequence].
- mask (torch.ByteTensor) – Mask [batch, sequence].
Returns: accuracy value.
-
Masked CrossEntropyLoss¶
-
class
miprometheus.utils.
MaskedCrossEntropyLoss
(weight=None, ignore_index=-100)[source]¶ Calculates the cross entropy for batches with different numbers of outputs per sample.
-
__init__
(weight=None, ignore_index=-100)[source]¶ Constructor for the MaskedCrossEntropyLoss.
Defines the inner loss as CrossEntropyLoss.
Parameters: - weight (Tensor, optional) – A manual rescaling weight given to each class. If given, has to be a Tensor of size C.
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient.
-
forward
(logits, targets, mask)[source]¶ Calculates loss accounting for different numbers of output per sample.
Parameters: - logits (torch.tensor.) – Logits being output by the model. [batch, classes, sequence].
- targets (torch.LongTensor) – Targets [batch, sequence].
- mask (torch.ByteTensor) – Mask [batch, sequence].
Returns: loss value.
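The idea of a masked cross entropy can be sketched in pure Python on nested lists with the shapes given above. This is an illustration only: the function name is reused for readability, class weights and ignore_index are left out, and averaging over the unmasked positions is an assumption about the normalization.

```python
import math


def masked_cross_entropy(logits, targets, mask):
    """logits: [batch][classes][sequence]; targets and mask: [batch][sequence]."""
    total, count = 0.0, 0
    for sample_logits, sample_targets, sample_mask in zip(logits, targets, mask):
        for t, (target, keep) in enumerate(zip(sample_targets, sample_mask)):
            if not keep:
                continue  # masked-out positions contribute nothing
            scores = [class_scores[t] for class_scores in sample_logits]
            # Numerically stable log-sum-exp over the classes.
            top = max(scores)
            log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
            total += log_norm - scores[target]
            count += 1
    return total / count if count else 0.0


# Two samples, two classes, sequence length three; the last step of the
# second sample is padding and is masked out.
logits = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 5.0], [0.0, 0.0, -5.0]],
]
targets = [[0, 1, 0], [1, 0, 1]]
mask = [[1, 1, 1], [1, 1, 0]]
loss = masked_cross_entropy(logits, targets, mask)
```

Every unmasked position has equal scores for both classes, so the loss equals log(2); the badly predicted padded position does not affect it.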
-
masked_accuracy
(logits, targets, mask)[source]¶ Calculates accuracy equal to mean number of correct predictions in a given batch.
Parameters: - logits (torch.tensor.) – Logits being output by the model. [batch, classes, sequence].
- targets (torch.LongTensor) – Targets [batch, sequence].
- mask (torch.ByteTensor) – Mask [batch, sequence].
Returns: accuracy value.
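A matching sketch for the masked accuracy, again in pure Python on nested lists: the predicted class at each step is the argmax over the class dimension, and only unmasked steps are counted. Treating the accuracy as the fraction of correct unmasked predictions is an assumption.

```python
def masked_accuracy(logits, targets, mask):
    """logits: [batch][classes][sequence]; targets and mask: [batch][sequence]."""
    correct, count = 0, 0
    for sample_logits, sample_targets, sample_mask in zip(logits, targets, mask):
        for t, (target, keep) in enumerate(zip(sample_targets, sample_mask)):
            if not keep:
                continue  # skip masked-out steps
            scores = [class_scores[t] for class_scores in sample_logits]
            # Argmax over the class dimension.
            predicted = max(range(len(scores)), key=scores.__getitem__)
            correct += int(predicted == target)
            count += 1
    return correct / count if count else 0.0


# One sample, two classes, three steps: predictions are [0, 1, 1].
logits = [[[2.0, 0.1, 0.3], [0.5, 1.5, 0.9]]]
targets = [[0, 1, 0]]

accuracy_masked = masked_accuracy(logits, targets, [[1, 1, 0]])  # last step ignored
accuracy_full = masked_accuracy(logits, targets, [[1, 1, 1]])    # last step counted
```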
-
Problems Utils¶
GenerateFeatureMaps¶
-
class
miprometheus.utils.
GenerateFeatureMaps
(image_dir, cnn_model, num_blocks, filename_template, set='train', transform=transforms.ToTensor)[source]¶ Class handling the generation of features using a pretrained CNN for specified images.
-
__init__
(image_dir, cnn_model, num_blocks, filename_template, set='train', transform=transforms.ToTensor)[source]¶ Creates the pretrained CNN model & moves it to CUDA if available.
Parameters: - image_dir (str) – Directory path to the images to extract features from.
- cnn_model (str) – Name of the pretrained CNN model to use. Must be in torchvision.models.
- num_blocks – Number of layers to use from the cnn_model. This depends on the specified cnn_model, so please check this value beforehand.
- filename_template – The template followed by the filenames in image_dir. It should indicate with brackets where the index is located, e.g.
>>> filename_template = 'CLEVR_train_{}.png'
The index will be padded to 6 characters.
- set (str, optional) – The dataset split to use, e.g. train, val etc.
- transform (transforms, optional) – torchvision.transforms to apply on the images before passing them to the CNN model. Default:
>>> transform = transforms.ToTensor
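How an image index might be inserted into the filename template can be sketched as below. The helper `image_filename` is hypothetical, and padding with leading zeros is an assumption about how the index is filled to 6 characters.

```python
filename_template = "CLEVR_train_{}.png"


def image_filename(template, index, width=6):
    """Fill the template with the index, left-padded with zeros to `width` characters."""
    return template.format(str(index).zfill(width))


first = image_filename(filename_template, 0)
later = image_filename(filename_template, 42)
```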
-
Language¶
-
class
miprometheus.utils.
Language
(name)[source]¶ Class that loads pretrained embeddings from Torchtext.
-
__init__
(name)[source]¶ Constructor.
Parameters: name – string to name the language (at the moment it doesn’t do anything)
-
embed_sentence
(sentence)[source]¶ Embed an entire sentence using a pretrained embedding.
Parameters: sentence – A string containing the words to embed Returns: FloatTensor of embedded vectors [max_sentence_length, embedding size]
-
embed_word
(word)[source]¶ Embed a single word.
Parameters: word – A string containing a single word to embed Returns: FloatTensor with a single embedded vector in it [embedding size]
-
return_index_from_word
(word)[source]¶ Returns the index of a word in the vocab.
Parameters: word – String of word in dictionary
-
return_word_from_index
(index)[source]¶ Returns a word in the vocab from its index.
Parameters: index – integer index of the word in the dictionary
-
build_pretrained_vocab
(data_set, **kwargs)[source]¶ Construct the torchtext Vocab object from a list of sentences. This allows us to load only vectors we actually need.
Parameters: - data_set – A list containing strings (either sentences or single-word strings work)
- **kwargs – The keyword arguments for the vectors class from torchtext. The most important kwarg is vectors, which is a string containing the embedding type to be loaded
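The word/index lookups and the idea of building a vocabulary only from the words that actually occur can be sketched without torchtext. `VocabSketch` is a hypothetical stand-in that keeps the two lookup tables (`stoi`, `itos`) but loads no pretrained vectors; whitespace tokenization is an assumption.

```python
class VocabSketch:
    """Hypothetical stand-in for a vocabulary built from a list of sentences."""

    def __init__(self, sentences):
        self.itos = []   # index -> word
        self.stoi = {}   # word -> index
        for sentence in sentences:
            for word in sentence.split():
                if word not in self.stoi:
                    # Register each new word once, in order of first appearance.
                    self.stoi[word] = len(self.itos)
                    self.itos.append(word)

    def return_index_from_word(self, word):
        return self.stoi[word]

    def return_word_from_index(self, index):
        return self.itos[index]


vocab = VocabSketch(["the red cube", "the blue sphere"])
index = vocab.return_index_from_word("cube")
word = vocab.return_word_from_index(index)
```

Restricting the vocabulary to the words seen in the data is what makes it possible to load only the embedding vectors that are actually needed.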
-