miprometheus.workers

Worker

class miprometheus.workers.Worker(name, add_default_parser_args=True)

Abstract base class for all workers. All workers should subclass it and override the relevant methods.

__init__(name, add_default_parser_args=True)

Base constructor for all workers:

Initializes the AppState singleton:
>>> self.app_state = AppState()
Initializes the Parameter Registry:
>>> self.params = ParamInterface()
Defines the logger:
>>> self.logger = logging.getLogger(name=self.name)
Creates the parser and adds the default worker command line arguments.

Parameters:	name (str) – Name of the worker. add_default_parser_args (bool) – If set, adds default parser arguments (DEFAULT: True).

initialize_logger()

Initializes the logger, with a specific configuration:

>>> logger_config = {'version': 1,
>>>                  'disable_existing_loggers': False,
>>>                  'formatters': {
>>>                      'simple': {
>>>                          'format': '[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
>>>                          'datefmt': '%Y-%m-%d %H:%M:%S'}},
>>>                  'handlers': {
>>>                      'console': {
>>>                          'class': 'logging.StreamHandler',
>>>                          'level': 'INFO',
>>>                          'formatter': 'simple',
>>>                          'stream': 'ext://sys.stdout'}},
>>>                  'root': {'level': 'DEBUG',
>>>                           'handlers': ['console']}}
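A minimal sketch, assuming the dictionary is applied with the standard library's `logging.config.dictConfig` (the page does not say how it is consumed). It shows the effect of the two levels: the root logger accepts DEBUG records, while the console handler only prints INFO and above. The logger name `Worker` is illustrative.

```python
import logging
import logging.config

logger_config = {'version': 1,
                 'disable_existing_loggers': False,
                 'formatters': {
                     'simple': {
                         'format': '[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
                         'datefmt': '%Y-%m-%d %H:%M:%S'}},
                 'handlers': {
                     'console': {
                         'class': 'logging.StreamHandler',
                         'level': 'INFO',
                         'formatter': 'simple',
                         'stream': 'ext://sys.stdout'}},
                 'root': {'level': 'DEBUG',
                          'handlers': ['console']}}

# Apply the configuration to the logging module.
logging.config.dictConfig(logger_config)

# Illustrative worker logger: it has no handler of its own and propagates to root.
logger = logging.getLogger('Worker')
logger.info('Logger initialized')      # printed: INFO passes the console handler
logger.debug('Not shown on console')   # created (effective level DEBUG) but filtered by the INFO handler
```

Keeping the root at DEBUG lets additional handlers (for example a log file) receive DEBUG records while the console stays at INFO.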

display_parsing_results(): Displays the properly & improperly parsed arguments (if any).

setup_experiment()

Sets up a specific experiment.

Base method:

Parses command line arguments.

Sets the three default sections (training / validation / testing) and sets their DataLoader parameters.

Note

Child classes should override this method, but still call the parent method to retain the basic functionality implemented here.
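A minimal sketch of the override pattern described in the note. `Worker` below is a hypothetical, heavily simplified stand-in (it only creates three empty sections) so that the example runs without Mi-Prometheus installed; `MyWorker` and the `max_epochs` key are illustrative as well.

```python
from typing import Any, Dict


class Worker:
    """Hypothetical stand-in for miprometheus.workers.Worker (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.params: Dict[str, Dict[str, Any]] = {}

    def setup_experiment(self) -> None:
        # Simplified base behaviour: create the three default sections.
        for section in ('training', 'validation', 'testing'):
            self.params.setdefault(section, {})


class MyWorker(Worker):
    def setup_experiment(self) -> None:
        # Call the parent first so the basic setup is not lost...
        super().setup_experiment()
        # ...then add worker-specific setup on top of it (hypothetical key).
        self.params['training'].setdefault('max_epochs', 10)


worker = MyWorker('MyWorker')
worker.setup_experiment()
```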

build_problem_sampler_loader(params, section_name)

Builds and returns the Problem instance, alongside its DataLoader.

Also builds the sampler if required.

Parameters:	params (miprometheus.utils.ParamInterface) – ‘ParamInterface’ object, referring to one of the main sections (training/validation/testing). section_name – name of the section that will be used by the logger for display.
Returns:	Problem instance & DataLoader instance.

get_epoch_size(problem, sampler, batch_size, drop_last)

Computes the number of iterations (‘episodes’) needed to cover the entire dataset once, given the size of the dataset and the batch size.

Takes into account whether a sampler is used or not.

Parameters:	problem – Object derived from the `Problem` class. sampler – Sampler (may be None). batch_size (int) – Batch size. drop_last (bool) – If True, the last batch (if incomplete) will not be counted.

Note

An incomplete last batch is counted only when drop_last in DataLoader() is set to False.

Warning

This method is kept ‘just in case’; in most cases one can simply use `len(dataloader)`.

Returns:	Number of iterations to perform to go through the entire dataset once.
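The counting rule can be sketched as follows. This hypothetical helper takes plain sizes instead of the `problem` and `sampler` objects the method receives; for a map-style PyTorch dataset, the result should agree with `len(dataloader)`.

```python
from typing import Optional


def epoch_size(num_samples: int, batch_size: int, drop_last: bool,
               sampler_size: Optional[int] = None) -> int:
    """Number of episodes (batches) needed to see the data once."""
    # When a sampler is used, its length (not the dataset size) defines an epoch.
    n = sampler_size if sampler_size is not None else num_samples
    if drop_last:
        # The incomplete last batch is discarded: round down.
        return n // batch_size
    # The incomplete last batch still counts as an episode: round up.
    return -(-n // batch_size)


full = epoch_size(1000, 64, drop_last=False)                       # 15 full batches + 1 of 40 samples
dropped = epoch_size(1000, 64, drop_last=True)                     # only the 15 full batches
sampled = epoch_size(1000, 64, drop_last=False, sampler_size=100)  # sampler covers 100 samples
```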

export_experiment_configuration(log_dir, filename, user_confirm)

Dumps the configuration to a YAML file.

Parameters:	log_dir (str) – Directory used to host log files (such as the collected statistics). filename (str) – Name of the `yaml` file to write to. user_confirm (bool) – Whether to request user confirmation.

add_statistics(stat_col)

Adds the most elementary shared statistics to the StatisticsCollector: episode and loss.

Parameters:	stat_col – `StatisticsCollector`.

add_aggregators(stat_agg)

Adds basic statistical aggregators to the StatisticsAggregator: episode, episodes_aggregated and statistics derived from the loss (min, max, mean, std).

Parameters:	stat_agg – `StatisticsAggregator`.

aggregate_statistics(stat_col, stat_agg)

Aggregates the default statistics collected by the StatisticsCollector.

Note

Only computes the min, max, mean and std of the loss, as these are the basic statistical aggregators by default.

Given that the StatisticsAggregator uses the statistics collected by the StatisticsCollector, it should be ensured that these statistics are correctly collected (i.e. by using self.add_statistics() and collect_statistics()).

Parameters:	stat_col – `StatisticsCollector` stat_agg – `StatisticsAggregator`
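As an illustration of what the default aggregation reduces, the sketch below turns a window of per-episode losses into the aggregates listed above. It works on plain lists rather than the StatisticsCollector/StatisticsAggregator objects; the key names, taking the last episode index as `episode`, and the use of the population standard deviation are assumptions.

```python
import statistics
from typing import Dict, List


def aggregate_loss(episodes: List[int], losses: List[float]) -> Dict[str, float]:
    """Reduce per-episode losses to the default aggregates (hypothetical key names)."""
    return {
        'episode': episodes[-1],               # last episode covered by the aggregation
        'episodes_aggregated': len(episodes),  # how many episodes were reduced
        'loss': statistics.mean(losses),
        'loss_min': min(losses),
        'loss_max': max(losses),
        'loss_std': statistics.pstdev(losses), # population std; the worker may use another estimator
    }


agg = aggregate_loss([10, 11, 12], [1.0, 2.0, 3.0])
```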

run_experiment(): Main function of the worker which executes a specific experiment.

Note

Abstract. Should be implemented in the subclasses.

add_file_handler_to_logger(logfile)

Adds a logging.FileHandler to the logger of the current Worker.

Specifies a logging.Formatter:

>>> logging.Formatter(fmt='[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
>>>                   datefmt='%Y-%m-%d %H:%M:%S')

Parameters:	logfile – File used by the `FileHandler`.
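A sketch of the same idea as a free function (the method itself works on `self.logger`). The logger name and log file name are illustrative; the usage writes one record to a temporary file and reads it back.

```python
import logging
import os
import tempfile


def add_file_handler_to_logger(logger: logging.Logger, logfile: str) -> logging.FileHandler:
    """Attach a FileHandler that uses the same format as the console output."""
    fh = logging.FileHandler(logfile)
    fh.setFormatter(logging.Formatter(
        fmt='[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'))
    logger.addHandler(fh)
    return fh


# Usage: log one message to a temporary file and read it back.
logger = logging.getLogger('OfflineTrainer')
logger.setLevel(logging.INFO)
with tempfile.TemporaryDirectory() as log_dir:
    logfile = os.path.join(log_dir, 'trainer.log')
    handler = add_file_handler_to_logger(logger, logfile)
    logger.info('Experiment started')
    handler.close()              # flush and release the file before reading it
    logger.removeHandler(handler)
    with open(logfile) as f:
        contents = f.read()
```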

recurrent_config_parse(configs: str, configs_parsed: list)

Parses names of configuration files recursively, i.e. by looking for default_config sections and loading and parsing those files one by one.

Parameters:	configs (str) – String containing names of configuration files (with paths), separated by commas. configs_parsed (list) – Configurations that were already parsed (so they are not parsed multiple times).
Returns:	list of parsed configuration files.
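The recursion can be sketched as below. To stay self-contained, the sketch replaces YAML files on disk with a hypothetical in-memory mapping (`CONFIG_FILES`); the `default_config` key follows the description above, so check the exact key name used in your configuration files.

```python
from typing import Dict, List

# Hypothetical stand-in for YAML files on disk: file name -> parsed content.
CONFIG_FILES: Dict[str, dict] = {
    'experiment.yaml': {'default_config': 'base.yaml, model.yaml'},
    'model.yaml': {'default_config': 'base.yaml'},
    'base.yaml': {},
}


def recurrent_config_parse(configs: str, configs_parsed: List[str]) -> List[str]:
    """Depth-first parse of comma-separated config names and their defaults."""
    # Split on commas, dropping surrounding whitespace and empty entries.
    to_parse = [name.strip() for name in configs.split(',') if name.strip()]
    for config in to_parse:
        if config in configs_parsed:
            continue  # already parsed: do not load it twice
        params = CONFIG_FILES[config]  # a real worker would load the YAML file here
        configs_parsed.append(config)
        if 'default_config' in params:
            # Recursion: parse the files this configuration points to.
            configs_parsed = recurrent_config_parse(params['default_config'], configs_parsed)
    return configs_parsed


loaded = recurrent_config_parse('experiment.yaml', [])
```

The `configs_parsed` list is what stops `base.yaml` from being parsed twice, even though two files point to it.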

check_and_set_cuda(use_gpu)

Enables computations on CUDA if a GPU is available. Sets the default data types.

Parameters:	use_gpu – Command line flag indicating whether to use GPU/CUDA or not.

predict_evaluate_collect(model, problem, data_dict, stat_col, episode, epoch=None)

Function that performs the following:

passes samples through the model,

computes the loss using the problem,

collects problem and model statistics.

Parameters:

model (models.model.Model or a subclass) – trainable model.
problem (problems.problem.Problem or a subclass) – problem generating samples.
data_dict (DataDict) – contains the batch of samples to pass to the model.
stat_col (StatisticsCollector) – statistics collector used for logging accuracy etc.
episode (int) – current episode index
epoch (int, optional) – current epoch index.

Returns:

logits,
loss

export_statistics(stat_obj, tag='', export_to_log=True)

Exports the statistics/aggregations to the logger, CSV file and TensorBoard (TB).

Parameters:	stat_obj – `StatisticsCollector` or `StatisticsAggregator` object. tag (str) – Additional tag that will be added to the string exported to the logger, optional (DEFAULT = ‘’). export_to_log (bool) – If True, exports statistics to the logger (DEFAULT: True).

aggregate_and_export_statistics(problem, model, stat_col, stat_agg, episode, tag='', export_to_log=True)

Aggregates the collected statistics, exports the aggregations to the logger, CSV file and TensorBoard, and empties the statistics collector for the next episode.

Parameters:

model (models.model.Model or a subclass) – trainable model.
problem (problems.problem.Problem or a subclass) – problem generating samples.
stat_col – StatisticsCollector object.
stat_agg – StatisticsAggregator object.
episode (int) – current episode index.
tag (str) – Additional tag that will be added to the string exported to the logger, optional (DEFAULT = ‘’).
export_to_log (bool) – If True, exports statistics to the logger (DEFAULT: True).

cycle(iterable)

Cycles through an iterable to prevent its exhaustion. This function is used in the (online) trainer to reuse the same DataLoader for a number of episodes greater than len(dataset)/batch_size.

Parameters:	iterable (iter) – iterable.
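A sketch of one way to implement such a cycle, assuming the argument is re-iterable (like a DataLoader or a list).

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')


def cycle(iterable: Iterable[T]) -> Iterator[T]:
    """Endlessly restart iteration over `iterable`."""
    while True:
        # Each pass starts a fresh iteration, so a shuffling DataLoader reshuffles.
        for item in iterable:
            yield item


first_five = list(islice(cycle([1, 2]), 5))
```

Unlike `itertools.cycle`, which saves a copy of every element from the first pass and replays it, this version re-iterates the source on each pass. As a result, batches are not cached in memory, and a shuffling DataLoader yields a new order every pass. The trade-off: an empty iterable, or a one-shot iterator, makes the generator spin forever.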

set_random_seeds(params, section_name)

Sets the torch & NumPy random seeds from the ParamRegistry: if a seed was indicated, it is used; otherwise a random one is set.

Parameters:	params – Section in the config/param registry that will be changed (only “training” or “testing” are taken into account). section_name (str) – Name of the section (for logging purposes only).
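A minimal sketch of the "use the indicated seed, or draw one" logic. Several parts are assumptions: a plain dict stands in for the param registry, the `seed` key and the `-1` sentinel are hypothetical, and Python's `random` module stands in for the `numpy.random.seed` / `torch.manual_seed` calls the worker makes. Writing the drawn seed back into the parameters keeps the run reproducible once the configuration is exported.

```python
import random
from typing import Dict


def set_random_seeds(params: Dict[str, int], section_name: str) -> int:
    """Use the configured seed, or draw one and write it back into `params`."""
    # Hypothetical convention: -1 (or a missing key) means "no seed indicated".
    seed = params.get('seed', -1)
    if seed == -1:
        seed = random.SystemRandom().randrange(2 ** 32)
        params['seed'] = seed  # record it so the run can be reproduced
    print(f'Setting random seed in {section_name} to: {seed}')
    # The worker seeds torch and NumPy; the stdlib RNG stands in for both here.
    random.seed(seed)
    return seed


training = {'seed': 1234}
first = set_random_seeds(training, 'training')
sample_a = random.random()
set_random_seeds(training, 'training')
sample_b = random.random()  # same seed, so same draw as sample_a

testing: Dict[str, int] = {}
drawn = set_random_seeds(testing, 'testing')
```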

Trainer

class miprometheus.workers.Trainer(name='Trainer')

Base class for the trainers.

Iterates over epochs on the dataset.

All other types of trainers (e.g. OnlineTrainer & OfflineTrainer) should subclass it.

__init__(name='Trainer')

Base constructor for all trainers:

Adds the default trainer command line arguments.

Parameters:	name (str) – Name of the worker (DEFAULT: “Trainer”).

setup_experiment()

Sets up the experiment for all trainers:

Calls the base class setup_experiment to parse the command line arguments,
Loads the config file(s):
>>> configs_to_load = self.recurrent_config_parse(flags.config, [])
Sets up the log directory path:
>>> os.makedirs(self.log_dir, exist_ok=False)
Adds a FileHandler to the logger:
>>> self.add_file_handler_to_logger(self.log_file)
Sets the random seeds:
>>> self.set_random_seeds(self.params['training'], 'training')
Creates the training problem and model:
>>> self.training_problem = ProblemFactory.build_problem(self.params['training']['problem'])
>>> self.model = ModelFactory.build_model(self.params['model'], self.training_problem.default_values)
Creates the DataLoader:
>>> self.training_dataloader = DataLoader(dataset=self.training_problem, ...)
Handles curriculum learning if indicated:
>>> if 'curriculum_learning' in self.params['training']:
>>> ...
Handles the validation of the model:

Creates the validation problem & DataLoader.
Sets the optimizer (the class is looked up by name, then instantiated on the model parameters):
>>> self.optimizer = getattr(torch.optim, optimizer_name)(self.model.parameters(), ...)
Handles TensorBoard writers & files:
>>> self.training_writer = SummaryWriter(self.log_dir + '/training')

add_statistics(stat_col)

Calls the base method and adds epoch statistics to the StatisticsCollector.

Parameters:	stat_col – `StatisticsCollector`.

add_aggregators(stat_agg)

Adds the basic aggregators to the StatisticsAggregator and extends them with: epoch.

Parameters:	stat_agg – `StatisticsAggregator`.

initialize_statistics_collection()

Initializes all StatisticsCollectors and StatisticsAggregators used by a given worker:
- For training statistics (adds the statistics of the model & problem),
- For validation statistics (adds the statistics of the model & problem).
Creates the output files (csv).

finalize_statistics_collection(): Finalizes the statistics collection by closing the csv files.

initialize_tensorboard(): Initializes the TensorBoard writers and log directories.

finalize_tensorboard(): Finalizes the operation of TensorBoard writers by closing them.

validate_on_batch(valid_batch, episode, epoch)

Performs a validation of the model using the provided batch.

Additionally logs results (to files, TensorBoard) and handles visualization.

Parameters:	valid_batch (`DataDict`) – data batch generated by the problem and used as input to the model. episode (int) – current training episode index. epoch (int, optional) – current epoch index.
Returns:	Validation loss.

validate_on_set(episode, epoch=None)

Performs a validation of the model on the whole validation set, using the validation DataLoader.

Iterates over the entire validation set (through the DataLoader), aggregates the collected statistics and logs them to the console, CSV files and TensorBoard (if set).

If visualization is activated, this function will select a random batch to visualize.

Parameters:	episode (int) – current training episode index. epoch (int, optional) – current epoch index.
Returns:	Average loss over the validation set.

OfflineTrainer

class miprometheus.workers.OfflineTrainer(name='OfflineTrainer')

Implementation for the epoch-based OfflineTrainer.

Note

The default ``OfflineTrainer`` is based on epochs. An epoch is defined as passing through all samples of a finite-size dataset. The ``OfflineTrainer`` allows looping over all samples from the training set many times, i.e. in many epochs. When an epoch finishes, it performs a similar step for the validation set and collects the statistics.

__init__(name='OfflineTrainer')

Only calls the Trainer constructor as the initialization phase is identical to the Trainer.

Parameters:	name (str) – Name of the worker (DEFAULT: “OfflineTrainer”).

setup_experiment()

Sets up an experiment for the OfflineTrainer:

Calls the base class setup_experiment to parse the command line arguments,

Sets up the terminal conditions (loss threshold, epoch limit and optional episode limit).

run_experiment()

Main function of the OfflineTrainer.

Iterates over the number of epochs of the training set.

Note

Because of the export of stats, weights and gradients to TensorBoard, we need to keep track of the current episode index from the start of the training, even though the Worker runs on epochs.

Warning

The test for terminal conditions (e.g. convergence) is done at the end of each epoch. The terminal conditions are as follows:

I. The loss is below the specified threshold (using the full validation loss),

II. TODO: Early stopping is set and the full validation loss did not change by delta for the indicated number of epochs,

III. The maximum number of epochs has been met,

IV. The maximum number of episodes has been met (optional).

In addition, the user can always stop the experiment by pressing ‘Stop experiment’ during visualization.

The function does the following for each epoch:

Executes the initialize_epoch() & finish_epoch() functions of the Problem class,

For each episode:

Resets the gradients,

Performs the forward pass of the model,

Logs statistics and exports to TensorBoard (if set),

Computes gradients and updates weights,

Activates visualization if set (vis. level 0),

Validates the model on a batch according to the validation frequency.

At the end of epoch:

Handles curriculum learning (if set),

Validates the model on the full validation set, logs the statistics and visualizes a random batch if set (vis. level 1 or 2),

Checks the above terminal conditions.

The last validation on the full set is additionally done at the end of training, with optional visualization of a random batch if set (vis. level 3).
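The epoch loop and its terminal conditions can be sketched in plain Python. This is a simplified illustration, not the OfflineTrainer code: the callables are hypothetical placeholders for the per-episode training step (zero gradients, forward pass, backward pass, optimizer step) and for the full-set validation. Curriculum learning, per-batch validation, visualization and the final validation are omitted.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def run_offline_training(
    batches: Iterable[Any],              # re-iterable: one epoch's worth of training batches
    train_step: Callable[[Any], Any],    # hypothetical: zero grads, forward, backward, step
    validate_full: Callable[[], float],  # hypothetical: loss over the full validation set
    loss_stop: float,
    max_epochs: int,
    max_episodes: Optional[int] = None,  # optional limit, counted across epochs
) -> Tuple[str, int, int]:
    episode = 0  # tracked from the start of training, not reset per epoch
    for epoch in range(max_epochs):
        for batch in batches:
            train_step(batch)
            episode += 1
        # Terminal conditions are only checked at the end of an epoch.
        if validate_full() < loss_stop:
            return 'converged', epoch, episode
        if max_episodes is not None and episode >= max_episodes:
            return 'max_episodes', epoch, episode
    return 'max_epochs', max_epochs - 1, episode


val_losses = iter([0.5, 0.3, 0.05])
result = run_offline_training(
    batches=[1, 2, 3],
    train_step=lambda batch: None,
    validate_full=lambda: next(val_losses),
    loss_stop=0.1,
    max_epochs=10,
)
```

Because the checks only run between epochs, the episode limit can be overshot by up to one epoch's worth of episodes.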

OnlineTrainer

class miprometheus.workers.OnlineTrainer(name='OnlineTrainer')

Implementation for the episode-based OnlineTrainer.

Note

The ``OfflineTrainer`` is based on epochs. While an epoch can be defined for all finite-size datasets, it makes less sense for problems which have a very large, almost infinite, dataset (like algorithmic tasks, which generate random data on-the-fly).
This is why the OnlineTrainer was implemented. Instead of looping over epochs, it iterates directly over episodes (an iteration on a single batch is called an episode).

__init__(name='OnlineTrainer')

Only calls the Trainer constructor as the initialization phase is identical to the Trainer.

Parameters:	name (str) – Name of the worker (DEFAULT: “OnlineTrainer”).

setup_experiment()

Sets up an experiment for the OnlineTrainer:

Calls the base class setup_experiment to parse the command line arguments,

Sets up the terminal conditions (loss threshold, episode limit and optional epoch limit).

run_experiment()

Main function of the OnlineTrainer, runs the experiment.

Iterates over the (cycled) DataLoader (one iteration = one episode).

Note

The test for terminal conditions (e.g. convergence) is done at the end of each episode. The terminal conditions are as follows:

I. The loss is below the specified threshold (using the partial validation loss),

II. TODO: Early stopping is set and the full validation loss did not change by delta for the indicated number of epochs,

III. The maximum number of episodes has been met,

IV. The maximum number of epochs has been met (optional).

Additionally, the experiment can be stopped by the user by pressing ‘Stop experiment’ during visualization.

The function does the following for each episode:

Handles curriculum learning if set,

Resets the gradients,

Performs the forward pass of the model,

Logs statistics and exports them to TensorBoard (if set),

Computes gradients and updates weights,

Activates visualization if set,

Validates the model on a batch according to the validation frequency.

Checks the above terminal conditions.
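A simplified sketch of the episode loop above, under stated assumptions. It is not the OnlineTrainer code: `train_step` and `validate_batch` are hypothetical placeholders, only conditions I and III are modelled, and the loss threshold is tested only on episodes where a partial (single-batch) validation runs.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def cycle(iterable: Iterable[Any]) -> Iterator[Any]:
    # Restart iteration whenever the underlying DataLoader is exhausted.
    while True:
        for item in iterable:
            yield item


def run_online_training(
    dataloader: Iterable[Any],
    train_step: Callable[[Any], Any],     # hypothetical: zero grads, forward, backward, step
    validate_batch: Callable[[], float],  # hypothetical: loss on one validation batch
    loss_stop: float,
    max_episodes: int,
    validation_frequency: int,
) -> Tuple[str, int]:
    episode = 0
    for batch in cycle(dataloader):  # one iteration = one episode
        train_step(batch)
        episode += 1
        # Terminal conditions are checked after every episode.
        if episode % validation_frequency == 0 and validate_batch() < loss_stop:
            return 'converged', episode
        if episode >= max_episodes:
            return 'max_episodes', episode


val_losses = iter([0.4, 0.2, 0.05])
result = run_online_training([1, 2, 3], lambda b: None, lambda: next(val_losses),
                             loss_stop=0.1, max_episodes=100, validation_frequency=5)
```

Here the three-element "DataLoader" is cycled five times before the third partial validation falls below the threshold.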

Tester

class miprometheus.workers.Tester(name='Tester')

Defines the basic Tester.

If defining another type of tester, it should subclass it.

__init__(name='Tester')

Calls the Worker constructor and adds some additional arguments to the parser.

Parameters:	name (str) – Name of the worker (DEFAULT: “Tester”).

setup_global_experiment()

Sets up the global test experiment for the Tester:

Checks that the model to use exists on file:
>>> if not os.path.isfile(flags.model): ...
Checks that the configuration file exists:
>>> if not os.path.isfile(config_file): ...
Creates the configuration:
>>> self.params.add_config_params_from_yaml(config)

The rest of the experiment setup is done in setup_individual_experiment() to support multiple tests.

setup_individual_experiment()

Sets up an individual test experiment in the case of multiple tests, or the main experiment in the case of a single test experiment.

Sets up the log directory path:

>>> os.makedirs(self.log_dir, exist_ok=False)

Adds a FileHandler to the logger (defined in the base Worker class):

>>> self.logger.addHandler(fh)

Sets the random seeds:

>>> self.set_random_seeds(self.params['testing'], 'testing')

Creates the problem and model:

>>> self.problem = ProblemFactory.build_problem(self.params['testing']['problem'])
>>> self.model = ModelFactory.build_model(self.params['model'], self.problem.default_values)

Creates the DataLoader:

>>> self.dataloader = DataLoader(dataset=self.problem, ...)

initialize_statistics_collection(): Initializes all statistics collectors and aggregators used by a given worker, creates output files, etc.

finalize_statistics_collection(): Finalizes statistics collection, closes all files, etc.

run_experiment()

Main function of the Tester: tests the loaded model over the test set.

Iterates over the DataLoader for a maximum number of episodes equal to the test set size.

The function does the following for each episode:

Performs the forward pass of the model,

Logs statistics & accumulates loss,

Activates visualization if set.

check_multi_tests()

Checks if multiple tests are indicated in the testing configuration section.

Note

If the user would like to run multiple tests, they can use the multi_tests key in the testing section to indicate the keys whose associated values will differ for each test config.

For example:

>>> # Problem parameters:
>>> testing:
>>>     problem:
>>>         name: SortOfCLEVR
>>>         batch_size: 64
>>>         data_folder: '~/data/sort-of-clevr/'
>>>         dataset_size: 10000
>>>         split: 'test'
>>>         img_size: 128
>>>         regenerate: False
>>>
>>>     multi_tests: {batch_size: [64, 128], img_size: [128, 256]}

Warning

The following constraints apply:

The indicated varying values are assumed to be leaves of the testing section
The number of indicated varying values per key is the same for all keys
The indicated order of the varying values will be respected, i.e.

>>>     multi_tests: {batch_size: [64, 128], img_size: [128, 256]}

and

>>>     multi_tests: {batch_size: [64, 128], img_size: [256, 128]}

will lead to different test configs.

At least one key has varying values (but this is implicit)

Returns:	True if the constraints above are respected, else False
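A sketch of the length constraints on a plain dict (the worker reads them from the param registry). It checks that `multi_tests` is present, that every entry is a list, and that all lists have the same non-zero length. The "values are leaves" constraint is not verified here.

```python
from typing import Any, Dict


def check_multi_tests(testing: Dict[str, Any]) -> bool:
    """Return True if `multi_tests` lists the same number of values for every key."""
    multi = testing.get('multi_tests')
    # At least one key with varying values must be present.
    if not isinstance(multi, dict) or not multi:
        return False
    # Every entry must be a list of candidate values.
    if not all(isinstance(values, list) for values in multi.values()):
        return False
    # Same number of values for all keys, and at least one value.
    lengths = {len(values) for values in multi.values()}
    return len(lengths) == 1 and next(iter(lengths)) > 0


testing = {
    'problem': {'name': 'SortOfCLEVR', 'batch_size': 64, 'img_size': 128},
    'multi_tests': {'batch_size': [64, 128], 'img_size': [128, 256]},
}
ok = check_multi_tests(testing)
bad = check_multi_tests({'multi_tests': {'batch_size': [64, 128], 'img_size': [128]}})
```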

update_config(test_index)

Updates self.params['testing'] using the list of values to change for the multiple tests.

Parameters:	test_index (int) – Current test experiment index.
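A sketch of the update step on a plain dict standing in for `self.params['testing']`. It assumes, as the constraints above state, that each key listed in `multi_tests` names a leaf somewhere in the testing section, and it overwrites that leaf with the value at `test_index`.

```python
from typing import Any, Dict


def update_config(testing: Dict[str, Any], test_index: int) -> None:
    """Overwrite each leaf named in `multi_tests` with its value for `test_index`."""
    multi = testing['multi_tests']

    def _update(node: Dict[str, Any]) -> None:
        for key, value in node.items():
            if key == 'multi_tests':
                continue  # leave the list of variants untouched
            if isinstance(value, dict):
                _update(value)  # descend into nested sections such as 'problem'
            elif key in multi:
                # Only an existing key's value is replaced, so iterating is safe.
                node[key] = multi[key][test_index]

    _update(testing)


testing = {
    'problem': {'name': 'SortOfCLEVR', 'batch_size': 64, 'img_size': 128},
    'multi_tests': {'batch_size': [64, 128], 'img_size': [128, 256]},
}
update_config(testing, 1)
```

After the call, the `problem` section holds the second test configuration (batch size 128, image size 256). Reversing the order of one list in `multi_tests` would pair the values differently, which is why the order of the varying values matters.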