miprometheus.workers¶
Worker¶
-
class
miprometheus.workers.
Worker
(name, add_default_parser_args=True)[source]¶ Base abstract class for the workers. All base workers should subclass it and override the relevant methods.
-
__init__
(name, add_default_parser_args=True)[source]¶ Base constructor for all workers:
Initializes the AppState singleton:
>>> self.app_state = AppState()
Initializes the Parameter Registry:
>>> self.params = ParamInterface()
Defines the logger:
>>> self.logger = logging.getLogger(name=self.name)
Creates parser and adds default worker command line arguments.
Parameters:
-
initialize_logger
()[source]¶ Initializes the logger, with a specific configuration:
>>> logger_config = {'version': 1,
>>>                  'disable_existing_loggers': False,
>>>                  'formatters': {
>>>                      'simple': {
>>>                          'format': '[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
>>>                          'datefmt': '%Y-%m-%d %H:%M:%S'}},
>>>                  'handlers': {
>>>                      'console': {
>>>                          'class': 'logging.StreamHandler',
>>>                          'level': 'INFO',
>>>                          'formatter': 'simple',
>>>                          'stream': 'ext://sys.stdout'}},
>>>                  'root': {'level': 'DEBUG',
>>>                           'handlers': ['console']}}
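As an illustration, a dictionary of this shape can be applied with the standard library's logging.config.dictConfig(). The sketch below assumes that this is how the configuration is consumed; the logger name 'ExampleWorker' is hypothetical.

```python
import logging
import logging.config

# Same configuration as documented for initialize_logger().
logger_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S'}},
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'level': 'INFO',
            'formatter': 'simple',
            'stream': 'ext://sys.stdout'}},
    'root': {'level': 'DEBUG',
             'handlers': ['console']}}

# Assumption: the dictionary is applied through the standard dictConfig() call.
logging.config.dictConfig(logger_config)

# 'ExampleWorker' is a hypothetical name used only for this sketch.
logger = logging.getLogger('ExampleWorker')
logger.info('Shown on stdout: the console handler accepts INFO and above.')
logger.debug('Not shown: the console handler filters out DEBUG records.')
```

The root logger accepts DEBUG records, but the console handler only emits INFO and above, so a handler added later with a lower level can still receive the DEBUG messages.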
-
setup_experiment
()[source]¶ Sets up a specific experiment.
Base method:
- Parses command line arguments.
- Sets the 3 default sections (training / validation / test) and sets their dataloaders params.
Note
Child classes should override this method, but still call the parent implementation to retain the basic functionality implemented here.
-
build_problem_sampler_loader
(params, section_name)[source]¶ Builds and returns the Problem class, alongside its DataLoader.
Also builds the sampler if required.
Parameters: - params (miprometheus.utils.ParamInterface) – ParamInterface object referring to one of the main sections (training/validation/testing).
- section_name – name of the section that will be used by logger for display.
Returns: Problem instance & DataLoader instance.
-
get_epoch_size
(problem, sampler, batch_size, drop_last)[source]¶ Computes the number of iterations (‘episodes’) needed to cover the entire dataset once, given the size of the dataset and the batch size.
Takes into account whether a sampler is used or not.
Parameters:
Note
If the last batch is incomplete, it is counted unless drop_last in DataLoader() is set to True.
Warning
This method is kept ‘just in case’; in most cases one can simply use len(dataloader).
Returns: Number of iterations to perform to go through the entire dataset once.
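The arithmetic can be sketched as follows. This is a minimal illustration rather than the library's implementation: the function name epoch_size is hypothetical, and it assumes the PyTorch DataLoader convention in which drop_last=True discards an incomplete final batch.

```python
def epoch_size(problem_size, batch_size, drop_last):
    """Number of batches needed to go through `problem_size` samples once.

    Hypothetical helper: `problem_size` stands for len(sampler) when a
    sampler is used, and len(problem) otherwise.
    """
    full_batches, remainder = divmod(problem_size, batch_size)
    # An incomplete last batch only counts when it is not dropped.
    if remainder == 0 or drop_last:
        return full_batches
    return full_batches + 1


print(epoch_size(10, 3, drop_last=False))  # 3 full batches + 1 incomplete one
print(epoch_size(10, 3, drop_last=True))   # incomplete batch is dropped
```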
-
export_experiment_configuration
(log_dir, filename, user_confirm)[source]¶ Dumps the configuration to a yaml file.
Parameters:
-
add_statistics
(stat_col)[source]¶ Adds the most elementary shared statistics to the StatisticsCollector: episode and loss.
Parameters: stat_col – StatisticsCollector.
-
add_aggregators
(stat_agg)[source]¶ Adds basic statistical aggregators to the StatisticsAggregator: episode, episodes_aggregated and loss derivatives.
Parameters: stat_agg – StatisticsAggregator.
-
aggregate_statistics
(stat_col, stat_agg)[source]¶ Aggregates the default statistics collected by the StatisticsCollector.
Note
Only computes the min, max, mean and std of the loss, as these are the basic statistical aggregators by default.
Given that the StatisticsAggregator uses the statistics collected by the StatisticsCollector, it should be ensured that these statistics are correctly collected (i.e. by using self.add_statistics() and collect_statistics()).
Parameters: - stat_col – StatisticsCollector
- stat_agg – StatisticsAggregator
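The four default aggregations of the loss can be illustrated with the standard library alone. This is only a sketch: the function name, the dictionary keys and the use of the sample standard deviation are assumptions, not the library's actual choices.

```python
import statistics


def aggregate_loss(losses):
    """Compute min, max, mean and std of a list of collected loss values.

    Hypothetical helper; the key names are illustrative only.
    """
    return {
        'loss_min': min(losses),
        'loss_max': max(losses),
        'loss_mean': statistics.mean(losses),
        # Assumption: sample standard deviation; undefined for one value.
        'loss_std': statistics.stdev(losses) if len(losses) > 1 else 0.0,
    }


aggregated = aggregate_loss([1.0, 2.0, 3.0])
print(aggregated)
```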
-
run_experiment
()[source]¶ Main function of the worker which executes a specific experiment.
Note
Abstract. Should be implemented in the subclasses.
-
add_file_handler_to_logger
(logfile)[source]¶ Adds a logging.FileHandler to the logger of the current Worker.
Specifies a logging.Formatter:
>>> logging.Formatter(fmt='[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
>>>                   datefmt='%Y-%m-%d %H:%M:%S')
Parameters: logfile – File used by the FileHandler.
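A minimal sketch of attaching such a handler with the standard logging module is given below. The helper name add_file_handler, the handler level and the logger name are assumptions made for the example; only the formatter arguments come from the description above.

```python
import logging
import os
import tempfile


def add_file_handler(logger, logfile):
    """Attach a FileHandler using the documented format (hypothetical helper)."""
    fh = logging.FileHandler(logfile)
    fh.setLevel(logging.DEBUG)  # assumption: the file receives all records
    fh.setFormatter(logging.Formatter(
        fmt='[%(asctime)s] - %(levelname)s - %(name)s >>> %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'))
    logger.addHandler(fh)
    return fh


# Write one record to a temporary log file.
logfile = os.path.join(tempfile.mkdtemp(), 'example.log')
logger = logging.getLogger('ExampleWorker')  # hypothetical worker name
logger.setLevel(logging.INFO)
fh = add_file_handler(logger, logfile)
logger.info('Written to file.')

# Detach and close the handler so the file is released.
logger.removeHandler(fh)
fh.close()
```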
-
recurrent_config_parse
(configs: str, configs_parsed: list)[source]¶ Parses names of configuration files in a recursive manner, i.e. by looking for default_config sections and trying to load and parse those files one by one.
Parameters:
Returns: list of parsed configuration files.
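The recursion can be sketched as follows. This is an illustration under several assumptions: file names are given as a comma-separated string, each loaded file may carry a default_config entry in the same format, and a plain dictionary (FAKE_FILES) stands in for reading yaml files from disk. The function name and the resulting order are hypothetical.

```python
def parse_config_names(configs, configs_parsed, load_config):
    """Recursively collect configuration file names (hypothetical sketch).

    configs        -- comma-separated file names (assumed format)
    configs_parsed -- names collected so far; extended in place
    load_config    -- callable returning the content of a file as a dict
    """
    for name in (c.strip() for c in configs.split(',')):
        if not name or name in configs_parsed:
            continue  # skip empty entries and files that were already seen
        configs_parsed.append(name)
        defaults = load_config(name).get('default_config')
        if defaults:
            parse_config_names(defaults, configs_parsed, load_config)
    return configs_parsed


# In-memory stand-in for yaml files on disk (illustrative content only).
FAKE_FILES = {
    'model.yaml': {'default_config': 'base.yaml, problem.yaml'},
    'problem.yaml': {'default_config': 'base.yaml'},
    'base.yaml': {},
}

order = parse_config_names('model.yaml', [], FAKE_FILES.__getitem__)
print(order)
```

Tracking the names that were already parsed keeps a file that is referenced from several places from being loaded twice, and prevents infinite recursion on circular references.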
-
check_and_set_cuda
(use_gpu)[source]¶ Enables computations on CUDA if GPU is available. Sets the default data types.
Parameters: use_gpu – Command line flag indicating whether to use GPU/CUDA or not.
-
predict_evaluate_collect
(model, problem, data_dict, stat_col, episode, epoch=None)[source]¶ Function that performs the following:
- passes samples through the model,
- computes loss using the problem,
- collects problem and model statistics.
Parameters: - model (models.model.Model or a subclass) – trainable model.
- problem (problems.problem.problem or a subclass) – problem generating samples.
- data_dict (DataDict) – contains the batch of samples to pass to the model.
- stat_col (StatisticsCollector) – statistics collector used for logging accuracy etc.
- episode (int) – current episode index.
- epoch (int, optional) – current epoch index.
Returns: - logits,
- loss
-
export_statistics
(stat_obj, tag='', export_to_log=True)[source]¶ Export the statistics/aggregations to logger, csv and TB.
Parameters:
-
aggregate_and_export_statistics
(problem, model, stat_col, stat_agg, episode, tag='', export_to_log=True)[source]¶ Aggregates the collected statistics. Exports the aggregations to logger, csv and TB. Empties statistics collector for the next episode.
Parameters: - model (models.model.Model or a subclass) – trainable model.
- problem (problems.problem.problem or a subclass) – problem generating samples.
- stat_col – StatisticsCollector object.
- stat_agg – StatisticsAggregator object.
- tag (str) – Additional tag that will be added to string exported to logger, optional (DEFAULT = ‘’).
- export_to_log (bool) – If True, exports statistics to logger (DEFAULT: True)
-
cycle
(iterable)[source]¶ Cycles an iterator to prevent its exhaustion. This function is used in the (online) trainer to reuse the same DataLoader for a number of episodes > len(dataset)/batch_size.
Parameters: iterable (iter) – iterable.
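One common way to write such a generator is shown below. It is a sketch and not necessarily the library's code; a plain list stands in for a DataLoader.

```python
from itertools import islice


def cycle(iterable):
    """Yield items from `iterable` forever by re-iterating over it."""
    while True:
        for item in iterable:
            yield item


# A list of 3 "batches" stands in for a DataLoader in this sketch.
batches = ['b0', 'b1', 'b2']

# Draw 7 episodes, i.e. more than one pass over the data.
episodes = list(islice(cycle(batches), 7))
print(episodes)
```

Unlike itertools.cycle(), which replays a cached copy of the first pass, this form calls iter() on the iterable again at every pass, so a loader that shuffles its data can reshuffle each time.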
-
set_random_seeds
(params, section_name)[source]¶ Sets the torch & NumPy random seeds from the ParamRegistry: if a seed was indicated, uses it; otherwise sets a random one.
Parameters: - params – Section in config/param registry that will be changed (only “training” or “testing” will be taken into account).
- section_name (str) – Name of the section (for logging purposes only).
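The "use the indicated seed, or draw a random one" logic can be sketched as below. To stay self-contained, the example uses the standard library random module as a stand-in for torch and NumPy (whose counterparts are torch.manual_seed() and numpy.random.seed()); the helper name, the key 'seed' and the plain dictionary standing in for the section are all assumptions.

```python
import random


def resolve_seed(section, key):
    """Return the seed stored under `key`, drawing a random one if missing.

    Hypothetical helper: `section` is a plain dict standing in for a
    section of the parameter registry. A drawn seed is written back so
    that it ends up in the stored configuration.
    """
    seed = section.get(key)
    if seed is None:
        seed = random.randrange(2 ** 32)
        section[key] = seed
    return seed


training = {'seed': 1234}  # seed indicated by the user
testing = {}               # no seed indicated: one is drawn

random.seed(resolve_seed(training, 'seed'))
first_draw = random.random()

# Seeding again with the same value reproduces the same draw.
random.seed(resolve_seed(training, 'seed'))
second_draw = random.random()

drawn_seed = resolve_seed(testing, 'seed')
```

Writing the drawn seed back into the section means the exported configuration is enough to reproduce the run.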
-
Trainer¶
-
class
miprometheus.workers.
Trainer
(name='Trainer')[source]¶ Base class for the trainers.
Iterates over epochs on the dataset.
All other types of trainers (e.g. OnlineTrainer & OfflineTrainer) should subclass it.
-
__init__
(name='Trainer')[source]¶ Base constructor for all trainers:
- Adds default trainer command line arguments
Parameters: name (str) – Name of the worker (DEFAULT: “Trainer”).
-
setup_experiment
()[source]¶ Sets up experiment of all trainers:
Calls base class setup_experiment to parse the command line arguments,
Loads the config file(s):
>>> configs_to_load = self.recurrent_config_parse(flags.config, [])
Set up the log directory path:
>>> os.makedirs(self.log_dir, exist_ok=False)
Add a FileHandler to the logger:
>>> self.add_file_handler_to_logger(self.log_file)
Set random seeds:
>>> self.set_random_seeds(self.params['training'], 'training')
Creates training problem and model:
>>> self.training_problem = ProblemFactory.build_problem(self.params['training']['problem'])
>>> self.model = ModelFactory.build_model(self.params['model'], self.training_problem.default_values)
Creates the DataLoader:
>>> self.training_dataloader = DataLoader(dataset=self.training_problem, ...)
Handles curriculum learning if indicated:
>>> if 'curriculum_learning' in self.params['training']:
>>>     ...
Handles the validation of the model:
- Creates validation problem & DataLoader
Set optimizer:
>>> self.optimizer = getattr(torch.optim, optimizer_name)
Handles TensorBoard writers & files:
>>> self.training_writer = SummaryWriter(self.log_dir + '/training')
-
add_statistics
(stat_col)[source]¶ Calls base method and adds epoch statistics to the StatisticsCollector.
Parameters: stat_col – StatisticsCollector.
-
add_aggregators
(stat_agg)[source]¶ Adds basic aggregators to the StatisticsAggregator and extends them with: epoch.
Parameters: stat_agg – StatisticsAggregator.
-
initialize_statistics_collection
()[source]¶ - Initializes all StatisticsCollectors and StatisticsAggregators used by a given worker:
  - For training statistics (adds the statistics of the model & problem),
  - For validation statistics (adds the statistics of the model & problem).
- Creates the output files (csv).
-
finalize_statistics_collection
()[source]¶ Finalizes the statistics collection by closing the csv files.
-
validate_on_batch
(valid_batch, episode, epoch)[source]¶ Performs a validation of the model using the provided batch.
Additionally logs results (to files, TensorBoard) and handles visualization.
Parameters:
Returns: Validation loss.
-
validate_on_set
(episode, epoch=None)[source]¶ Performs a validation of the model on the whole validation set, using the validation DataLoader.
Iterates over the entire validation set (through the DataLoader), aggregates the collected statistics and logs them to the console, csv and TensorBoard (if set).
If visualization is activated, this function will select a random batch to visualize.
Parameters:
Returns: Average loss over the validation set.
-
OfflineTrainer¶
-
class
miprometheus.workers.
OfflineTrainer
(name='OfflineTrainer')[source]¶ Implementation for the epoch-based OfflineTrainer.
Note
The default OfflineTrainer is based on epochs. An epoch is defined as one pass through all samples of a finite-size dataset. The OfflineTrainer allows looping over all samples of the training set many times, i.e. over many epochs. When an epoch finishes, it performs a similar step for the validation set and collects the statistics.
-
__init__
(name='OfflineTrainer')[source]¶ - Only calls the Trainer constructor, as the initialization phase is identical to that of the Trainer.
Parameters: name (str) – Name of the worker (DEFAULT: “OfflineTrainer”).
-
setup_experiment
()[source]¶ Sets up an experiment for the OfflineTrainer:
- Calls base class setup_experiment to parse the command line arguments,
- Sets up the terminal conditions (loss threshold, episodes (optional) & epochs limits).
-
run_experiment
()[source]¶ Main function of the Trainer.
Iterates over the number of epochs of the training set.
Note
Because stats, weights and gradients are exported to TensorBoard, we need to keep track of the current episode index from the start of the training, even though this worker iterates over epochs.
Warning
The test for terminal conditions (e.g. convergence) is done at the end of each epoch. The terminal conditions are as follows:
- The loss is below the specified threshold (using the full validation loss),
- TODO: II. Early stopping is set and the full validation loss did not change by delta for the indicated number of epochs,
- The maximum number of epochs has been met,
- The maximum number of episodes has been met (optional).
In addition, the user can always stop the experiment by pressing ‘Stop experiment’ during visualization.
The function does the following for each epoch:
Executes the initialize_epoch() & finish_epoch() functions of the Problem class,
For each episode:
- Resets the gradients,
- Performs a forward pass of the model,
- Logs statistics and exports to TensorBoard (if set),
- Computes gradients and updates weights,
- Activates visualization if set (vis. level 0),
- Validates the model on a batch according to the validation frequency.
At the end of epoch:
- Handles curriculum learning (if set),
- Validates the model on the full validation set, logs the statistics and visualizes on a random batch if set (vis. level 1 or 2)
- Checks the above terminal conditions.
A final validation on the full set is additionally done at the end of training, with optional visualization of a random batch if set (vis. level 3).
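The control flow described above, with terminal conditions checked at the end of each epoch, can be sketched in plain Python. This is a simplified illustration: the function and argument names are hypothetical, the training and validation steps are passed in as callables, and curriculum learning, statistics export and visualization are left out.

```python
def run_offline(train_batches, train_step, validate_on_set,
                loss_stop, epoch_limit, episode_limit=None):
    """Epoch-based loop returning (reason, last epoch, episodes done)."""
    episode = 0  # counted from the start of training, across epochs
    for epoch in range(epoch_limit):
        for batch in train_batches:
            train_step(batch)  # reset gradients, forward, backward, update
            episode += 1

        # Terminal conditions are only tested at the end of an epoch.
        validation_loss = validate_on_set()
        if validation_loss < loss_stop:
            return 'loss threshold reached', epoch, episode
        if episode_limit is not None and episode >= episode_limit:
            return 'episode limit reached', epoch, episode
    return 'epoch limit reached', epoch_limit - 1, episode


# Stand-ins: 4 batches per epoch and a validation loss that decreases.
fake_losses = iter([0.9, 0.5, 0.05])
result = run_offline(
    train_batches=[0, 1, 2, 3],
    train_step=lambda batch: None,
    validate_on_set=lambda: next(fake_losses),
    loss_stop=0.1,
    epoch_limit=10)
print(result)
```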
-
OnlineTrainer¶
-
class
miprometheus.workers.
OnlineTrainer
(name='OnlineTrainer')[source]¶ Implementation for the episode-based OnlineTrainer.
Note
The OfflineTrainer is based on epochs. While an epoch can be defined for all finite-size datasets, it makes less sense for problems which have a very large, almost infinite, dataset (like algorithmic tasks, which generate random data on-the-fly). This is why the OnlineTrainer was implemented. Instead of looping over epochs, it iterates directly over episodes (we call an iteration on a single batch an episode).
-
__init__
(name='OnlineTrainer')[source]¶ - Only calls the Trainer constructor, as the initialization phase is identical to that of the Trainer.
Parameters: name (str) – Name of the worker (DEFAULT: “OnlineTrainer”).
-
setup_experiment
()[source]¶ Sets up experiment for episode trainer:
- Calls base class setup_experiment to parse the command line arguments,
- Sets up the terminal conditions (loss threshold, episodes & epochs (optional) limits).
-
run_experiment
()[source]¶ Main function of the OnlineTrainer, runs the experiment.
Iterates over the (cycled) DataLoader (one iteration = one episode).
Note
The test for terminal conditions (e.g. convergence) is done at the end of each episode. The terminal conditions are as follows:
- The loss is below the specified threshold (using the partial validation loss),
- TODO: II. Early stopping is set and the full validation loss did not change by delta for the indicated number of epochs,
- The maximum number of episodes has been met,
- The maximum number of epochs has been met (OPTIONAL).
Additionally, the experiment can be stopped by the user by pressing ‘Stop experiment’ during visualization.
The function does the following for each episode:
- Handles curriculum learning if set,
- Resets the gradients,
- Performs a forward pass of the model,
- Logs statistics and exports to TensorBoard (if set),
- Computes gradients and updates weights,
- Activates visualization if set,
- Validates the model on a batch according to the validation frequency,
- Checks the above terminal conditions.
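The episode-based loop can be sketched in the same spirit. Again this is only an illustration: the names are hypothetical, a list stands in for the DataLoader, and curriculum learning, statistics export and visualization are omitted.

```python
def cycle(iterable):
    """Re-iterate over `iterable` forever, so the loader never runs out."""
    while True:
        for item in iterable:
            yield item


def run_online(dataloader, train_step, validate_on_batch,
               loss_stop, episode_limit, validation_interval):
    """Episode-based loop returning (reason, last episode index)."""
    for episode, batch in enumerate(cycle(dataloader)):
        train_step(batch)  # reset gradients, forward, backward, update

        # Partial validation, according to the validation frequency.
        if (episode + 1) % validation_interval == 0:
            if validate_on_batch() < loss_stop:
                return 'loss threshold reached', episode

        # Terminal conditions are tested at the end of every episode.
        if episode + 1 >= episode_limit:
            return 'episode limit reached', episode


seen = []  # records which batch was used in each episode
result = run_online(
    dataloader=['a', 'b'],
    train_step=seen.append,
    validate_on_batch=lambda: 1.0,  # stand-in: the loss never converges
    loss_stop=0.1,
    episode_limit=5,
    validation_interval=2)
print(result, seen)
```

With two batches and an episode limit of five, the loader is cycled through more than twice, which is the situation the cycled DataLoader is meant to handle.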
-
Tester¶
-
class
miprometheus.workers.
Tester
(name='Tester')[source]¶ Defines the basic Tester.
Any other type of tester should subclass it.
-
__init__
(name='Tester')[source]¶ - Calls the Worker constructor, adds some additional params to parser.
Parameters: name (str) – Name of the worker (DEFAULT: “Tester”).
-
setup_global_experiment
()[source]¶ Sets up the global test experiment for the Tester:
Checks that the model to use exists on file:
>>> if not os.path.isfile(flags.model)
Checks that the configuration file exists:
>>> if not os.path.isfile(config_file)
Create the configuration:
>>> self.params.add_config_params_from_yaml(config)
The rest of the experiment setup is done in setup_individual_experiment() to allow for multiple tests support.
-
setup_individual_experiment
()[source]¶ Sets up an individual test experiment in the case of multiple tests, or the main experiment in the case of a single test experiment.
Set up the log directory path:
>>> os.makedirs(self.log_dir, exist_ok=False)
Add a FileHandler to the logger (defined in BaseWorker):
>>> self.logger.addHandler(fh)
Set random seeds:
>>> self.set_random_seeds(self.params['testing'], 'testing')
Creates problem and model:
>>> self.problem = ProblemFactory.build_problem(self.params['training']['problem'])
>>> self.model = ModelFactory.build_model(self.params['model'], self.dataset.default_values)
Creates the DataLoader:
>>> self.dataloader = DataLoader(dataset=self.problem, ...)
-
initialize_statistics_collection
()[source]¶ Function initializes all statistics collectors and aggregators used by a given worker, creates output files etc.
-
run_experiment
()[source]¶ Main function of the Tester: tests the loaded model over the test set.
Iterates over the DataLoader for a maximum number of episodes equal to the test set size.
The function does the following for each episode:
- Performs a forward pass of the model,
- Logs statistics & accumulates loss,
- Activates visualization if set.
-
check_multi_tests
()[source]¶ Checks if multiple tests are indicated in the testing configuration section.
Note
If the user would like to run multiple tests, they can use the multi_tests key in the testing section to indicate the keys whose associated values will be different for each test config.
E.g.
>>> # Problem parameters:
>>> testing:
>>>     problem:
>>>         name: SortOfCLEVR
>>>         batch_size: 64
>>>         data_folder: '~/data/sort-of-clevr/'
>>>         dataset_size: 10000
>>>         split: 'test'
>>>         img_size: 128
>>>         regenerate: False
>>>
>>>     multi_tests: {batch_size: [64, 128], img_size: [128, 256]}
Warning
The following constraints apply:
- The indicated varying values are assumed to be leaves of the testing section
- The number of indicated varying values per key is the same for all keys
- The indicated order of the varying values will be respected, i.e.
>>> multi_tests: {batch_size: [64, 128], img_size: [128, 256]}
and
>>> multi_tests: {batch_size: [64, 128], img_size: [256, 128]}
will lead to different test configs.
- At least one key has varying values (but this is implicit)
Returns: True if the constraints above are respected, else False
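The constraints can be illustrated with a small sketch. Both helpers below are hypothetical, and the expansion assumes that the varying values are paired by position (first value of every key for the first test, and so on), as the ordering constraint suggests. Plain dictionaries stand in for the testing section.

```python
import copy


def multi_tests_valid(multi_tests):
    """Check that every key lists the same, non-zero number of values."""
    if not multi_tests:
        return False
    lengths = {len(values) for values in multi_tests.values()}
    return len(lengths) == 1 and min(lengths) > 0


def expand_multi_tests(problem, multi_tests):
    """Build one problem config per test, pairing values by position."""
    number_of_tests = len(next(iter(multi_tests.values())))
    configs = []
    for index in range(number_of_tests):
        config = copy.deepcopy(problem)
        for key, values in multi_tests.items():
            config[key] = values[index]
        configs.append(config)
    return configs


problem = {'name': 'SortOfCLEVR', 'batch_size': 64, 'img_size': 128}
multi_tests = {'batch_size': [64, 128], 'img_size': [128, 256]}

if multi_tests_valid(multi_tests):
    configs = expand_multi_tests(problem, multi_tests)
    for config in configs:
        print(config['batch_size'], config['img_size'])
```

Under this pairing assumption, swapping the order of the img_size values would pair 64 with 256 and 128 with 128, which is why the two orderings lead to different test configs.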
-