gluonts.mx.trainer.model_iteration_averaging module

class gluonts.mx.trainer.model_iteration_averaging.Alpha_Suffix(epochs: int, alpha: float = 0.75, eta: float = 0)[source]

Bases: gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy

Implement Alpha Suffix model averaging. This method is based on the paper “Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization” (https://arxiv.org/pdf/1109.5647.pdf).

alpha_suffix: float = None
update_average_trigger(metric: Any = None, epoch: int = 0, **kwargs)[source]
Parameters
  • metric – The criterion used to trigger averaging; not used in Alpha Suffix.

  • epoch – The epoch to start averaging.
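
A minimal construction sketch follows. Handing the strategy to the gluonts Trainer is the usual entry point; the avg_strategy argument name is an assumption to verify against your gluonts version:

    from gluonts.mx.trainer import Trainer
    from gluonts.mx.trainer.model_iteration_averaging import Alpha_Suffix

    epochs = 100
    # Per the alpha-suffix scheme in the paper, alpha=0.75 averages the last
    # 75% of training, i.e. averaging triggers after the first 25% of epochs.
    avg_strategy = Alpha_Suffix(epochs=epochs, alpha=0.75)

    # Assumption: Trainer exposes an `avg_strategy` argument that accepts an
    # IterationAveragingStrategy; check the signature in your gluonts version.
    trainer = Trainer(epochs=epochs, avg_strategy=avg_strategy)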

class gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy(eta: float = 0)[source]

Bases: object

The model averaging is based on the paper “Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes” (http://proceedings.mlr.press/v28/shamir13.pdf). This class implements the polynomial-decay averaging proposed there, parameterized by eta. When eta = 0, it is equivalent to a simple average over all iterations with equal weights.
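
The polynomial-decay average is a simple recurrence; the sketch below illustrates it on NumPy arrays, mirroring the scheme in the paper rather than the exact internals of this class:

    import numpy as np

    def polynomial_decay_average(iterates, eta=0.0):
        # At step t (1-based) the running average is updated as
        #   avg <- (1 - c_t) * avg + c_t * w_t,  with c_t = (eta + 1) / (t + eta).
        # eta = 0 gives c_t = 1/t, i.e. the plain mean of all iterates so far;
        # larger eta puts more weight on recent iterates.
        avg = None
        for t, w in enumerate(iterates, start=1):
            c = (eta + 1.0) / (t + eta)
            avg = w.copy() if avg is None else (1.0 - c) * avg + c * w
            yield avg

    ws = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
    *_, final = polynomial_decay_average(ws, eta=0.0)
    assert np.allclose(final, np.mean(ws))  # eta = 0 recovers the simple mean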

apply(model: mxnet.gluon.block.HybridBlock) → Optional[Dict][source]
Parameters

model – The model of the current iteration.

Returns

The averaged model, or None if averaging has not started.

Return type

Optional[Dict]
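
A toy end-to-end sketch of the trigger/apply interaction. The training step is elided, the 1-based epoch numbering is an assumption, and in the gluonts Trainer apply would normally run once per batch rather than per epoch:

    import mxnet as mx
    from mxnet.gluon import nn
    from gluonts.mx.trainer.model_iteration_averaging import Alpha_Suffix

    net = nn.Dense(1)
    net.initialize()
    net(mx.nd.ones((1, 4)))  # forward pass so the parameters get created

    strategy = Alpha_Suffix(epochs=4, alpha=0.5)  # average the last half

    for epoch in range(1, 5):
        # ... training iterations updating `net` would go here ...
        strategy.update_average_trigger(epoch=epoch)
        averaged = strategy.apply(net)  # None until the trigger has fired
        print(epoch, "averaging" if averaged is not None else "not started")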

average_counter: int = None
averaged_model: Optional[Dict[str, mx.nd.NDArray]] = None
averaging_started: bool = None
cached_model: Optional[Dict[str, mx.nd.NDArray]] = None
load_averaged_model(model: mxnet.gluon.block.HybridBlock)[source]

To validate or evaluate the averaged model partway through training, first call load_averaged_model to overwrite the current model with the averaged one, run the evaluation, and then call load_cached_model to restore the current model.

Parameters

model – The model that the averaged model is loaded to.
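
Expressed as code, that sequence reads as follows; strategy, net, and validate are placeholders for the strategy instance, the network being trained, and a hypothetical evaluation routine:

    strategy.load_averaged_model(net)  # overwrite `net` with the averaged weights
    val_metric = validate(net)         # hypothetical mid-training evaluation
    strategy.load_cached_model(net)    # restore the live weights and resume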

load_cached_model(model: mxnet.gluon.block.HybridBlock)[source]
Parameters

model – The model that the cached model is loaded to.

update_average(model: mxnet.gluon.block.HybridBlock)[source]
Parameters

model – The model to update the average.

update_average_trigger(metric: Any = None, epoch: int = 0, **kwargs)[source]
Parameters
  • metric – The criterion used to trigger averaging.

  • epoch – The epoch to start averaging.

class gluonts.mx.trainer.model_iteration_averaging.NTA(n: int = 5, maximize: bool = False, last_n_trigger: bool = False, eta: float = 0)[source]

Bases: gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy

Implement Non-monotonically Triggered AvSGD (NTA). This method is based on the paper “Regularizing and Optimizing LSTM Language Models” (https://openreview.net/pdf?id=SyyGPP0TZ); an implementation is available in the Salesforce GitHub repository (https://github.com/salesforce/awd-lstm-lm/blob/master/main.py). Note that it differs from the arXiv (and GluonNLP) version, which is referred to as NTA_V2 below.
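
The trigger condition can be paraphrased in a few lines. This sketch mirrors the criterion in the Salesforce reference implementation for a metric that is minimized (maximize=True flips the comparison); it is an illustration, not the exact gluonts code:

    def nta_trigger(val_logs, metric, n=5, maximize=False):
        # Non-monotonic criterion: start averaging once `metric` fails to
        # improve on the best value recorded more than `n` checks ago.
        if maximize:
            triggered = len(val_logs) > n and metric < max(val_logs[:-n])
        else:
            triggered = len(val_logs) > n and metric > min(val_logs[:-n])
        val_logs.append(metric)
        return triggered

    logs = []
    for loss in [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]:
        if nta_trigger(logs, loss, n=3):
            print("start averaging at validation loss", loss)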

update_average_trigger(metric: Any = None, epoch: int = 0, **kwargs)[source]
Parameters
  • metric – The criterion used to trigger averaging.

  • epoch – The epoch at which to start averaging; not used in NTA.

val_logs: List[Any] = None