gluonts.model.transformer package

class gluonts.model.transformer.TransformerEstimator(freq: str, prediction_length: int, context_length: Optional[int] = None, trainer: gluonts.mx.trainer._base.Trainer = Trainer(batch_size=None, callbacks=None, clip_gradient=10.0, ctx=None, epochs=100, hybridize=True, init="xavier", learning_rate=0.001, learning_rate_decay_factor=0.5, minimum_learning_rate=5e-05, num_batches_per_epoch=50, patience=10, weight_decay=1e-08), dropout_rate: float = 0.1, cardinality: Optional[List[int]] = None, embedding_dimension: int = 20, distr_output: gluonts.mx.distribution.distribution_output.DistributionOutput = StudentTOutput(), model_dim: int = 32, inner_ff_dim_scale: int = 4, pre_seq: str = 'dn', post_seq: str = 'drn', act_type: str = 'softrelu', num_heads: int = 8, scaling: bool = True, lags_seq: Optional[List[int]] = None, time_features: Optional[List[gluonts.time_feature._base.TimeFeature]] = None, use_feat_dynamic_real: bool = False, use_feat_static_cat: bool = False, num_parallel_samples: int = 100, train_sampler: Optional[gluonts.transform.sampler.InstanceSampler] = None, validation_sampler: Optional[gluonts.transform.sampler.InstanceSampler] = None, batch_size: int = 32)[source]


Construct a Transformer estimator.

This implements a Transformer model, close to the one described in [Vaswani2017].


[Vaswani2017] Vaswani, Ashish, et al. “Attention is all you need.” Advances in Neural Information Processing Systems. 2017.

Parameters

  • freq – Frequency of the data to train on and predict

  • prediction_length – Length of the prediction horizon

  • context_length – Number of steps to unroll the RNN for before computing predictions (default: None, in which case context_length = prediction_length)

  • trainer – Trainer object to be used (default: Trainer())

  • dropout_rate – Dropout regularization parameter (default: 0.1)

  • cardinality – Number of values of each categorical feature (default: [1])

  • embedding_dimension – Dimension of the embeddings for categorical features (the same dimension is used for all embeddings, default: 20)

  • distr_output – Distribution to use to evaluate observations and sample predictions (default: StudentTOutput())

  • model_dim – Dimension of the transformer network, i.e., embedding dimension of the input (default: 32)

  • inner_ff_dim_scale – Dimension scale of the inner hidden layer of the transformer’s feedforward network (default: 4)

  • pre_seq – Sequence that defines the operations of the processing block applied before the main transformer network. Available operations: ‘d’ for dropout, ‘r’ for residual connections and ‘n’ for normalization (default: ‘dn’)

  • post_seq – Sequence that defines the operations of the processing block applied in and after the main transformer network. Available operations: ‘d’ for dropout, ‘r’ for residual connections and ‘n’ for normalization (default: ‘drn’)

  • act_type – Activation type of the transformer network (default: ‘softrelu’)

  • num_heads – Number of heads in the multi-head attention (default: 8)

  • scaling – Whether to automatically scale the target values (default: True)

  • lags_seq – Indices of the lagged target values to use as inputs of the RNN (default: None, in which case these are automatically determined based on freq)

  • time_features – Time features to use as inputs of the RNN (default: None, in which case these are automatically determined based on freq)

  • use_feat_dynamic_real – Whether to use the feat_dynamic_real field from the data (default: False)

  • use_feat_static_cat – Whether to use the feat_static_cat field from the data (default: False)

  • num_parallel_samples – Number of evaluation samples per time series to increase parallelism during inference. This is a model optimization that does not affect the accuracy (default: 100)

  • train_sampler – Controls the sampling of windows during training.

  • validation_sampler – Controls the sampling of windows during validation.

  • batch_size – The size of the batches to be used during training and prediction.
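The attention-related hyperparameters above are tied together by simple dimension arithmetic: model_dim must be divisible by num_heads (each head attends in a model_dim / num_heads subspace), and the hidden width of the feedforward sublayer is model_dim * inner_ff_dim_scale. A minimal pure-Python sketch of this arithmetic (illustrative only; transformer_dims is a hypothetical helper, not part of gluonts):

```python
# Sketch of the dimension arithmetic implied by the defaults above
# (model_dim=32, num_heads=8, inner_ff_dim_scale=4). The function name
# `transformer_dims` is hypothetical, not a gluonts API.

def transformer_dims(model_dim: int = 32, num_heads: int = 8,
                     inner_ff_dim_scale: int = 4) -> dict:
    # Multi-head attention splits model_dim evenly across the heads.
    assert model_dim % num_heads == 0, "model_dim must be divisible by num_heads"
    return {
        "head_dim": model_dim // num_heads,              # per-head width: 4
        "inner_ff_dim": model_dim * inner_ff_dim_scale,  # FF hidden width: 128
    }

print(transformer_dims())  # {'head_dim': 4, 'inner_ff_dim': 128}
```

With the defaults this gives 8 heads of width 4 and a feedforward hidden layer of width 128; increasing model_dim without adjusting num_heads changes only the per-head width.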

create_predictor(transformation: gluonts.transform._base.Transformation, trained_network: mxnet.gluon.block.HybridBlock) → gluonts.model.predictor.Predictor[source]

Create and return a predictor object.

Returns

    A predictor wrapping a HybridBlock used for inference.

Return type

    Predictor

create_training_data_loader(data: Iterable[Dict[str, Any]], **kwargs) → gluonts.dataset.loader.DataLoader[source]
create_training_network() → gluonts.model.transformer._network.TransformerTrainingNetwork[source]

Create and return the network used for training (i.e., computing the loss).

Returns

    The network that computes the loss given input data.

Return type

    TransformerTrainingNetwork

create_transformation() → gluonts.transform._base.Transformation[source]

Create and return the transformation needed for training and inference.

Returns

    The transformation that will be applied entry-wise to datasets, at training and inference time.

Return type

    Transformation

create_validation_data_loader(data: Iterable[Dict[str, Any]], **kwargs) → gluonts.dataset.loader.DataLoader[source]
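The entry-wise transformation returned by create_transformation assembles, among other things, lagged target values (see lags_seq above) and time features for each instance. A hedged pure-Python sketch of lag-feature extraction, assuming each lag index simply points back that many steps from the current position (lagged_values is a hypothetical helper, not a gluonts function):

```python
from typing import List

def lagged_values(target: List[float], lags_seq: List[int], t: int) -> List[float]:
    # For each lag, read the target value `lag` steps before position t.
    # Real gluonts transformations operate on padded arrays; this sketch
    # assumes t - max(lags_seq) >= 0.
    return [target[t - lag] for lag in lags_seq]

target = [10, 11, 12, 13, 14, 15]
print(lagged_values(target, [1, 2, 3], t=5))  # [14, 13, 12]
```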
freq = None
lead_time = None
prediction_length = None
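The pre_seq/post_seq strings described in the constructor parameters act as a tiny op language over the processing blocks. A hedged sketch of how such a string could be interpreted, assuming ‘d’ (dropout) is a no-op at inference time and ‘n’ is layer normalization over the feature axis (apply_block is illustrative, not gluonts code):

```python
import math
from typing import List

def apply_block(seq: str, x: List[float], residual: List[float]) -> List[float]:
    """Interpret a processing-block string such as 'dn' or 'drn'."""
    for op in seq:
        if op == "d":
            pass  # dropout: identity at inference time
        elif op == "r":
            x = [a + b for a, b in zip(x, residual)]  # residual connection
        elif op == "n":
            # Layer normalization over the feature axis (sketch).
            mean = sum(x) / len(x)
            var = sum((v - mean) ** 2 for v in x) / len(x)
            std = math.sqrt(var + 1e-5)  # epsilon for numerical stability
            x = [(v - mean) / std for v in x]
        else:
            raise ValueError(f"unknown op {op!r}")
    return x

print(apply_block("r", [1.0, 2.0], [1.0, 0.0]))  # [2.0, 2.0]
print(apply_block("n", [0.0, 2.0], [0.0, 0.0]))  # approximately [-1.0, 1.0]
```

Under this reading, the default post_seq 'drn' applies dropout, then adds the residual connection, then normalizes, which matches the post-layer-norm arrangement of the original Transformer.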