Concepts#

Estimator and Predictor#

GluonTS uses two main abstractions called Estimator and Predictor.

Each Predictor implements .predict(..) which, given some input time series, will return forecasts:

forecasts = predictor.predict(data)

In contrast, an Estimator is trained to produce a Predictor which then is used to make the actual predictions:

predictor = estimator.train(train_data)
forecasts = predictor.predict(data)

The reason to split Estimator and Predictor into two classes is that many models require a dedicated training step to generate a global model. This global model is only trained once, but is used to make predictions for all time series.

This is in contrast to local models, which are fitted on individual time series and therefore try to capture the characteristics of each time series but not the dataset in its entirety.

Training a global model can take a lot of time: up to hours, but sometimes even days. Thus, it is not feasible to train the model as part of the prediction request and it happens as a separate “offline” step. In contrast, fitting a local model is usually much faster and is done “online” as part of the prediction.

In GluonTS, local models are directly available as predictors, whilst global models are offered as estimators, which need to be trained first:

# global DeepAR model
estimator = DeepAREstimator(prediction_length=24, freq="H")
predictor = estimator.train(train_data)

# local Prophet model
predictor = ProphetPredictor(prediction_length=24)

Dataset#

In GluonTS, a Dataset is a collection of time series objects. Each of these objects has columns (or fields) which represent attributes of the time series.

Most models use the target column to indicate the time series that we want to predict in the future:

{"target": [1, 2, 3, 4, 5, 6]}

Note that the target column is not imposed by GluonTS onto models, but it is used by most models by convention.

API#

To be more precise, a Dataset is defined like this:

DataEntry = dict[str, Any]

class Dataset(Protocol):
    def __iter__(self) -> Iterator[DataEntry]:
        ...

    def __len__(self) -> int:
        ...

In other words, anything that can emit dictionaries can act as a Dataset.