Tuning models with Optuna#

In this notebook we will see how to tune the hyperparameters of a GluonTS model using Optuna. For this example, we are going to tune a PyTorch-based DeepAREstimator.

Note: to keep the running time of this example short, we use a small-scale dataset and tune only two hyperparameters over a very small number of tuning rounds (“trials”). In real applications, especially on larger datasets, you will likely need to enlarge both the search space and the number of trials.

Data loading and processing#

import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Provided datasets#

from gluonts.dataset.repository.datasets import get_dataset, dataset_recipes
from gluonts.dataset.util import to_pandas
print(f"Available datasets: {list(dataset_recipes.keys())}")
dataset = get_dataset("m4_hourly")

Extract and split training and test data sets#

In general, the datasets provided by GluonTS are objects that consist of three things:

  • dataset.train is an iterable collection of data entries used for training. Each entry corresponds to one time series

  • dataset.test is an iterable collection of data entries used for inference. The test dataset is an extended version of the train dataset that contains a window at the end of each time series that was not seen during training. This window has length equal to the recommended prediction length.

  • dataset.metadata contains metadata of the dataset such as the frequency of the time series, a recommended prediction horizon, associated features, etc.
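For a quick look at the data entries themselves, we can inspect the fields of the first training entry. This is a minimal sketch; the exact set of fields can vary by dataset:

entry = next(iter(dataset.train))
# Typical fields include "target", "start", and "item_id"
print(entry.keys())
print(f"Length of the first series: {len(entry['target'])}")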

We can check the details of dataset.metadata.

print(f"Recommended prediction horizon: {dataset.metadata.prediction_length}")
print(f"Frequency of the time series: {dataset.metadata.freq}")

This is what the data looks like (first training series, first two weeks of data):

to_pandas(next(iter(dataset.train)))[:14 * 24].plot()
plt.grid(which="both")
plt.legend(["train series"], loc="upper left")
plt.show()

Tuning parameters of DeepAR estimator#

import optuna
import torch
from gluonts.torch.model.deepar import DeepAREstimator
from gluonts.evaluation import Evaluator

We will now tune the DeepAR estimator on our training data using Optuna. We choose two hyperparameters, num_layers and hidden_size, to optimize.

First, we define a DeepARTuningObjective class to be used in the Optuna tuning process. The class is configured with the dataset, the prediction length, the data frequency, and the metric used to evaluate the model. In the get_params method, we define which hyperparameters to tune and their search ranges. In the split_entry method, we split each time series of the dataset into two parts:

  • entry_past: the training part

  • entry_future: the label part used for validation

In the __call__ method, we define how the DeepAREstimator is trained and validated.

class DeepARTuningObjective:
    def __init__(self, dataset, prediction_length, freq, metric_type="mean_wQuantileLoss"):
        self.dataset = dataset
        self.prediction_length = prediction_length
        self.freq = freq
        self.metric_type = metric_type

        # Hold out the last `prediction_length` points of each series for validation
        entry_split = [self.split_entry(entry) for entry in self.dataset]
        self.entry_pasts = [entry[0] for entry in entry_split]
        self.entry_futures = [entry[1] for entry in entry_split]
    
    def get_params(self, trial) -> dict:
        return {
            "num_layers": trial.suggest_int("num_layers", 1, 5),
            "hidden_size": trial.suggest_int("hidden_size", 10, 50),
        }

    def split_entry(self, entry):
        # Truncate the target, keeping everything except the validation window
        entry_past = {}
        for key, value in entry.items():
            if key == "target":
                entry_past[key] = value[: -self.prediction_length]
            else:
                entry_past[key] = value

        # Index the full target by period so the evaluator can align
        # forecasts with the ground-truth validation window
        df = pd.DataFrame(
            entry["target"],
            columns=[entry["item_id"]],
            index=pd.period_range(
                start=entry["start"],
                periods=len(entry["target"]),
                freq=self.freq,
            ),
        )

        return entry_past, df[-self.prediction_length:]
     
    def __call__(self, trial):
        params = self.get_params(trial)
        estimator = DeepAREstimator(
            num_layers=params["num_layers"],
            hidden_size=params["hidden_size"],
            prediction_length=self.prediction_length,
            freq=self.freq,
            trainer_kwargs={
                "enable_progress_bar": False,
                "enable_model_summary": False,
                "max_epochs": 10,
            },
        )

        # Train on the truncated series, then forecast the held-out windows
        predictor = estimator.train(self.entry_pasts, cache_data=True)
        forecast_it = predictor.predict(self.entry_pasts)
        forecasts = list(forecast_it)

        # Score the forecasts against the validation windows
        evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
        agg_metrics, item_metrics = evaluator(
            self.entry_futures, forecasts, num_series=len(self.dataset)
        )
        return agg_metrics[self.metric_type]
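As noted at the top of this notebook, a real application would usually search over more than two hyperparameters. As a hypothetical sketch (the ranges are illustrative, and lr and dropout_rate are constructor arguments of the PyTorch DeepAREstimator), get_params could be extended like this:

    def get_params(self, trial) -> dict:
        # Illustrative wider search space; the ranges are assumptions, not tuned values
        return {
            "num_layers": trial.suggest_int("num_layers", 1, 5),
            "hidden_size": trial.suggest_int("hidden_size", 10, 100),
            "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
            "dropout_rate": trial.suggest_float("dropout_rate", 0.0, 0.5),
        }

The extra keys would then be passed through to the DeepAREstimator constructor in __call__.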

We can now invoke the Optuna tuning process.

import time
start_time = time.time()
study = optuna.create_study(direction="minimize")
study.optimize(
    DeepARTuningObjective(dataset.train, dataset.metadata.prediction_length, dataset.metadata.freq),
    n_trials=5
)

print("Number of finished trials: {}".format(len(study.trials)))

print("Best trial:")
trial = study.best_trial

print("  Value: {}".format(trial.value))

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))
print("Elapsed time: {:.2f} seconds".format(time.time() - start_time))

Re-training the model#

After obtaining the best hyperparameters from Optuna, we can pass them to the DeepAR estimator and re-train the model on the full training dataset.

estimator = DeepAREstimator(
    num_layers=trial.params["num_layers"],
    hidden_size=trial.params["hidden_size"],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer_kwargs={
        "enable_progress_bar": False,
        "enable_model_summary": False,
        "max_epochs": 10,
    }
)

After specifying our estimator with all the necessary hyperparameters, we can train it on our training dataset dataset.train by invoking the train method of the estimator. The training algorithm returns a fitted model (or a Predictor in GluonTS parlance) that can be used to obtain forecasts.

predictor = estimator.train(dataset.train, cache_data=True)

Visualize and evaluate forecasts#

With a predictor in hand, we can now predict the last window of the test dataset and evaluate our model’s performance.

GluonTS comes with the make_evaluation_predictions function that automates the process of prediction and model evaluation. Roughly, this function performs the following steps:

  • Removes the final window of length prediction_length from each time series in the test dataset

  • The predictor uses the remaining data to predict (in the form of sample paths) the “future” window that was just removed

  • The forecasts are returned, together with ground-truth values for the same time range (as Python generator objects)

from gluonts.evaluation import make_evaluation_predictions
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,
    predictor=predictor,
)

First, we convert these generators to lists to ease the subsequent computations.

forecasts = list(forecast_it)
tss = list(ts_it)
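Each element of forecasts is a GluonTS forecast object (for DeepAR, a SampleForecast) holding the sample paths and the forecast window. As a quick sanity check, we can inspect the first one:

# Inspect the first forecast: sample paths have shape (num_samples, prediction_length)
forecast_entry = forecasts[0]
print(f"Number of sample paths: {forecast_entry.num_samples}")
print(f"Dimension of samples: {forecast_entry.samples.shape}")
print(f"Start date of the forecast window: {forecast_entry.start_date}")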

Forecast objects have a plot method that can summarize the forecast paths as the mean, prediction intervals, etc. The prediction intervals are shaded in different colors as a “fan chart”.

def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = 150
    prediction_intervals = (50.0, 90.0)
    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='g')
    plt.grid(which="both")
    plt.legend(legend, loc="upper left")
    plt.show()
plot_prob_forecasts(tss[0], forecasts[0])

We can also evaluate the quality of our forecasts numerically. In GluonTS, the Evaluator class can compute aggregate performance metrics, as well as metrics per time series (which can be useful for analyzing performance across heterogeneous time series).

from gluonts.evaluation import Evaluator
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(tss, forecasts)

Aggregate metrics aggregate both across time-steps and across time series.

print(json.dumps(agg_metrics, indent=4))

Individual metrics are aggregated only across time-steps.

item_metrics.head()
item_metrics.plot(x='sMAPE', y='MASE', kind='scatter')
plt.grid(which="both")
plt.show()