gluonts.evaluation package

class gluonts.evaluation.Evaluator(quantiles: Iterable[Union[float, str]] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), seasonality: Optional[int] = None, alpha: float = 0.05, calculate_owa: bool = False, custom_eval_fn: Optional[Dict] = None, num_workers: Optional[int] = 2, chunk_size: int = 32)[source]

Bases: object

Evaluator class, to compute accuracy metrics by comparing observations to forecasts.

  • quantiles – list of strings of the form ‘p10’ or floats in [0, 1] with the quantile levels

  • seasonality – seasonality to use for seasonal_error. If nothing is passed, the default seasonality for the given series frequency, as returned by get_seasonality, is used

  • alpha – Parameter of the MSIS metric from the M4 competition that defines the confidence interval. For alpha=0.05 (the default), the 95% confidence interval is used in the metric. The M4 competition materials describe MSIS in more detail.

  • calculate_owa – Determines whether the OWA metric should also be calculated, which is computationally expensive to evaluate and thus slows down the evaluation process considerably. By default False.

  • custom_eval_fn – Option to include custom evaluation metrics. The expected input is a dictionary whose keys are the names of the custom metrics and whose values are lists of three elements. First, a callable that takes the target and the forecast as input and returns the evaluation metric. Second, a string specifying how the metric is aggregated across all time series, e.g. "mean" or "sum". Third, either "mean" or "median", specifying whether the mean or the median forecast is passed to the custom evaluation function. Example: {"RMSE": [rmse, "mean", "median"]}

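  • num_workers – The number of multiprocessing workers used to process the data in parallel. Default is multiprocessing.cpu_count(); the value 2 in the signature above is presumably that call evaluated on the machine that built this documentation. Setting it to 0 or None disables multiprocessing.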

  • chunk_size – Controls the approximate chunk size each worker handles at a time. Default is 32.
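The structure expected by custom_eval_fn can be sketched as follows. This is a minimal illustration, not the library's own code: the rmse helper is hypothetical, and it iterates over plain sequences so that the snippet runs without numpy or gluonts installed.

```python
import math


def rmse(target, forecast):
    """Hypothetical custom metric: root mean squared error of two equal-length sequences."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(target, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Key: metric name. Value: [callable, aggregation across series, forecast type].
custom_eval_fn = {"RMSE": [rmse, "mean", "median"]}

# Unpack one entry the way the parameter description lays it out.
metric_fn, aggregation, forecast_type = custom_eval_fn["RMSE"]
print(metric_fn([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), aggregation, forecast_type)
```

A dictionary of this shape would then be passed as the custom_eval_fn argument of Evaluator.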

static abs_error(target: numpy.ndarray, forecast: numpy.ndarray) → float[source]
static abs_target_mean(target) → float[source]
static abs_target_sum(target) → float[source]
static coverage(target: numpy.ndarray, forecast: numpy.ndarray) → float[source]
default_quantiles = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
static extract_past_data(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → numpy.ndarray[source]
  • time_series

  • forecast


time series without the forecast dates

Return type

numpy.ndarray

static extract_pred_target(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → numpy.ndarray[source]
  • time_series

  • forecast


time series restricted to the dates covered by the Forecast object

Return type

numpy.ndarray

get_aggregate_metrics(metric_per_ts: pandas.core.frame.DataFrame) → Tuple[Dict[str, float], pandas.core.frame.DataFrame][source]
get_metrics_per_ts(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → Dict[str, Union[float, str, None]][source]
static mape(target: numpy.ndarray, forecast: numpy.ndarray, exclude_zero_denominator=True) → float[source]
\[mape = mean(|Y - Y_hat| / |Y|)\]
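A minimal sketch of the MAPE formula above, assuming plain Python sequences in place of numpy arrays and a target without zeros. The handling behind exclude_zero_denominator is not reproduced here, and the function name mape_sketch is hypothetical.

```python
def mape_sketch(target, forecast):
    """mean(|Y - Y_hat| / |Y|) for equal-length sequences with non-zero targets."""
    ratios = [abs(y - y_hat) / abs(y) for y, y_hat in zip(target, forecast)]
    return sum(ratios) / len(ratios)


# Both points are off by 10% of the target, so the mean ratio is 0.1.
print(mape_sketch([100.0, 200.0], [110.0, 180.0]))
```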
static mase(target: numpy.ndarray, forecast: numpy.ndarray, seasonal_error: float, exclude_zero_denominator=True) → float[source]
\[mase = mean(|Y - Y_hat|) / seasonal_error\]
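The MASE formula above can be sketched in the same way. This assumes plain Python sequences and a non-zero seasonal_error supplied by the caller; mase_sketch is a hypothetical name, not the library's implementation.

```python
def mase_sketch(target, forecast, seasonal_error):
    """mean(|Y - Y_hat|) / seasonal_error, assuming seasonal_error is non-zero."""
    abs_errors = [abs(y - y_hat) for y, y_hat in zip(target, forecast)]
    return (sum(abs_errors) / len(abs_errors)) / seasonal_error


# Absolute errors are 1 and 2 (mean 1.5); dividing by 0.5 gives 3.0.
print(mase_sketch([3.0, 5.0], [2.0, 7.0], 0.5))
```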

static mse(target: numpy.ndarray, forecast: numpy.ndarray) → float[source]
static msis(target: numpy.ndarray, lower_quantile: numpy.ndarray, upper_quantile: numpy.ndarray, seasonal_error: float, alpha: float, exclude_zero_denominator=True) → float[source]

\[msis = mean(U - L + 2/alpha * (L-Y) * I[Y<L] + 2/alpha * (Y-U) * I[Y>U]) / seasonal_error\]
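To make the indicator terms in the MSIS formula concrete, here is a minimal sketch under the assumption of plain Python sequences and a non-zero seasonal_error. The name msis_sketch is hypothetical, and the exclude_zero_denominator handling is left out.

```python
def msis_sketch(target, lower_quantile, upper_quantile, seasonal_error, alpha):
    """Mean interval width plus penalties for targets outside [L, U], scaled by seasonal_error."""
    terms = []
    for y, lower, upper in zip(target, lower_quantile, upper_quantile):
        width = upper - lower
        # I[Y < L]: penalty when the target falls below the lower bound.
        below = (2.0 / alpha) * (lower - y) if y < lower else 0.0
        # I[Y > U]: penalty when the target exceeds the upper bound.
        above = (2.0 / alpha) * (y - upper) if y > upper else 0.0
        terms.append(width + below + above)
    return (sum(terms) / len(terms)) / seasonal_error


# First point lies inside the interval (term 6); second exceeds U by 2 (term 6 + 8 = 14).
print(msis_sketch([5.0, 12.0], [4.0, 4.0], [10.0, 10.0], seasonal_error=2.0, alpha=0.5))
```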

static owa(target: numpy.ndarray, forecast: numpy.ndarray, past_data: numpy.ndarray, seasonal_error: float, start_date: pandas._libs.tslibs.timestamps.Timestamp) → float[source]
\[owa = 0.5*(smape/smape_naive + mase/mase_naive)\]
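The OWA combination above is a plain average of two ratios. In the sketch below, the naive benchmark values are passed in as numbers, since how they are computed is not described here; owa_sketch is a hypothetical name.

```python
def owa_sketch(smape, smape_naive, mase, mase_naive):
    """0.5 * (smape / smape_naive + mase / mase_naive), assuming non-zero naive values."""
    return 0.5 * (smape / smape_naive + mase / mase_naive)


# A forecast with half the sMAPE and half the MASE of the naive benchmark scores 0.5.
print(owa_sketch(smape=0.1, smape_naive=0.2, mase=1.5, mase_naive=3.0))
```

Under this definition, a value below 1 means the forecast beats the naive benchmark on average across the two metrics.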

static quantile_loss(target: numpy.ndarray, forecast: numpy.ndarray, q: float) → float[source]
seasonal_error(past_data: numpy.ndarray, forecast: gluonts.model.forecast.Forecast)[source]
\[seasonal_error = mean(|Y[t] - Y[t-m]|)\]

where m is the seasonal frequency
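A minimal sketch of the seasonal error, assuming a plain Python sequence that is longer than m. What the library does for shorter series is not covered here, and seasonal_error_sketch is a hypothetical name.

```python
def seasonal_error_sketch(past_data, m):
    """mean(|Y[t] - Y[t-m]|) over all t for which Y[t-m] exists."""
    diffs = [abs(past_data[t] - past_data[t - m]) for t in range(m, len(past_data))]
    return sum(diffs) / len(diffs)


series = [1.0, 2.0, 4.0, 7.0]
print(seasonal_error_sketch(series, m=1))  # differences 1, 2, 3
print(seasonal_error_sketch(series, m=2))  # differences 3, 5
```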

static smape(target: numpy.ndarray, forecast: numpy.ndarray, exclude_zero_denominator=True) → float[source]
\[smape = 2 * mean(|Y - Y_hat| / (|Y| + |Y_hat|))\]
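The sMAPE formula above, sketched under the assumption of plain Python sequences in which |Y| + |Y_hat| is never zero. The exclude_zero_denominator handling is not reproduced, and smape_sketch is a hypothetical name.

```python
def smape_sketch(target, forecast):
    """2 * mean(|Y - Y_hat| / (|Y| + |Y_hat|)) for equal-length sequences."""
    ratios = [
        abs(y - y_hat) / (abs(y) + abs(y_hat))
        for y, y_hat in zip(target, forecast)
    ]
    return 2.0 * sum(ratios) / len(ratios)


# Ratios are 200/400 = 0.5 and 0/100 = 0.0; twice their mean is 0.5.
print(smape_sketch([100.0, 50.0], [300.0, 50.0]))
```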

class gluonts.evaluation.MultivariateEvaluator(quantiles: Iterable[Union[float, str]] = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), seasonality: Optional[int] = None, alpha: float = 0.05, eval_dims: List[int] = None, target_agg_funcs: Dict[str, Callable] = {}, custom_eval_fn: Optional[dict] = None, num_workers: Optional[int] = None)[source]

Bases: gluonts.evaluation._base.Evaluator

The MultivariateEvaluator class provides functionality for evaluating multidimensional target arrays of shape (target_dimensionality, prediction_length).

Evaluations of individual dimensions are stored with the corresponding dimension prefix and contain the metrics calculated on that dimension only. Metrics with the plain metric name are calculated over all dimensions. Additionally, the user can provide aggregation functions that first aggregate the target and forecast over dimensions and then calculate the metric; these metrics are prefixed with m_<aggregation_fun_name>_.

The evaluation dimensions can be set by the user.


Example:

    {'0_MSE': 0.004307240342677687,  # MSE of dimension 0
     '0_abs_error': 1.6246897801756859,
     '1_MSE': 0.003949341769475723,  # MSE of dimension 1
     '1_abs_error': 1.5052175521850586,
     'MSE': 0.004128291056076705,  # MSE of all dimensions
     'abs_error': 3.1299073323607445,
     'm_sum_MSE': 0.02,  # MSE of aggregated target and aggregated forecast (if target_agg_funcs is set)
     'm_sum_abs_error': 4.2}
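The naming scheme described above (dimension prefix, plain metric name, and m_<aggregation_fun_name>_ prefix) can be illustrated with a small classifier for metric keys. The helper split_metric_key is hypothetical and not part of gluonts; it also assumes that aggregation function names contain no underscore and that no plain metric name starts with "m_".

```python
def split_metric_key(key):
    """Classify a metric name as per-dimension, aggregated, or overall."""
    head, _, rest = key.partition("_")
    if head.isdigit():
        # "<dim>_<metric>": metric computed on a single dimension.
        return ("dimension", int(head), rest)
    if head == "m" and rest:
        # "m_<agg>_<metric>": metric on the aggregated target and forecast.
        agg_name, _, metric = rest.partition("_")
        return ("aggregated", agg_name, metric)
    # Plain metric name: computed over all dimensions.
    return ("overall", None, key)


for name in ["0_MSE", "1_abs_error", "MSE", "abs_error", "m_sum_MSE", "m_sum_abs_error"]:
    print(name, "->", split_metric_key(name))
```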

calculate_aggregate_multivariate_metrics(ts_iterator: Iterator[pandas.core.frame.DataFrame], forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) → Dict[str, float][source]
  • ts_iterator – Iterator over time series

  • forecast_iterator – Iterator over forecasts

  • agg_fun – aggregation function


dictionary with aggregate dataset metrics

Return type

Dict[str, float]

calculate_aggregate_vector_metrics(all_agg_metrics: Dict[str, float], all_metrics_per_ts: pandas.core.frame.DataFrame) → Dict[str, float][source]
  • all_agg_metrics – dictionary with aggregate metrics of individual dimensions

  • all_metrics_per_ts – DataFrame containing metrics for all time series of all evaluated dimensions


dictionary with aggregate metrics (of individual (evaluated) dimensions and the entire vector)

Return type

Dict[str, float]

static extract_aggregate_forecast(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) → Iterator[gluonts.model.forecast.Forecast][source]
static extract_aggregate_target(it_iterator: Iterator[pandas.core.frame.DataFrame], agg_fun: Callable) → Iterator[pandas.core.frame.DataFrame][source]
static extract_forecast_by_dim(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], dim: int) → Iterator[gluonts.model.forecast.Forecast][source]
static extract_target_by_dim(it_iterator: Iterator[pandas.core.frame.DataFrame], dim: int) → Iterator[pandas.core.frame.DataFrame][source]
get_eval_dims(target_dimensionality: int) → List[int][source]
static get_target_dimensionality(forecast: gluonts.model.forecast.Forecast) → int[source]
static peek(iterator: Iterator[Any]) → Tuple[Any, Iterator[Any]][source]
gluonts.evaluation.make_evaluation_predictions(dataset: Iterable[Dict[str, Any]], predictor: gluonts.model.predictor.Predictor, num_samples: int = 100) → Tuple[Iterator[gluonts.model.forecast.Forecast], Iterator[pandas.core.series.Series]][source]

Returns predictions for the trailing prediction_length observations of the given time series, using the given predictor.

The predictor will take as input the given time series without the trailing prediction_length observations.

  • dataset – Dataset on which the evaluation is performed. Only the portion preceding the trailing prediction_length observations is used when making predictions.

  • predictor – Model used to draw predictions.

  • num_samples – Number of samples to draw from the model when evaluating. Only sampling-based models will use this.


A pair of iterators, the first one yielding the forecasts, and the second one yielding the corresponding ground truth series.

Return type

Tuple[Iterator[Forecast], Iterator[pd.Series]]
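The split described above, where the predictor sees the series without its trailing prediction_length observations and is evaluated on that trailing window, can be sketched on a plain list. The helper split_for_evaluation is hypothetical and only illustrates the slicing; it assumes prediction_length is at least 1 and smaller than the series length.

```python
def split_for_evaluation(series, prediction_length):
    """Return (predictor input, trailing window to be predicted) for one series."""
    model_input = series[:-prediction_length]
    trailing_window = series[-prediction_length:]
    return model_input, trailing_window


model_input, trailing_window = split_for_evaluation([1.0, 2.0, 3.0, 4.0, 5.0], prediction_length=2)
print(model_input)      # observations available to the predictor
print(trailing_window)  # observations the forecast is compared against
```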

gluonts.evaluation.backtest_metrics(test_dataset: Iterable[Dict[str, Any]], predictor: gluonts.model.predictor.Predictor, evaluator=<gluonts.evaluation._base.Evaluator object>, num_samples: int = 100, logging_file: Optional[str] = None) → Tuple[dict, pandas.core.frame.DataFrame][source]
  • test_dataset – Dataset to use for testing.

  • predictor – The predictor to test.

  • evaluator – Evaluator to use.

  • num_samples – Number of samples to use when generating sample-based forecasts. Only sampling-based models will use this.

  • logging_file – If specified, information of the backtest is redirected to this file.


A tuple of aggregate metrics and per-time-series metrics, obtained by evaluating the given predictor on test_dataset with the provided evaluator.

Return type

Tuple[dict, pd.DataFrame]