gluonts.evaluation

class gluonts.evaluation.Evaluator(quantiles: Iterable[Union[float, str]] = default_quantiles, seasonality: Optional[int] = None, alpha: float = 0.05, calculate_owa: bool = False, custom_eval_fn: Optional[Dict] = None, num_workers: Optional[int] = multiprocessing.cpu_count(), chunk_size: int = 32, aggregation_strategy: Callable = aggregate_no_nan, ignore_invalid_values: bool = True)

Evaluator class, to compute accuracy metrics by comparing observations to forecasts.

Parameters

quantiles – Quantile levels to evaluate, given as strings of the form ‘p10’ or as floats in [0, 1].
seasonality – Seasonality to use for seasonal_error. If nothing is passed, the default seasonality for the given series frequency is used, as returned by get_seasonality.
alpha – Parameter of the MSIS metric from the M4 competition that defines the confidence interval. For alpha=0.05 (default), the 95% confidence interval is considered in the metric. See https://www.m4.unic.ac.cy/wp-content/uploads/2018/03/M4-Competitors-Guide.pdf for more detail on MSIS.
calculate_owa – Determines whether the OWA metric should also be calculated, which is computationally expensive to evaluate and thus slows down the evaluation process considerably. By default False.
custom_eval_fn – Option to include custom evaluation metrics. Expected input is a dictionary whose keys are the names of the custom metrics and whose values are lists of three elements. First, a callable that takes the target and the forecast as input and returns the evaluation metric. Second, a string specifying how the metric is aggregated across all time series, e.g. “mean” or “sum”. Third, either “mean” or “median”, specifying whether the mean or the median forecast is passed to the custom evaluation function. For example: {“RMSE”: [rmse, “mean”, “median”]}
num_workers – The number of multiprocessing workers that will be used to process the data in parallel. Default is multiprocessing.cpu_count(). Setting it to 0 or None means no multiprocessing.
chunk_size – Controls the approximate chunk size each worker handles at a time. Default is 32.
ignore_invalid_values – Ignore NaN and inf values in the timeseries when calculating metrics.
aggregation_strategy – Function for aggregating per-time-series metrics. Available options are aggregate_valid, aggregate_all, and aggregate_no_nan. The default function is aggregate_no_nan.
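
To make the shape of custom_eval_fn concrete, here is a minimal standard-library sketch. Only the dictionary layout (callable, aggregation name, forecast type) comes from the description above. The helpers point_forecast and apply_custom_metrics are hypothetical stand-ins for whatever the Evaluator does internally, and plain lists replace the real target and forecast objects.

```python
import math
import statistics


def rmse(target, forecast):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - f) ** 2 for t, f in zip(target, forecast)) / len(target)
    )


# Layout described above: name -> [metric callable, aggregation, forecast type]
custom_eval_fn = {"RMSE": [rmse, "mean", "median"]}


def point_forecast(samples, kind):
    """Hypothetical helper: reduce sample paths to one value per time step."""
    reducer = statistics.mean if kind == "mean" else statistics.median
    return [reducer(step) for step in zip(*samples)]


def apply_custom_metrics(targets, sample_forecasts, custom_eval_fn):
    """Hypothetical driver: evaluate each metric per series, then aggregate."""
    aggregators = {"mean": statistics.mean, "sum": sum}
    results = {}
    for name, (fn, agg, kind) in custom_eval_fn.items():
        per_series = [
            fn(target, point_forecast(samples, kind))
            for target, samples in zip(targets, sample_forecasts)
        ]
        results[name] = aggregators[agg](per_series)
    return results


targets = [[1.0, 2.0], [3.0, 4.0]]
sample_forecasts = [
    [[1.0, 2.0], [1.0, 2.0], [4.0, 5.0]],  # three sample paths for series 0
    [[3.0, 6.0], [3.0, 6.0], [3.0, 6.0]],  # three sample paths for series 1
]
metrics = apply_custom_metrics(targets, sample_forecasts, custom_eval_fn)
print(metrics)
```

With the real class, the same dictionary would be passed as Evaluator(custom_eval_fn=...). The driver here only illustrates the order of operations implied by the three list elements.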

default_quantiles = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
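
Since quantiles may be given either as floats or as strings such as ‘p10’, the following sketch shows one way the two spellings can be mapped to the same level. The function parse_quantile is a hypothetical helper written for illustration and is not claimed to be part of the GluonTS API.

```python
def parse_quantile(q):
    """Hypothetical helper: map 'p10' or 0.1 to a float level in [0, 1]."""
    if isinstance(q, str):
        if not q.startswith("p"):
            raise ValueError(f"expected a string like 'p10', got {q!r}")
        # 'p10' means the 10th percentile, i.e. level 0.1
        level = float(q[1:]) / 100
    else:
        level = float(q)
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"quantile level {level} is outside [0, 1]")
    return level


levels = [parse_quantile(q) for q in ("p10", 0.5, "p90")]
print(levels)
```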

static extract_past_data(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → numpy.ndarray

Parameters

time_series –
forecast –

Returns

time series without the forecast dates

Return type

np.ndarray

static extract_pred_target(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → numpy.ndarray

Parameters

time_series –
forecast –

Returns

time series restricted to the dates covered by the Forecast object

Return type

np.ndarray
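
The two extraction methods above split one series around the forecast window. The sketch below illustrates that split with the standard library only. It assumes the series is a list of (date, value) pairs and the forecast is represented just by its dates. The function split_at_forecast is hypothetical and stands in for both methods.

```python
from datetime import date, timedelta


def split_at_forecast(series, forecast_dates):
    """Hypothetical helper: split (date, value) pairs around the forecast window."""
    start = min(forecast_dates)
    window = set(forecast_dates)
    # Everything strictly before the forecast window ("past data")
    past = [value for day, value in series if day < start]
    # Values on the forecast dates ("prediction target")
    pred_target = [value for day, value in series if day in window]
    return past, pred_target


first_day = date(2024, 1, 1)
series = [(first_day + timedelta(days=i), float(i)) for i in range(7)]
forecast_dates = [first_day + timedelta(days=i) for i in range(5, 7)]

past, pred_target = split_at_forecast(series, forecast_dates)
print(past, pred_target)
```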

get_aggregate_metrics(metric_per_ts: pandas.core.frame.DataFrame) → Tuple[Dict[str, float], pandas.core.frame.DataFrame]

get_metrics_per_ts(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) → Mapping[str, Union[float, str, None, numpy.ma.core.MaskedConstant]]

class gluonts.evaluation.MultivariateEvaluator(quantiles: Iterable[Union[float, str]] = np.linspace(0.1, 0.9, 9), seasonality: Optional[int] = None, alpha: float = 0.05, eval_dims: Optional[List[int]] = None, target_agg_funcs: Dict[str, Callable] = {}, custom_eval_fn: Optional[dict] = None, num_workers: Optional[int] = None)

The MultivariateEvaluator class provides functionality for evaluating multidimensional target arrays of shape (target_dimensionality, prediction_length).

Evaluations of individual dimensions are stored with the corresponding dimension prefix and contain the metrics calculated for that dimension only. Metrics with the plain metric name correspond to metrics calculated over all dimensions. The user can also provide aggregation functions that first aggregate the target and the forecast over dimensions and then calculate the metric. These metrics are prefixed with m_<aggregation_fun_name>_.

The evaluation dimensions can be set by the user.

Example

{
    '0_MSE': 0.004307240342677687,  # MSE of dimension 0
    '0_abs_error': 1.6246897801756859,
    '1_MSE': 0.003949341769475723,  # MSE of dimension 1
    '1_abs_error': 1.5052175521850586,
    'MSE': 0.004128291056076705,  # MSE of all dimensions
    'abs_error': 3.1299073323607445,
    'm_sum_MSE': 0.02,  # MSE of aggregated target and aggregated forecast (if target_agg_funcs is set)
    'm_sum_abs_error': 4.2
}
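
The key naming scheme can be reproduced with a small standard-library sketch. The function multivariate_mse is hypothetical, uses nested lists in place of arrays, and assumes that the plain MSE is the mean of the per-dimension values, which matches the numbers in the example but is not confirmed here as the library's rule.

```python
def mse(target, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((t - f) ** 2 for t, f in zip(target, forecast)) / len(target)


def multivariate_mse(target, forecast, target_agg_funcs=None):
    """Hypothetical sketch of the key naming for a (dims, length) target."""
    dims = range(len(target))
    # One entry per dimension, prefixed with the dimension index
    metrics = {f"{d}_MSE": mse(target[d], forecast[d]) for d in dims}
    # Assumption: the plain key is the mean over the per-dimension values
    metrics["MSE"] = sum(metrics[f"{d}_MSE"] for d in dims) / len(target)
    # Aggregate over dimensions first, then compute the metric
    for name, agg in (target_agg_funcs or {}).items():
        agg_target = [agg(column) for column in zip(*target)]
        agg_forecast = [agg(column) for column in zip(*forecast)]
        metrics[f"m_{name}_MSE"] = mse(agg_target, agg_forecast)
    return metrics


target = [[1.0, 2.0], [3.0, 4.0]]    # two dimensions, two time steps
forecast = [[1.0, 4.0], [2.0, 4.0]]
metrics = multivariate_mse(target, forecast, target_agg_funcs={"sum": sum})
print(metrics)
```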

calculate_aggregate_multivariate_metrics(ts_iterator: Iterator[pandas.core.frame.DataFrame], forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) → Dict[str, float]

Parameters

ts_iterator – Iterator over time series
forecast_iterator – Iterator over forecasts
agg_fun – aggregation function

Returns

dictionary with aggregate dataset metrics

Return type

Dict[str, float]

calculate_aggregate_vector_metrics(all_agg_metrics: Dict[str, float], all_metrics_per_ts: pandas.core.frame.DataFrame) → Dict[str, float]

Parameters

all_agg_metrics – dictionary with aggregate metrics of individual dimensions
all_metrics_per_ts – DataFrame containing metrics for all time series of all evaluated dimensions

Returns

dictionary with aggregate metrics (of individual (evaluated) dimensions and the entire vector)

Return type

Dict[str, float]

static extract_aggregate_forecast(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) → Iterator[gluonts.model.forecast.Forecast]

static extract_aggregate_target(it_iterator: Iterator[pandas.core.frame.DataFrame], agg_fun: Callable) → Iterator[pandas.core.frame.DataFrame]

static extract_forecast_by_dim(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], dim: int) → Iterator[gluonts.model.forecast.Forecast]

static extract_target_by_dim(it_iterator: Iterator[pandas.core.frame.DataFrame], dim: int) → Iterator[pandas.core.frame.DataFrame]

get_eval_dims(target_dimensionality: int) → List[int]

static get_target_dimensionality(forecast: gluonts.model.forecast.Forecast) → int

static peek(iterator: Iterator[Any]) → Tuple[Any, Iterator[Any]]
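
The signature of peek suggests a helper that returns the first element together with an iterator that still yields every element. The sketch below is one common way to write such a function with itertools. It is an assumption based on the signature alone and is not taken from the library source.

```python
import itertools


def peek(iterator):
    """Return the first item and an iterator that still yields all items."""
    first = next(iterator)
    # Put the consumed item back in front of the remaining ones
    return first, itertools.chain([first], iterator)


first, restored = peek(iter([10, 20, 30]))
items = list(restored)
print(first, items)
```

A helper like this is useful when a property of the first forecast, such as its target dimensionality, is needed before the whole iterator is processed.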

gluonts.evaluation.backtest_metrics(test_dataset: Dataset, predictor: gluonts.model.predictor.Predictor, evaluator=Evaluator(quantiles=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)), num_samples: int = 100, logging_file: Optional[str] = None) → Tuple[dict, pandas.core.frame.DataFrame]

Parameters

test_dataset – Dataset to use for testing.
predictor – The predictor to test.
evaluator – Evaluator to use.
num_samples – Number of samples to use when generating sample-based forecasts. Only sampling-based models will use this.
logging_file – If specified, information of the backtest is redirected to this file.

Returns

A tuple of aggregate metrics and per-time-series metrics, obtained by evaluating the given predictor on test_dataset with the provided evaluator.

Return type

Tuple[dict, pd.DataFrame]

gluonts.evaluation.make_evaluation_predictions(dataset: Dataset, predictor: gluonts.model.predictor.Predictor, num_samples: int = 100) → Tuple[Iterator[gluonts.model.forecast.Forecast], Iterator[pandas.core.series.Series]]

Returns predictions for the trailing prediction_length observations of the given time series, using the given predictor.

The predictor will take as input the given time series without the trailing prediction_length observations.

Parameters

dataset – Dataset on which the evaluation is performed. Only the part of each series before the trailing prediction_length observations is used when making predictions.
predictor – Model used to draw predictions.
num_samples – Number of samples to draw from the model when evaluating. Only sampling-based models will use this.

Returns

A pair of iterators, the first one yielding the forecasts, and the second one yielding the corresponding ground truth series.

Return type

Tuple[Iterator[Forecast], Iterator[pd.Series]]
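
The hold-out logic described for make_evaluation_predictions can be sketched without the library. Here evaluation_predictions and naive_predictor are hypothetical stand-ins, series are plain lists, and the ground truth iterator is assumed to yield each full series, tail included.

```python
def naive_predictor(history, prediction_length):
    """Hypothetical stand-in for a model: repeat the last observed value."""
    return [history[-1]] * prediction_length


def evaluation_predictions(dataset, predictor, prediction_length):
    """Hypothetical sketch: forecast the held-out tail of every series."""
    forecasts, truths = [], []
    for series in dataset:
        # The predictor only sees the series without its trailing window
        history = series[:-prediction_length]
        forecasts.append(predictor(history, prediction_length))
        # Assumption: the ground truth is the full series, tail included
        truths.append(series)
    return iter(forecasts), iter(truths)


dataset = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 10.0, 12.0, 11.0, 13.0]]
forecast_it, ts_it = evaluation_predictions(dataset, naive_predictor, 2)

errors = []
for forecast, series in zip(forecast_it, ts_it):
    target = series[-len(forecast):]  # trailing observations to compare against
    errors.append(sum(abs(t - f) for t, f in zip(target, forecast)))
print(errors)
```

In the real API, the two returned iterators are typically handed to an Evaluator, which compares each forecast with the trailing part of its series.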