gluonts.dataset.pandas module

class gluonts.dataset.pandas.PandasDataset(dataframes: typing.Union[pandas.core.frame.DataFrame, pandas.core.series.Series, typing.Iterable[pandas.core.frame.DataFrame], typing.Iterable[pandas.core.series.Series], typing.Iterable[typing.Tuple[typing.Any, pandas.core.frame.DataFrame]], typing.Iterable[typing.Tuple[typing.Any, pandas.core.series.Series]], typing.Dict[str, pandas.core.frame.DataFrame], typing.Dict[str, pandas.core.series.Series]], target: typing.Union[str, typing.List[str]] = 'target', timestamp: typing.Optional[str] = None, freq: typing.Optional[str] = None, feat_dynamic_real: typing.List[str] = <factory>, feat_dynamic_cat: typing.List[str] = <factory>, feat_static_real: typing.List[str] = <factory>, feat_static_cat: typing.List[str] = <factory>, past_feat_dynamic_real: typing.List[str] = <factory>, ignore_last_n_targets: int = 0, unchecked: bool = False, assume_sorted: bool = False)

Bases: object

A pandas.DataFrame-based dataset type.

This class is constructed from a collection of pandas.DataFrame objects, where each DataFrame represents one time series. A target column and a timestamp column are essential. In addition, static and dynamic features, both real-valued and categorical, can be specified.

Parameters
  • dataframes (Union[pandas.core.frame.DataFrame, pandas.core.series.Series, Iterable[pandas.core.frame.DataFrame], Iterable[pandas.core.series.Series], Iterable[Tuple[Any, pandas.core.frame.DataFrame]], Iterable[Tuple[Any, pandas.core.series.Series]], Dict[str, pandas.core.frame.DataFrame], Dict[str, pandas.core.series.Series]]) – A single pd.DataFrame or pd.Series, or a collection of them given as a list or dict, containing at least timestamp and target values. If a dict is provided, each key is used as the item_id of the associated series.

  • target (Union[str, List[str]]) – Name of the column that contains the target time series. For multivariate targets, a list of column names should be provided.

  • timestamp (Optional[str]) – Name of the column that contains the timestamp information.

  • freq (Optional[str]) – Frequency of observations in the time series. Must be a valid pandas frequency.

  • feat_dynamic_real (List[str]) – List of column names that contain dynamic real features.

  • feat_dynamic_cat (List[str]) – List of column names that contain dynamic categorical features.

  • feat_static_real (List[str]) – List of column names that contain static real features.

  • feat_static_cat (List[str]) – List of column names that contain static categorical features.

  • past_feat_dynamic_real (List[str]) – List of column names that contain dynamic real features only for the history.

  • ignore_last_n_targets (int) – For the target and past dynamic features, the last ignore_last_n_targets elements are removed when iterating over the dataset. This becomes important when the predictor is called.

  • unchecked (bool) – Whether consistency checks on indexes should be skipped. (Default: False)

  • assume_sorted (bool) – Whether to assume that indexes are sorted by time, and skip sorting. (Default: False)

assume_sorted: bool = False
dataframes: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, Iterable[pandas.core.frame.DataFrame], Iterable[pandas.core.series.Series], Iterable[Tuple[Any, pandas.core.frame.DataFrame]], Iterable[Tuple[Any, pandas.core.series.Series]], Dict[str, pandas.core.frame.DataFrame], Dict[str, pandas.core.series.Series]]
feat_dynamic_cat: List[str]
feat_dynamic_real: List[str]
feat_static_cat: List[str]
feat_static_real: List[str]
freq: Optional[str] = None
classmethod from_long_dataframe(dataframe: pandas.core.frame.DataFrame, item_id: str, timestamp: Optional[str] = None, **kwargs) → gluonts.dataset.pandas.PandasDataset

Construct PandasDataset out of a long dataframe. A long dataframe uses the long format for each variable. Target time series values, for example, are stacked on top of each other rather than side-by-side. The same is true for other dynamic or categorical features.

Parameters
  • dataframe – pandas.DataFrame containing at least timestamp, target and item_id columns.

  • item_id – Name of the column that, when grouped by, gives the different time series.

  • **kwargs – Additional arguments, the same as those of the PandasDataset class.

Returns

GluonTS dataset based on pandas.DataFrame objects.

Return type

PandasDataset
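To make the long format concrete, the following sketch uses plain pandas only and does not call GluonTS. It shows what grouping a long dataframe by its item_id column amounts to. The column names and values are hypothetical. The resulting dict of per-series dataframes is one of the input shapes the PandasDataset constructor is documented to accept.

```python
import pandas as pd

# Hypothetical long-format data: two series stacked on top of each other.
long_df = pd.DataFrame(
    {
        "item_id": ["A", "A", "A", "B", "B", "B"],
        "timestamp": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03"] * 2
        ),
        "target": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# Grouping by the item_id column yields one dataframe per time series.
per_item = {
    item_id: group.set_index("timestamp").drop(columns="item_id")
    for item_id, group in long_df.groupby("item_id")
}

print(sorted(per_item))
print(per_item["B"]["target"].tolist())
```

Each value in per_item is indexed by time and holds only the columns of a single series, while the dict key plays the role of the item_id.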

ignore_last_n_targets: int = 0
past_feat_dynamic_real: List[str]
target: Union[str, List[str]] = 'target'
timestamp: Optional[str] = None
unchecked: bool = False
gluonts.dataset.pandas.as_dataentry(data: pandas.core.frame.DataFrame, target: Union[str, List[str]], timestamp: Optional[str] = None, feat_dynamic_real: List[str] = [], feat_dynamic_cat: List[str] = [], feat_static_real: List[str] = [], feat_static_cat: List[str] = [], past_feat_dynamic_real: List[str] = []) → Dict[str, Any]

Convert a single time series (uni- or multi-variate) that is given in a pandas.DataFrame format to a DataEntry.

Parameters
  • data – pandas.DataFrame containing at least timestamp and target columns.

  • target – Name of the column that contains the target time series. For multivariate targets, a list of column names is expected.

  • timestamp – Name of the column that contains the timestamp information. If None, the index of data is assumed to hold the timestamps.

  • feat_dynamic_real – List of column names that contain dynamic real features.

  • feat_dynamic_cat – List of column names that contain dynamic categorical features.

  • feat_static_real – List of column names that contain static real features.

  • feat_static_cat – List of column names that contain static categorical features.

  • past_feat_dynamic_real – List of column names that contain dynamic real features only for the history.

Returns

A dictionary with at least the target and start fields.

Return type

DataEntry
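As a rough illustration of the returned dictionary, the sketch below builds a minimal entry with only the start and target fields from a univariate dataframe whose index holds the time. It uses pandas and NumPy only. The helper to_entry_sketch is hypothetical and not part of GluonTS, and the value types chosen here (a pandas Period for start, a one-dimensional NumPy array for target) are assumptions rather than a statement of what as_dataentry returns.

```python
import numpy as np
import pandas as pd


def to_entry_sketch(df: pd.DataFrame, target: str) -> dict:
    """Hypothetical helper, not part of GluonTS: build a minimal entry dict."""
    return {
        # First timestamp of the series, taken from the time index.
        "start": df.index[0],
        # Target values as a one-dimensional float array (assumed layout).
        "target": df[target].to_numpy(dtype=float),
    }


df = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0]},
    index=pd.period_range("2021-01-01", periods=3, freq="D"),
)
entry = to_entry_sketch(df, target="target")
print(entry["start"], entry["target"])
```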

gluonts.dataset.pandas.extract_dynamic_array(df: pandas.core.frame.DataFrame, col_names: Union[str, Collection[str]]) → numpy.ndarray
gluonts.dataset.pandas.extract_static_array(df: pandas.core.frame.DataFrame, col_names: Union[str, Collection[str]]) → numpy.ndarray
gluonts.dataset.pandas.infer_freq(index: pandas.core.indexes.base.Index) → str
gluonts.dataset.pandas.is_uniform(index: pandas.core.indexes.period.PeriodIndex) → bool

Check if index contains monotonically increasing periods, evenly spaced with frequency index.freq.

>>> import pandas as pd
>>> from gluonts.dataset.pandas import is_uniform
>>> ts = ["2021-01-01 00:00", "2021-01-01 02:00", "2021-01-01 04:00"]
>>> is_uniform(pd.DatetimeIndex(ts).to_period("2H"))
True
>>> ts = ["2021-01-01 00:00", "2021-01-01 04:00"]
>>> is_uniform(pd.DatetimeIndex(ts).to_period("2H"))
False
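One way to picture such a check is to compare the index against the evenly spaced range that starts at its first period. The sketch below does this with pandas only. The function is_uniform_sketch is a hypothetical re-implementation for illustration and should not be read as the GluonTS source. A daily frequency is used to keep the example simple.

```python
import pandas as pd


def is_uniform_sketch(index: pd.PeriodIndex) -> bool:
    """Hypothetical re-implementation for illustration; not the GluonTS code."""
    # Build the index we would expect if no period were missing or out of order.
    expected = pd.period_range(
        start=index[0], periods=len(index), freq=index.freq
    )
    return bool((index == expected).all())


regular = pd.PeriodIndex(["2021-01-01", "2021-01-02", "2021-01-03"], freq="D")
gappy = pd.PeriodIndex(["2021-01-01", "2021-01-03"], freq="D")

print(is_uniform_sketch(regular))  # True
print(is_uniform_sketch(gappy))    # False
```

Because the expected range is strictly increasing, an index with a gap, a duplicate, or an out-of-order period differs from it in at least one position and is reported as non-uniform.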
gluonts.dataset.pandas.pair_with_item_id(obj: Union[Tuple, pandas.core.frame.DataFrame, pandas.core.series.Series])
gluonts.dataset.pandas.prepare_prediction_data(dataentry: Dict[str, Any], ignore_last_n_targets: int) → Dict[str, Any]

Remove the last ignore_last_n_targets values from target and past_feat_dynamic_real. Works in both the univariate and the multivariate case.

>>> import numpy as np
>>> from gluonts.dataset.pandas import prepare_prediction_data
>>> prepare_prediction_data(
...     {"target": np.array([1., 2., 3., 4.])}, ignore_last_n_targets=2
... )
{'target': array([1., 2.])}
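The trimming itself can be sketched with NumPy alone. The helper drop_last_n below is hypothetical and not part of GluonTS. It assumes that time runs along the last axis of the array, so that a multivariate target of shape (dimensions, time) is shortened in the same way as a univariate one.

```python
import numpy as np


def drop_last_n(values: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: drop the last n steps along the final (time) axis."""
    # Guard n == 0 explicitly, because values[..., :-0] would be empty.
    return values if n == 0 else values[..., :-n]


univariate = np.array([1.0, 2.0, 3.0, 4.0])
multivariate = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])

print(drop_last_n(univariate, 2))
print(drop_last_n(multivariate, 2).shape)
```

The explicit guard for n equal to zero is a design choice of this sketch: a negative-stop slice of zero selects nothing, which is rarely the intended result.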