gluonts.dataset.split.splitter module

Train/test splitter

This module defines strategies to split a whole dataset into train and test subsets.

For uniform datasets, where all time series start and end at the same point in time, OffsetSplitter can be used:

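from gluonts.dataset.split.splitter import OffsetSplitter

# whole_dataset is the list of TimeSeriesItem entries to split
splitter = OffsetSplitter(prediction_length=24, split_offset=24)
train, test = splitter.split(whole_dataset)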
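
The idea behind an offset-based split can be sketched on a single plain list. This is an illustration only, not the library's implementation: the helper name offset_split is hypothetical, and the sketch assumes that split_offset counts observations from the start of the series and that the test series is the training series extended by prediction_length further observations.

```python
from typing import List, Sequence, Tuple


def offset_split(
    target: Sequence[float], split_offset: int, prediction_length: int
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: split one series at a fixed position.

    Assumption: train holds the first `split_offset` values, and test
    holds those same values plus the next `prediction_length` values.
    """
    train = list(target[:split_offset])
    test = list(target[: split_offset + prediction_length])
    return train, test


target = [float(i) for i in range(60)]
train, test = offset_split(target, split_offset=24, prediction_length=24)
```

Under these assumptions the test series starts with exactly the training values, so a model trained on train can be evaluated on the trailing prediction_length values of test.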

For all other datasets, the more flexible DateSplitter can be used:

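import pandas as pd
from gluonts.dataset.split.splitter import DateSplitter

# Split every series at the same calendar date
splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Timestamp('2018-01-31', freq='D')
)
train, test = splitter.split(whole_dataset)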
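
A date-based split can be illustrated with plain pandas. The sketch below is not the DateSplitter implementation: split_by_date is a hypothetical helper, and it assumes that the training series ends at split_date (inclusive) and that the test series additionally covers the following prediction_length time steps.

```python
import pandas as pd


def split_by_date(series: pd.Series, split_date: pd.Timestamp, prediction_length: int):
    """Hypothetical helper: split one series at a calendar date.

    Assumption: the series has a DatetimeIndex with a fixed frequency,
    train ends at `split_date` (inclusive), and test extends train by
    `prediction_length` further time steps.
    """
    freq = series.index.freq
    train = series.loc[:split_date]
    test = series.loc[: split_date + prediction_length * freq]
    return train, test


index = pd.date_range("2018-01-01", periods=60, freq="D")
series = pd.Series(range(60), index=index, dtype=float)
train, test = split_by_date(series, pd.Timestamp("2018-01-31"), prediction_length=24)
```

Because the cut is expressed as a date rather than a position, the same call works for series that start at different times, which is why a date-based split suits non-uniform datasets.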

The module also supports rolling splits:

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Timestamp('2018-01-31', freq='D')
)
train, test = splitter.rolling_split(whole_dataset, windows=7)
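
How a rolling split might produce several test windows can be sketched with integer positions in place of dates. This is an assumption-laden illustration, not the library's code: rolling_test_windows is a hypothetical helper, and it assumes that each successive window ends distance steps later than the previous one, with distance falling back to prediction_length when it is not given.

```python
from typing import List, Optional, Sequence


def rolling_test_windows(
    target: Sequence[float],
    split_offset: int,
    prediction_length: int,
    windows: int,
    distance: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: build `windows` test series for one target.

    Assumption: window k ends `k * distance` steps after the first test
    window, and `distance` defaults to `prediction_length`.
    """
    step = prediction_length if distance is None else distance
    result = []
    for window in range(windows):
        end = split_offset + prediction_length + window * step
        result.append(list(target[:end]))
    return result


target = [float(i) for i in range(100)]
tests = rolling_test_windows(target, split_offset=24, prediction_length=10, windows=3)
overlapping = rolling_test_windows(
    target, split_offset=24, prediction_length=10, windows=3, distance=5
)
```

With the default distance the forecast ranges of consecutive windows sit back to back; a smaller distance makes them overlap.
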
class gluonts.dataset.split.splitter.AbstractBaseSplitter[source]

Bases: abc.ABC

Base class for all other splitters.

Parameters
  • prediction_length – The prediction length that is used to train the model.

  • max_history – If given, all entries in the test set have a maximum length of max_history. This can be used to produce smaller file sizes.
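
The effect of max_history can be sketched as a simple truncation. The helper trim_history below is hypothetical, and the sketch assumes that the most recent observations are the ones kept, which the description above does not state explicitly.

```python
from typing import List, Optional, Sequence


def trim_history(target: Sequence[float], max_history: Optional[int] = None) -> List[float]:
    """Hypothetical helper: cap the length of one test series.

    Assumption: when `max_history` is given, only the last
    `max_history` values are kept; otherwise the series is unchanged.
    """
    if max_history is None:
        return list(target)
    # Compute the start explicitly so that max_history=0 yields an empty list.
    start = max(len(target) - max_history, 0)
    return list(target[start:])


full = [float(i) for i in range(48)]
trimmed = trim_history(full, max_history=30)
```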

rolling_split(items: List[gluonts.dataset.common.TimeSeriesItem], windows: int, distance: Optional[int] = None) → gluonts.dataset.split.splitter.TrainTestSplit[source]
split(items: List[gluonts.dataset.common.TimeSeriesItem]) → gluonts.dataset.split.splitter.TrainTestSplit[source]
class gluonts.dataset.split.splitter.DateSplitter[source]

Bases: gluonts.dataset.split.splitter.AbstractBaseSplitter, pydantic.main.BaseModel

max_history: Optional[int] = None
prediction_length: int = None
split_date: pd.Timestamp = None
class gluonts.dataset.split.splitter.OffsetSplitter[source]

Bases: pydantic.main.BaseModel, gluonts.dataset.split.splitter.AbstractBaseSplitter

Requires uniform data, i.e. all time series must start and end at the same point in time.

max_history: Optional[int] = None
prediction_length: int = None
split_offset: int = None
class gluonts.dataset.split.splitter.TimeSeriesSlice[source]

Bases: pydantic.main.BaseModel

Like TimeSeriesItem, but all time-related fields are of type pd.Series, and the slice is indexable by date, e.g. ts_slice['2018':].
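
The date-based indexing mentioned above resembles partial string indexing on a pandas Series with a DatetimeIndex. The sketch below shows only that pandas behaviour on a plain Series; it does not construct a TimeSeriesSlice, and the sample data is made up for illustration.

```python
import pandas as pd

# Fourteen daily observations spanning the turn of the year.
index = pd.date_range("2017-12-25", periods=14, freq="D")
target = pd.Series(range(14), index=index, dtype=float)

# Partial string indexing: everything from the start of 2018 onwards.
recent = target.loc["2018":]
```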

class Config[source]

Bases: object

arbitrary_types_allowed = True
property end
feat_dynamic_cat: List[pd.Series] = None
feat_dynamic_real: List[pd.Series] = None
feat_static_cat: List[int] = None
feat_static_real: List[float] = None
classmethod from_time_series_item(item: gluonts.dataset.common.TimeSeriesItem, freq: Optional[str] = None) → gluonts.dataset.split.splitter.TimeSeriesSlice[source]
item: str = None
property start
target: pd.Series = None
to_time_series_item() → gluonts.dataset.common.TimeSeriesItem[source]
class gluonts.dataset.split.splitter.TrainTestSplit[source]

Bases: pydantic.main.BaseModel

test: List[TimeSeriesItem] = None
train: List[TimeSeriesItem] = None