gluonts.dataset.split.splitter module

Train/test splitter

This module defines strategies to split a whole dataset into train and test subsets.

For uniform datasets, where all time series start and end at the same point in time, OffsetSplitter can be used:

from gluonts.dataset.split.splitter import OffsetSplitter

splitter = OffsetSplitter(prediction_length=24, split_offset=24)
train, test = splitter.split(whole_dataset)
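
The idea behind an offset split can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper offset_split is hypothetical, and it assumes that the train part holds the first split_offset points while the test part extends that range by prediction_length further points.

```python
from typing import List, Tuple


def offset_split(
    target: List[float], split_offset: int, prediction_length: int
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper illustrating an offset-based split (assumed semantics)."""
    end = split_offset + prediction_length
    if end > len(target):
        raise ValueError("series is too short for the requested split")
    train = target[:split_offset]  # history available for training
    test = target[:end]            # same history plus the window to predict
    return train, test


series = [float(i) for i in range(72)]
train, test = offset_split(series, split_offset=24, prediction_length=24)
print(len(train), len(test))  # 24 48
```

Because every series is cut at the same integer position, this only lines up in time when all series share the same start and end, which is why uniform data is required.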

For all other datasets, the more flexible DateSplitter can be used:

import pandas as pd
from gluonts.dataset.split.splitter import DateSplitter

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Timestamp('2018-01-31', freq='D')
)
train, test = splitter.split(whole_dataset)
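
A date-based split can be sketched with the standard library alone. The helpers date_split and daily_series below are hypothetical, and the sketch assumes daily data, a train part that includes the split date, and a test part that extends prediction_length days past it. It is not the library's implementation.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

Series = Dict[date, float]


def date_split(
    series: Series, split_date: date, prediction_length: int
) -> Tuple[Series, Series]:
    """Hypothetical helper: cut a daily series at a calendar date (assumed semantics)."""
    test_end = split_date + timedelta(days=prediction_length)
    train = {d: v for d, v in series.items() if d <= split_date}
    test = {d: v for d, v in series.items() if d <= test_end}
    return train, test


def daily_series(start: date, n: int) -> Series:
    """Build a toy daily series of n points starting at `start`."""
    return {start + timedelta(days=i): float(i) for i in range(n)}


# Two series with different start dates: the same split date yields
# different train lengths, which a fixed offset could not express.
a = daily_series(date(2018, 1, 1), 90)
b = daily_series(date(2018, 1, 15), 90)

train_a, test_a = date_split(a, date(2018, 1, 31), prediction_length=24)
train_b, test_b = date_split(b, date(2018, 1, 31), prediction_length=24)
print(len(train_a), len(test_a))  # 31 55
print(len(train_b), len(test_b))  # 17 41
```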

The module also supports rolling splits:

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Timestamp('2018-01-31', freq='D')
)
train, test = splitter.rolling_split(whole_dataset, windows=7)
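
A rolling split produces several test windows per series, each ending later than the previous one. The sketch below is a plain-Python illustration with a hypothetical helper rolling_test_windows. It assumes that distance falls back to prediction_length when it is not given and that each test window keeps the full history up to its end. Neither assumption is confirmed by the text above.

```python
from typing import List, Optional


def rolling_test_windows(
    target: List[int],
    split_offset: int,
    prediction_length: int,
    windows: int,
    distance: Optional[int] = None,
) -> List[List[int]]:
    """Hypothetical helper: build `windows` test series, each shifted by `distance`."""
    if distance is None:
        distance = prediction_length  # assumed default
    result = []
    for w in range(windows):
        end = split_offset + prediction_length + w * distance
        if end > len(target):
            raise ValueError("series is too short for the requested windows")
        result.append(target[:end])
    return result


series = list(range(240))
tests = rolling_test_windows(
    series, split_offset=24, prediction_length=24, windows=7
)
print([len(t) for t in tests])  # [48, 72, 96, 120, 144, 168, 192]
```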
class gluonts.dataset.split.splitter.AbstractBaseSplitter

Bases: abc.ABC

Base class for all other splitters.

Parameters
  • prediction_length – The prediction length that is used to train the model.

  • max_history – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
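
The effect of max_history can be illustrated with a short sketch. The helper trim_history is hypothetical, and the sketch assumes that trimming keeps the most recent max_history points of a test entry. The text above only states that entries have a maximum length.

```python
from typing import List, Optional


def trim_history(target: List[int], max_history: Optional[int] = None) -> List[int]:
    """Hypothetical helper: keep at most the last `max_history` points."""
    if max_history is None:
        return list(target)  # no limit requested
    if max_history <= 0:
        raise ValueError("max_history must be positive")
    return list(target[-max_history:])


test_entry = list(range(100))
print(len(trim_history(test_entry)))                  # 100
print(len(trim_history(test_entry, max_history=30)))  # 30
```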

rolling_split(items: List[Dict[str, Any]], windows: int, distance: Optional[int] = None) → gluonts.dataset.split.splitter.TrainTestSplit
split(items: List[Dict[str, Any]]) → gluonts.dataset.split.splitter.TrainTestSplit
class gluonts.dataset.split.splitter.DateSplitter

Bases: gluonts.dataset.split.splitter.AbstractBaseSplitter, pydantic.main.BaseModel

max_history: Optional[int] = None
prediction_length: int = None
split_date: pd.Timestamp = None
class gluonts.dataset.split.splitter.OffsetSplitter

Bases: pydantic.main.BaseModel, gluonts.dataset.split.splitter.AbstractBaseSplitter

Requires uniform data.

max_history: Optional[int] = None
prediction_length: int = None
split_offset: int = None
class gluonts.dataset.split.splitter.TimeSeriesSlice

Bases: pydantic.main.BaseModel

Like DataEntry, but all time-related fields are of type pd.Series and the slice is indexable, e.g. ts_slice['2018':].

class Config

Bases: object

arbitrary_types_allowed = True
property end
feat_dynamic_cat: List[pd.Series] = None
feat_dynamic_real: List[pd.Series] = None
feat_static_cat: List[int] = None
feat_static_real: List[float] = None
classmethod from_data_entry(item: Dict[str, Any], freq: Optional[str] = None) → gluonts.dataset.split.splitter.TimeSeriesSlice
item: str = None
property start
target: pd.Series = None
to_data_entry() → Dict[str, Any]
class gluonts.dataset.split.splitter.TrainTestSplit

Bases: pydantic.main.BaseModel

test: List[DataEntry] = None
train: List[DataEntry] = None