gluonts.dataset.split.splitter module

Train/test splitter

This module defines strategies to split a whole dataset into train and test subsets.

For uniform datasets, where all time series start and end at the same point in time, OffsetSplitter can be used:

from gluonts.dataset.split.splitter import OffsetSplitter
splitter = OffsetSplitter(prediction_length=24, split_offset=24)
split = splitter.split(whole_dataset)
train, test = split.train, split.test

For all other datasets, the more flexible DateSplitter can be used:

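import pandas as pd
from gluonts.dataset.split.splitter import DateSplitter

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Period('2018-01-31', freq='D')
)
# TrainTestSplit exposes both subsets as attributes
split = splitter.split(whole_dataset)
train, test = split.train, split.test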

The module also supports rolling splits:

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Period('2018-01-31', freq='D')
)
split = splitter.rolling_split(whole_dataset, windows=7)
train, test = split.train, split.test
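To make the idea of a rolling split concrete, the following is a minimal pure-Python sketch, not the library's implementation. It assumes that each test series consists of the training prefix plus the values to be predicted, and that consecutive windows are spaced prediction_length apart when no distance is given. The helper rolling_test_windows and its train_length parameter are hypothetical names used only for illustration.

```python
from typing import List, Optional


def rolling_test_windows(
    target: List[float],
    train_length: int,
    prediction_length: int,
    windows: int,
    distance: Optional[int] = None,
) -> List[List[float]]:
    """Return one test series per window (hypothetical helper).

    Window k ends k * distance observations later than window 0.
    """
    if distance is None:
        # Assumption: windows are spaced one prediction length apart.
        distance = prediction_length

    # Observations required so that the last window still fits.
    needed = train_length + (windows - 1) * distance + prediction_length
    if needed > len(target):
        raise ValueError(
            f"need {needed} observations, series has {len(target)}"
        )

    tests = []
    for k in range(windows):
        end = train_length + k * distance + prediction_length
        tests.append(target[:end])
    return tests


series = list(range(10))
tests = rolling_test_windows(
    series, train_length=4, prediction_length=2, windows=3
)
print([len(t) for t in tests])  # [6, 8, 10]
```

Under these assumptions, the series must leave enough observations after the split point for every window, which is why the parameter descriptions ask for a multiple of prediction_length when rolling_split is used.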

class gluonts.dataset.split.splitter.AbstractBaseSplitter

Bases: abc.ABC

Base class for all other splitters.

Parameters
  • prediction_length (int) – The prediction length used to train the model.

  • max_history – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
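The effect of max_history can be sketched in a few lines of plain Python. This is an illustration under the assumption that trimming keeps the most recent observations; the helper trim_history is hypothetical and not part of the module.

```python
from typing import List, Optional


def trim_history(
    target: List[float], max_history: Optional[int]
) -> List[float]:
    """Keep at most the last `max_history` observations (hypothetical helper)."""
    if max_history is None:
        # No limit requested: return the series unchanged.
        return target
    if max_history <= 0:
        raise ValueError("max_history must be positive")
    # Assumption: the most recent observations are the ones retained.
    return target[-max_history:]


print(trim_history(list(range(10)), 3))  # [7, 8, 9]
```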

rolling_split(items: List[Dict[str, Any]], windows: int, distance: Optional[int] = None) -> gluonts.dataset.split.splitter.TrainTestSplit
split(items: List[Dict[str, Any]]) -> gluonts.dataset.split.splitter.TrainTestSplit

class gluonts.dataset.split.splitter.DateSplitter(*, prediction_length: int, split_date: pandas._libs.tslibs.period.Period, max_history: Optional[int] = None)

Bases: gluonts.dataset.split.splitter.AbstractBaseSplitter, pydantic.main.BaseModel

A splitter that slices training and test data based on a pandas.Period.

Training entries obtained from this class are limited to observations up to and including the given split_date.

Parameters
  • prediction_length (int) – Length of the prediction interval in test data.

  • split_date (pandas._libs.tslibs.period.Period) – Period determining where the training data ends. Make sure that at least prediction_length values remain after split_date (for rolling_split, a multiple of prediction_length).

  • max_history (Optional[int]) – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
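The date-based rule described above can be illustrated with the standard library alone. The sketch below assumes daily data and assumes that the test series is the training slice followed by prediction_length further values; date_split is a hypothetical helper, not the DateSplitter implementation, and it uses datetime.date in place of pandas.Period to stay dependency-free.

```python
from datetime import date
from typing import List, Tuple


def date_split(
    start: date,
    target: List[float],
    split_date: date,
    prediction_length: int,
) -> Tuple[List[float], List[float]]:
    """Split a daily series at `split_date` (hypothetical helper).

    The training slice includes the observation at `split_date`.
    """
    # Number of observations from `start` up to and including `split_date`.
    train_length = (split_date - start).days + 1
    if train_length <= 0:
        raise ValueError("split_date lies before the start of the series")

    test_end = train_length + prediction_length
    if test_end > len(target):
        raise ValueError("not enough values left after split_date")

    return target[:train_length], target[:test_end]


values = [float(i) for i in range(40)]  # 2018-01-01 .. 2018-02-09
train, test = date_split(
    date(2018, 1, 1), values, date(2018, 1, 31), prediction_length=7
)
print(len(train), len(test))  # 31 38
```

Because the split point is a calendar date rather than a position, series that start on different days yield training slices of different lengths, which is what makes this approach suitable for non-uniform datasets.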

class Config

Bases: object

arbitrary_types_allowed = True
max_history: Optional[int]
prediction_length: int
split_date: pandas._libs.tslibs.period.Period

class gluonts.dataset.split.splitter.OffsetSplitter(*, prediction_length: int, split_offset: int, max_history: Optional[int] = None)

Bases: pydantic.main.BaseModel, gluonts.dataset.split.splitter.AbstractBaseSplitter

A splitter that slices training and test data based on a fixed integer offset.

Parameters
  • prediction_length (int) – Length of the prediction interval in test data.

  • split_offset (int) – Offset determining where the training data ends. A positive offset is the number of observations, counted from the start of each series, that go into the training slice; a negative offset is the number of observations, counted from the end of each series, that are excluded from the training slice. Make sure that enough values are excluded for the test case: at least prediction_length values (for rolling_split, a multiple of prediction_length).

  • max_history (Optional[int]) – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
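The sign convention for split_offset can be illustrated with a short pure-Python sketch. It is an illustration only: offset_split is a hypothetical helper, and the sketch assumes that the test series is the training slice followed by prediction_length further values.

```python
from typing import List, Tuple


def offset_split(
    target: List[float], split_offset: int, prediction_length: int
) -> Tuple[List[float], List[float]]:
    """Split one series at an integer offset (hypothetical helper)."""
    n = len(target)
    # Positive: count from the start. Negative: count back from the end.
    train_end = split_offset if split_offset >= 0 else n + split_offset
    if train_end <= 0:
        raise ValueError("training slice would be empty")

    test_end = train_end + prediction_length
    if test_end > n:
        raise ValueError("fewer than prediction_length values left over")

    return target[:train_end], target[:test_end]


series = list(range(10))
# For a series of length 10, offsets 6 and -4 mark the same split point.
print(offset_split(series, 6, 2) == offset_split(series, -4, 2))  # True
```

With an offset of -1 and a prediction length of 2, this sketch raises an error, since only one value would be left for the test interval.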

max_history: Optional[int]
prediction_length: int
split_offset: int

class gluonts.dataset.split.splitter.TimeSeriesSlice(*, target: pandas.core.series.Series, item: str, feat_static_cat: List[int] = [], feat_static_real: List[float] = [], feat_dynamic_cat: List[pandas.core.series.Series] = [], feat_dynamic_real: List[pandas.core.series.Series] = [])

Bases: pydantic.main.BaseModel

Like DataEntry, but all time-related fields are of type pd.Series, and the slice itself is indexable, e.g. ts_slice['2018':].

class Config

Bases: object

arbitrary_types_allowed = True
property end: pandas._libs.tslibs.period.Period
feat_dynamic_cat: List[pandas.core.series.Series]
feat_dynamic_real: List[pandas.core.series.Series]
feat_static_cat: List[int]
feat_static_real: List[float]
classmethod from_data_entry(item: Dict[str, Any], freq: Optional[str] = None) -> gluonts.dataset.split.splitter.TimeSeriesSlice
item: str
property start: pandas._libs.tslibs.period.Period
target: pandas.core.series.Series
to_data_entry() -> Dict[str, Any]

class gluonts.dataset.split.splitter.TrainTestSplit(*, train: List[Dict[str, Any]] = [], test: List[Dict[str, Any]] = [])

Bases: pydantic.main.BaseModel

test: List[Dict[str, Any]]
train: List[Dict[str, Any]]