gluonts.dataset.split.splitter module
Train/test splitter
This module defines strategies to split a whole dataset into train and test subsets.
For uniform datasets, where all time series start and end at the same point in time, OffsetSplitter can be used:
from gluonts.dataset.split.splitter import OffsetSplitter

# whole_dataset is the full dataset to be split
splitter = OffsetSplitter(prediction_length=24, split_offset=24)
train, test = splitter.split(whole_dataset)
For all other datasets, the more flexible DateSplitter can be used:
import pandas as pd
from gluonts.dataset.split.splitter import DateSplitter

splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Period('2018-01-31', freq='D'),
)
train, test = splitter.split(whole_dataset)
The module also supports rolling splits:
splitter = DateSplitter(
    prediction_length=24,
    split_date=pd.Period('2018-01-31', freq='D'),
)
train, test = splitter.rolling_split(whole_dataset, windows=7)
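To make the idea of a rolling split concrete, here is a minimal pure-Python sketch that does not use GluonTS. The helper `rolling_test_windows` and its `train_end` argument are hypothetical names introduced for illustration. It assumes that each test window contains the history plus one prediction interval, and that consecutive windows advance by `prediction_length` when no `distance` is given; neither assumption is confirmed by this page.

```python
from typing import List, Optional


def rolling_test_windows(
    target: List[float],
    train_end: int,
    prediction_length: int,
    windows: int,
    distance: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper; illustrates rolling windows, not the library's code."""
    # Assumption: windows advance by prediction_length when distance is None.
    step = prediction_length if distance is None else distance
    tests = []
    for k in range(windows):
        # Each window ends one step later than the previous one.
        test_end = train_end + prediction_length + k * step
        if test_end > len(target):
            raise ValueError(f"window {k} needs {test_end} observations")
        # Assumption: a test window holds the history plus the prediction interval.
        tests.append(target[:test_end])
    return tests


series = [float(i) for i in range(12)]
for window in rolling_test_windows(
    series, train_end=6, prediction_length=2, windows=3
):
    # Print the window length and the values that would be predicted.
    print(len(window), window[-2:])
```

Under these assumptions, `windows` rolling windows need `prediction_length + (windows - 1) * distance` observations after the end of the training data, which is why the splitters ask for a multiple of `prediction_length` to be left over.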
- class gluonts.dataset.split.splitter.AbstractBaseSplitter
Bases: abc.ABC
Base class for all other splitters.
- Parameters
prediction_length – The prediction length used to train the model.
max_history – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
- rolling_split(items: List[Dict[str, Any]], windows: int, distance: Optional[int] = None) → gluonts.dataset.split.splitter.TrainTestSplit
- split(items: List[Dict[str, Any]]) → gluonts.dataset.split.splitter.TrainTestSplit
- class gluonts.dataset.split.splitter.DateSplitter(*, prediction_length: int, split_date: pandas._libs.tslibs.period.Period, max_history: Optional[int] = None)
Bases: gluonts.dataset.split.splitter.AbstractBaseSplitter, pydantic.main.BaseModel
A splitter that slices training and test data based on a pandas.Period.
Training entries obtained from this class will be limited to observations up to and including the given split_date.
- Parameters
prediction_length (int) – Length of the prediction interval in test data.
split_date (pandas._libs.tslibs.period.Period) – Period determining where the training data ends. Please make sure that at least prediction_length values (for rolling_split, a multiple of prediction_length) are left over after the split_date.
max_history (Optional[int]) – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
- max_history: Optional[int]
- prediction_length: int
- split_date: pandas._libs.tslibs.period.Period
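The date-based rule described above can be sketched in plain Python for daily data. This sketch uses `datetime.date` in place of `pandas.Period` to stay dependency-free, so it illustrates the documented semantics rather than the library's implementation. The helper `date_split` is a hypothetical name, and it assumes the test entry extends `prediction_length` days past `split_date`.

```python
from datetime import date, timedelta
from typing import List, Tuple

Observation = Tuple[date, float]


def date_split(
    series: List[Observation],
    split_date: date,
    prediction_length: int,
) -> Tuple[List[Observation], List[Observation]]:
    """Hypothetical helper for daily data; not the library's code."""
    # Training keeps every observation up to and including split_date.
    train = [obs for obs in series if obs[0] <= split_date]
    # Assumption: the test entry extends prediction_length days further.
    test_end = split_date + timedelta(days=prediction_length)
    test = [obs for obs in series if obs[0] <= test_end]
    # Enough values must be left over after split_date.
    if len(test) - len(train) < prediction_length:
        raise ValueError("fewer than prediction_length values after split_date")
    return train, test


start = date(2018, 1, 29)
series = [(start + timedelta(days=i), float(i)) for i in range(6)]
train, test = date_split(
    series, split_date=date(2018, 1, 31), prediction_length=2
)
# Show the last training date and the last test date.
print(train[-1][0], test[-1][0])
```

Because the cut is defined by a date rather than a position, series with different start dates contribute training slices of different lengths, which is what makes this approach suitable for non-uniform datasets.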
- class gluonts.dataset.split.splitter.OffsetSplitter(*, prediction_length: int, split_offset: int, max_history: Optional[int] = None)
Bases: pydantic.main.BaseModel, gluonts.dataset.split.splitter.AbstractBaseSplitter
A splitter that slices training and test data based on a fixed integer offset.
- Parameters
prediction_length (int) – Length of the prediction interval in test data.
split_offset (int) – Offset determining where the training data ends. A positive offset indicates how many observations since the start of each series should be in the training slice; a negative offset indicates how many observations before the end of each series should be excluded from the training slice. Please make sure that the number of excluded values is enough for the test case, i.e., at least prediction_length values (for rolling_split, a multiple of prediction_length) are left off.
max_history (Optional[int]) – If given, all entries in the test-set have a max-length of max_history. This can be used to produce smaller file-sizes.
- max_history: Optional[int]
- prediction_length: int
- split_offset: int
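The positive and negative offset conventions can be illustrated with ordinary list slicing. The following is a minimal sketch under stated assumptions, not the library's implementation: `offset_split` is a hypothetical helper, a test entry is assumed to hold the history plus the prediction interval, and `max_history` is assumed to keep the most recent values.

```python
from typing import List, Optional, Tuple


def offset_split(
    target: List[float],
    prediction_length: int,
    split_offset: int,
    max_history: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper mirroring the documented offset semantics."""
    n = len(target)
    # A positive offset counts from the start; a negative one from the end.
    train_end = split_offset if split_offset >= 0 else n + split_offset
    test_end = train_end + prediction_length
    if not 0 < train_end <= n or test_end > n:
        raise ValueError("not enough observations for this split")
    train = target[:train_end]
    # Assumption: a test entry holds the history plus the prediction interval.
    test = target[:test_end]
    if max_history is not None:
        # Assumption: trimming keeps the most recent max_history values
        # (max_history is expected to be at least 1).
        test = test[-max_history:]
    return train, test


series = [float(i) for i in range(10)]
# Keep the first 7 observations for training.
print(offset_split(series, prediction_length=3, split_offset=7))
# Exclude the last 3 observations from training; same result for this series.
print(offset_split(series, prediction_length=3, split_offset=-3))
# Limit the test entry to its 5 most recent values.
print(offset_split(series, prediction_length=3, split_offset=-3, max_history=5))
```

For a series of length 10, `split_offset=7` and `split_offset=-3` select the same training slice. The two conventions differ only when series lengths vary, which is why the offset-based splitter is recommended for uniform datasets.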
- class gluonts.dataset.split.splitter.TimeSeriesSlice(*, target: pandas.core.series.Series, item: str, feat_static_cat: List[int] = [], feat_static_real: List[float] = [], feat_dynamic_cat: List[pandas.core.series.Series] = [], feat_dynamic_real: List[pandas.core.series.Series] = [])
Bases: pydantic.main.BaseModel
Like DataEntry, but all time-related fields are of type pd.Series and the slice is indexable, e.g. ts_slice['2018':].
- property end: pandas._libs.tslibs.period.Period
- feat_dynamic_cat: List[pandas.core.series.Series]
- feat_dynamic_real: List[pandas.core.series.Series]
- feat_static_cat: List[int]
- feat_static_real: List[float]
- classmethod from_data_entry(item: Dict[str, Any], freq: Optional[str] = None) → gluonts.dataset.split.splitter.TimeSeriesSlice
- item: str
- property start: pandas._libs.tslibs.period.Period
- target: pandas.core.series.Series