gluonts.dataset.split module
Train/test splitter
This module defines strategies to split a whole dataset into train and test
subsets. The split() function can also be used to trigger their logic.
For uniform datasets, where all time series start and end at the same point in
time, OffsetSplitter can be used:
from gluonts.dataset.split import OffsetSplitter

splitter = OffsetSplitter(offset=7)
train, test_template = splitter.split(whole_dataset)
For all other datasets, the more flexible DateSplitter can be used:
import pandas as pd
from gluonts.dataset.split import DateSplitter

splitter = DateSplitter(
    date=pd.Period('2018-01-31', freq='D')
)
train, test_template = splitter.split(whole_dataset)
In the above examples, the train output is a regular Dataset that can
be used for training purposes; test_template can generate test instances
as follows:
test_dataset = test_template.generate_instances(
    prediction_length=7,
    windows=2,
)
The windows argument controls how many test windows to generate from each
entry in the original dataset. Each window will begin after the split point,
and so will not contain any training data. By default, windows are
non-overlapping, but this can be changed with the optional distance
argument:
test_dataset = test_template.generate_instances(
    prediction_length=7,
    windows=2,
    distance=3,  # window starts are three time steps apart
)
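To make the interplay of prediction_length, windows and distance concrete, the following sketch computes the index range that each test window would cover, counted from the split point. It is a plain-Python illustration rather than GluonTS code: the helper name window_ranges is hypothetical, and it assumes that an omitted distance falls back to prediction_length, which is what the non-overlapping default described above implies.

```python
from typing import List, Optional, Tuple


def window_ranges(
    prediction_length: int,
    windows: int = 1,
    distance: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, stop) offsets of each test window, relative to the split point.

    Hypothetical helper for illustration only; assumes `distance` defaults
    to `prediction_length`, so that windows do not overlap.
    """
    if distance is None:
        distance = prediction_length
    return [
        (w * distance, w * distance + prediction_length)
        for w in range(windows)
    ]


# Default: consecutive, non-overlapping windows.
print(window_ranges(prediction_length=7, windows=2))
# With distance=3 the second window starts three steps after the first,
# so the two windows share four time steps.
print(window_ranges(prediction_length=7, windows=2, distance=3))
```

A distance smaller than prediction_length therefore yields overlapping windows, while a larger one leaves gaps between them.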
- class gluonts.dataset.split.AbstractBaseSplitter
Bases: abc.ABC
Base class for all other splitters.
- generate_test_pairs(dataset: gluonts.dataset.Dataset, prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None) -> Generator[Tuple[Dict[str, Any], Dict[str, Any]], None, None]
- generate_training_entries(dataset: gluonts.dataset.Dataset) -> Generator[Dict[str, Any], None, None]
- split(dataset: gluonts.dataset.Dataset) -> Tuple[gluonts.dataset.split.TrainingDataset, gluonts.dataset.split.TestTemplate]
- class gluonts.dataset.split.DateSplitter(date: pandas._libs.tslibs.period.Period)
Bases: gluonts.dataset.split.AbstractBaseSplitter
A splitter that slices training and test data based on a pandas.Period.
Training entries obtained from this class will be limited to observations up to (and including) the given date.
- Parameters
date (pandas._libs.tslibs.period.Period) – pandas.Period determining where the training data ends.
- date: pandas._libs.tslibs.period.Period
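The inclusive boundary is the detail most easily misread, so here is a small sketch of it. It is not the DateSplitter implementation: the helper split_at_date is hypothetical, it uses the standard library's datetime.date instead of pandas.Period, and it assumes exactly one observation per calendar day.

```python
from datetime import date
from typing import List, Tuple


def split_at_date(
    start: date, values: List[float], split_date: date
) -> Tuple[List[float], List[float]]:
    """Split a daily series so the training part ends at `split_date` (inclusive).

    Hypothetical helper: assumes one observation per calendar day,
    with the first observation falling on `start`.
    """
    # Number of observations from `start` through `split_date`, inclusive.
    n_train = (split_date - start).days + 1
    # Clamp so that dates outside the series give an empty or full training part.
    n_train = max(0, min(n_train, len(values)))
    return values[:n_train], values[n_train:]


series_start = date(2018, 1, 29)
values = [1.0, 2.0, 3.0, 4.0, 5.0]  # 29 January through 2 February

train, rest = split_at_date(series_start, values, date(2018, 1, 31))
print(train)  # observations for 29, 30 and 31 January
print(rest)   # observations after the split date
```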
- class gluonts.dataset.split.InputDataset(test_data: gluonts.dataset.split.TestData)
Bases: object
- test_data: gluonts.dataset.split.TestData
- class gluonts.dataset.split.LabelDataset(test_data: gluonts.dataset.split.TestData)
Bases: object
- test_data: gluonts.dataset.split.TestData
- class gluonts.dataset.split.OffsetSplitter(offset: int)
Bases: gluonts.dataset.split.AbstractBaseSplitter
A splitter that slices training and test data based on a fixed integer offset.
- Parameters
offset (int) – Offset determining where the training data ends. A positive offset indicates how many observations from the start of each series should be in the training slice; a negative offset indicates how many observations before the end of each series should be excluded from the training slice.
- offset: int
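One way to read the two sign conventions is as an ordinary Python slice, series[:offset]. The sketch below illustrates that reading on a plain list; the helper training_slice is hypothetical, and the description above does not say how the library implements the slicing internally.

```python
from typing import List, Sequence


def training_slice(series: Sequence[float], offset: int) -> List[float]:
    """Hypothetical helper mirroring the offset semantics with a Python slice."""
    return list(series[:offset])


series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

# Positive offset: keep the first 7 observations.
print(training_slice(series, 7))
# Negative offset: exclude the last 3 observations.
print(training_slice(series, -3))
```

For this ten-element series both calls select the same seven observations. A negative offset is the more convenient choice when series have different lengths but the same number of trailing observations should be held out from each.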
- class gluonts.dataset.split.TestData(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter, prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None)
Bases: object
An iterable type used for wrapping test data.
Elements of a TestData object are pairs (input, label), where input is input data for models, while label is the future ground truth that models are supposed to predict.
- Parameters
dataset (gluonts.dataset.Dataset) – Whole dataset used for testing.
splitter (gluonts.dataset.split.AbstractBaseSplitter) – A specific splitter that knows how to slice training and test data.
prediction_length (int) – Length of the prediction interval in test data.
windows (int) – Indicates how many test windows to generate for each original dataset entry.
distance (Optional[int]) – Number of time steps between the starts of consecutive test windows generated for each original dataset entry. If not given, windows are non-overlapping.
max_history (Optional[int]) – If given, all entries in the test set have a maximum length of max_history. This can be used to produce smaller files.
- dataset: gluonts.dataset.Dataset
- distance: Optional[int] = None
- property input: gluonts.dataset.split.InputDataset
- property label: gluonts.dataset.split.LabelDataset
- max_history: Optional[int] = None
- prediction_length: int
- windows: int = 1
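The effect of max_history can be pictured as trimming each entry to its most recent observations. The sketch below shows that idea on a plain list. The helper limit_history is hypothetical, and it rests on the assumption that the oldest observations are the ones dropped, which the parameter description does not state explicitly.

```python
from typing import List, Optional


def limit_history(values: List[float], max_history: Optional[int]) -> List[float]:
    """Hypothetical helper: keep at most `max_history` of the latest values.

    Assumes `max_history` is non-negative when given.
    """
    if max_history is None:
        return list(values)
    # Compute the start index explicitly; `values[-0:]` would keep everything.
    start = max(0, len(values) - max_history)
    return list(values[start:])


history = [1.0, 2.0, 3.0, 4.0, 5.0]
print(limit_history(history, 3))     # only the three latest observations
print(limit_history(history, None))  # unchanged when no limit is given
```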
- class gluonts.dataset.split.TestTemplate(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter)
Bases: object
A class used for generating test data.
- Parameters
dataset (gluonts.dataset.Dataset) – Whole dataset used for testing.
splitter (gluonts.dataset.split.AbstractBaseSplitter) – A specific splitter that knows how to slice training and test data.
- dataset: gluonts.dataset.Dataset
- generate_instances(prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None) -> gluonts.dataset.split.TestData
Generate an iterable test dataset, consisting of an input part and a label part.
- Parameters
prediction_length – Length of the prediction interval in test data.
windows – Indicates how many test windows to generate for each original dataset entry.
distance – Number of time steps between the starts of consecutive test windows generated for each original dataset entry. If not given, windows are non-overlapping.
max_history – If given, all entries in the test set have a maximum length of max_history. This can be used to produce smaller files.
- class gluonts.dataset.split.TimeSeriesSlice(entry: Dict[str, Any], prediction_length: int = 0)
Bases: object
- property end: pandas._libs.tslibs.period.Period
- entry: Dict[str, Any]
- prediction_length: int = 0
- property start: pandas._libs.tslibs.period.Period
- class gluonts.dataset.split.TrainingDataset(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter)
Bases: object
- dataset: gluonts.dataset.Dataset
- gluonts.dataset.split.periods_between(start: pandas._libs.tslibs.period.Period, end: pandas._libs.tslibs.period.Period) -> int
Count how many periods fit between start and end (inclusive). The frequency is taken from start.
For example:
>>> start = pd.Period("2021-01-01 00", freq="2H")
>>> end = pd.Period("2021-01-01 11", "2H")
>>> periods_between(start, end)
6
>>> start = pd.Period("2021-03-03 23:00", freq="30T")
>>> end = pd.Period("2021-03-04 03:29", freq="30T")
>>> periods_between(start, end)
9
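The inclusive count can be reproduced with simple arithmetic: divide the elapsed time by the period length, round down, and add one for the starting period. The sketch below does this with the standard library only. The function count_periods is a hypothetical stand-in that works on plain datetimes rather than pandas.Period objects, and it assumes end is not earlier than start.

```python
from datetime import datetime, timedelta


def count_periods(start: datetime, end: datetime, step: timedelta) -> int:
    """Count periods of length `step` from `start` through `end`, inclusive.

    Hypothetical stand-in for illustration; assumes `end >= start`.
    """
    # Whole steps that fit into the elapsed time, plus the starting period.
    return (end - start) // step + 1


two_hourly = count_periods(
    datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 11), timedelta(hours=2)
)
half_hourly = count_periods(
    datetime(2021, 3, 3, 23, 0), datetime(2021, 3, 4, 3, 29), timedelta(minutes=30)
)
print(two_hourly, half_hourly)
```

With these inputs the sketch gives 6 and 9, matching the two documented examples.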
- gluonts.dataset.split.slice_data_entry(entry: Dict[str, Any], slice_: slice, prediction_length: int = 0) -> Dict[str, Any]
- gluonts.dataset.split.split(dataset: gluonts.dataset.Dataset, *, offset: Optional[int] = None, date: Optional[pandas._libs.tslibs.period.Period] = None) -> Tuple[gluonts.dataset.split.TrainingDataset, gluonts.dataset.split.TestTemplate]
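The split() signature takes offset and date as optional keyword-only arguments, which suggests that it selects between the two splitter classes. The sketch below illustrates one plausible dispatch rule. It is not the library's code: choose_splitter is a hypothetical function that only reports which splitter would be chosen, it uses datetime.date in place of pandas.Period, and the requirement that exactly one of the two arguments be given is an assumption.

```python
from datetime import date as Date
from typing import Optional, Tuple, Union


def choose_splitter(
    *, offset: Optional[int] = None, date: Optional[Date] = None
) -> Tuple[str, Union[int, Date]]:
    """Hypothetical dispatch: report which splitter the arguments select.

    Assumes exactly one of `offset` and `date` must be provided.
    """
    if (offset is None) == (date is None):
        raise ValueError("provide exactly one of `offset` or `date`")
    if offset is not None:
        return ("OffsetSplitter", offset)
    return ("DateSplitter", date)


print(choose_splitter(offset=7))
print(choose_splitter(date=Date(2018, 1, 31)))
```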