gluonts.dataset.rolling_dataset module
- class gluonts.dataset.rolling_dataset.NumSplitsStrategy(*, prediction_length: int, num_splits: int)
Bases: pydantic.main.BaseModel
The NumSplitsStrategy splits a window into num_splits chunks of equal size.
- Parameters
prediction_length (int) – The prediction length of the Predictor that the dataset will be used with
num_splits (int) – The number of segments which the window should be split into
- get_windows(window)
This function splits a given window (array of target values) into smaller chunks based on the parameters of this strategy (prediction_length and num_splits).
- Parameters
window – The window which should be split
- Returns
A generator yielding split versions of the window
- num_splits: int
- prediction_length: int
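The docstrings do not spell out what get_windows yields for this strategy. As a rough illustration only, the sketch below assumes, by analogy with StepStrategy, that each yielded window is a prefix of the input ending one chunk of len(window) // num_splits points earlier than the previous one, and that every chunk must hold at least prediction_length points. num_splits_windows is a hypothetical helper, not part of gluonts.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def num_splits_windows(
    window: Sequence[T], prediction_length: int, num_splits: int
) -> Iterator[Sequence[T]]:
    # Hypothetical sketch, not the gluonts implementation: assumes the window
    # is cut into num_splits equal chunks and each yielded window is a prefix
    # ending one chunk earlier than the previous one.
    chunk = len(window) // num_splits
    if chunk < prediction_length:
        # Assumption: every chunk must hold at least one prediction horizon.
        raise ValueError("window too short for num_splits * prediction_length")
    for i in range(num_splits):
        # Drop i chunks from the end; leftover points are never dropped.
        yield window[: len(window) - i * chunk]


# A 10-point window split into 5 chunks of 2 points each.
print([len(w) for w in num_splits_windows(list(range(6, 16)), 2, 5)])
# window lengths: 10, 8, 6, 4, 2
```

Under this assumption, when len(window) is not a multiple of num_splits the leftover points stay in every window: a 10-point window with num_splits=3 gives prefixes of 10, 7 and 4 points.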
- class gluonts.dataset.rolling_dataset.StepStrategy(*, prediction_length: int, step_size: int = 1)
Bases: pydantic.main.BaseModel
Removes step_size data points from the end of the window on each iteration, until the number of data points left is less than prediction_length.
- Parameters
prediction_length (int) – The prediction length of the Predictor that the dataset will be used with
step_size (int) – The number of data points to remove from the end of the window on each iteration (default: 1).
- get_windows(window)
This function splits a given window (array of target values) into smaller chunks based on the parameters of this strategy (prediction_length and step_size).
- Parameters
window – The window which should be split
- Returns
A generator yielding split versions of the window
- prediction_length: int
- step_size: int
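The iteration described for StepStrategy can be sketched as a plain generator. step_windows is a hypothetical helper, not gluonts API; it assumes, consistent with the generate_rolling_dataset examples, that the windows are prefixes of the input, longest first, and that a window of exactly prediction_length points is still yielded.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def step_windows(
    window: Sequence[T], prediction_length: int, step_size: int = 1
) -> Iterator[Sequence[T]]:
    # Hypothetical sketch of StepStrategy-style rolling, not the gluonts code.
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    end = len(window)
    # Keep yielding while at least prediction_length points remain.
    while end >= prediction_length:
        yield window[:end]
        end -= step_size  # drop step_size points from the end


window = [6, 7, 8, 9, 10]
print([len(w) for w in step_windows(window, prediction_length=2)])  # [5, 4, 3, 2]
print([len(w) for w in step_windows(window, 2, step_size=2)])  # [5, 3]
```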
- gluonts.dataset.rolling_dataset.generate_rolling_dataset(dataset: gluonts.dataset.Dataset, strategy, start_time: pandas._libs.tslibs.period.Period, end_time: Optional[pandas._libs.tslibs.period.Period] = None) -> gluonts.dataset.Dataset
Returns an augmented version of the input dataset in which each time series has been rolled according to the supplied parameters. The explanation and examples below show how the different parameters can be used to generate differently rolled datasets.
For each time series, rolling is applied to the data in the window from start_time to end_time, both inclusive; the data before start_time is kept unchanged in every rolled series. If end_time is omitted, the window extends from start_time to the end of the time series. How the window is rolled is governed by the strategy.
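To make the window boundaries concrete, the sketch below separates one series into the untouched part before start_time and the window to be rolled, using pandas label slicing, which includes both endpoints. split_at_window is a hypothetical helper, not gluonts API, and it uses daily periods rather than the hourly ones in the examples below purely for illustration.

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def split_at_window(
    target: np.ndarray,
    start: pd.Period,
    start_time: pd.Period,
    end_time: Optional[pd.Period] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical helper: index the target by period, beginning at `start`
    # (the frequency is taken from `start`).
    index = pd.period_range(start=start, periods=len(target))
    series = pd.Series(target, index=index)
    # Everything strictly before start_time is left untouched by rolling.
    base = series[series.index < start_time].to_numpy()
    # Label-based slicing is inclusive at both ends; end_time=None means
    # "until the end of the series".
    window = series.loc[start_time:end_time].to_numpy()
    return base, window


target = np.arange(1, 16)
base, window = split_at_window(
    target,
    start=pd.Period("2000-01-01", freq="D"),
    start_time=pd.Period("2000-01-06", freq="D"),
    end_time=pd.Period("2000-01-10", freq="D"),
)
print(base, window)  # base holds 1..5, window holds 6..10
```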
The examples below are based on this dataset, which contains a single time series:
>>> import numpy as np
>>> import pandas as pd
>>> ds = [{
...     "target": np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]),
...     "start": pd.Period('2000-1-1-01', freq='1H')
... }]
Applying generate_rolling_dataset to this dataset as follows:
>>> from gluonts.dataset.rolling_dataset import StepStrategy, generate_rolling_dataset
>>> rolled = generate_rolling_dataset(
...     dataset=ds,
...     strategy=StepStrategy(prediction_length=2),
...     start_time=pd.Period('2000-1-1-06', '1H'),
...     end_time=pd.Period('2000-1-1-10', '1H'),
... )
results in the following new dataset (only the target values are shown for brevity):
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7]
That is, the maximum number of rolls possible between start_time and end_time. With the default step_size=1, StepStrategy removes the last value of the window on each iteration, for as long as at least prediction_length values remain from start_time onward.
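The rolled targets listed above can be reproduced by prepending the untouched data before start_time to each shrinking prefix of the window. rolled_targets is a hypothetical helper operating on plain lists; it is a sketch of the behavior described here, not the gluonts implementation.

```python
from typing import List


def rolled_targets(
    base: List[int],
    window: List[int],
    prediction_length: int,
    step_size: int = 1,
) -> List[List[int]]:
    # Hypothetical sketch: prepend the untouched base to every prefix of the
    # window that still holds at least prediction_length points.
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    rolled = []
    end = len(window)
    while end >= prediction_length:
        rolled.append(base + window[:end])
        end -= step_size
    return rolled


# base = values before 2000-1-1-06, window = values from 06 to 10 inclusive
for target in rolled_targets([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], prediction_length=2):
    print(target)
```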
When end_time is omitted, all data points from start_time to the end of the series are rolled over, and the output is:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7]
The step size can be changed through the step_size parameter of the strategy:
>>> strategy = StepStrategy(prediction_length=2, step_size=2)
This produces fewer series in the output and, when step_size equals prediction_length, ensures that each prediction is made on new, non-overlapping data. With start_time and end_time as in the first example, this strategy produces:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8]
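One way to check the claim about new data, assuming (as in a typical backtest) that the last prediction_length points of each rolled series serve as its forecast horizon, is to compare those horizons directly. The sketch below writes out the two targets listed above by hand; it uses plain Python and no gluonts API.

```python
prediction_length = 2

# The two rolled targets from the step_size=2 example.
rolled = [list(range(1, 11)), list(range(1, 9))]

# Assumed forecast horizon: the last prediction_length values of each series.
horizons = [target[-prediction_length:] for target in rolled]
print(horizons)  # [[9, 10], [7, 8]]

# With step_size == prediction_length, no value appears in two horizons.
# For contrast, step_size=1 would give horizons [9, 10] and [8, 9], which overlap.
flat = [value for horizon in horizons for value in horizon]
assert len(flat) == len(set(flat))
```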
Omitting end_time while using step_size=2 results in the following dataset:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7]
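Based on the description and examples above, the number of rolled series per input series follows from the window length: for a window of w points (counting both start_time and end_time), step size s and prediction length p with w >= p, StepStrategy yields (w - p) // s + 1 series. The hypothetical helper num_rolls below (not gluonts API) checks this against the four examples in this section.

```python
def num_rolls(window_length: int, prediction_length: int, step_size: int = 1) -> int:
    # Hypothetical helper: count the window prefixes of length
    # window_length, window_length - step_size, ... that are >= prediction_length.
    if window_length < prediction_length:
        return 0
    return (window_length - prediction_length) // step_size + 1


# Window from 2000-1-1-06 to 2000-1-1-10 (5 points) or to the end (10 points).
print(num_rolls(5, 2), num_rolls(10, 2), num_rolls(5, 2, 2), num_rolls(10, 2, 2))
# 4 9 2 5
```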
- Parameters
dataset – Dataset to generate the rolling forecasting datasets from
strategy – The strategy that is to be used when rolling
start_time – The start of the window (inclusive) where rolling should be applied
end_time – The end of the window (inclusive) where rolling should be applied; if omitted, the window extends to the end of each time series
- Returns
The augmented dataset
- Return type