gluonts.dataset.loader module#
- class gluonts.dataset.loader.Batch(*, batch_size: int)[source]#
Bases: Transformation, BaseModel
- batch_size: int#
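A minimal sketch of how this transformation could be applied on its own, assuming (from the class name, not the docstring) that it groups consecutive data entries into lists of at most batch_size:

```python
from gluonts.dataset.loader import Batch

# Toy "dataset": five already-transformed data entries.
entries = [{"x": i} for i in range(5)]

batching = Batch(batch_size=2)

# Transformations are applied lazily to an iterable of data entries.
for group in batching.apply(entries, is_train=False):
    print(group)  # expected: [{'x': 0}, {'x': 1}], ..., [{'x': 4}] (an assumption)
```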
- gluonts.dataset.loader.InferenceDataLoader(dataset: ~gluonts.dataset.Dataset, *, transform: ~gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: ~typing.Callable)[source]#
Construct an iterator of batches for inference purposes.
- Parameters:
dataset – Data to iterate over.
transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “inference mode” (is_train=False).
batch_size – Number of entries to include in a batch.
stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up on (CPU, GPU).
- Returns:
An iterable sequence of batches.
- Return type:
Iterable[DataBatch]
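A short usage sketch with a hand-rolled stack_fn; the toy dataset, the stack_fn, and the assumption that stack_fn receives the list of data entries forming one batch are illustrative, not prescribed by this module:

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import InferenceDataLoader

# Toy dataset: two equal-length daily series.
dataset = ListDataset(
    [
        {"start": "2021-01-01", "target": [1.0, 2.0, 3.0, 4.0]},
        {"start": "2021-01-01", "target": [5.0, 6.0, 7.0, 8.0]},
    ],
    freq="D",
)

def stack_fn(entries):
    # Stack array-like fields across the entries of one batch; keep
    # fields that cannot be stacked as plain lists.
    batch = {}
    for key in entries[0]:
        values = [entry[key] for entry in entries]
        try:
            batch[key] = np.stack(values)
        except (TypeError, ValueError):
            batch[key] = values
    return batch

loader = InferenceDataLoader(dataset, batch_size=2, stack_fn=stack_fn)

for batch in loader:  # one finite pass over the dataset
    print(np.shape(batch["target"]))  # (2, 4) for the toy data above
```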
- class gluonts.dataset.loader.Stack[source]#
Bases: Transformation, BaseModel
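Batch and Stack are building blocks for the loader functions below and are typically chained; a rough sketch, assuming Batch groups consecutive entries into lists of batch_size and Stack then rearranges each list into one stacked array per field (both behaviours inferred from the class names, not stated in the docstrings):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import Batch, Stack

dataset = ListDataset(
    [
        {"start": "2021-01-01", "target": [1.0, 2.0, 3.0]},
        {"start": "2021-01-02", "target": [4.0, 5.0, 6.0]},
    ],
    freq="D",
)

# Chain the two transformations: group entries, then stack per field.
transform = Batch(batch_size=2) + Stack()

for batch in transform.apply(dataset, is_train=False):
    print(type(batch), np.shape(batch["target"]))  # dict with a (2, 3) "target"
```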
- gluonts.dataset.loader.TrainDataLoader(dataset: ~gluonts.dataset.Dataset, *, transform: ~gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: ~typing.Callable, num_batches_per_epoch: ~typing.Optional[int] = None, shuffle_buffer_length: ~typing.Optional[int] = None)[source]#
Construct an iterator of batches for training purposes.
This function wraps around DataLoader to offer training-specific behaviour and options, as follows:
1. The provided dataset is iterated cyclically, so that one can go over it multiple times in a single epoch.
2. A transformation must be provided, which is lazily applied as the dataset is being iterated; this is useful e.g. to slice random instances of fixed length out of each time series in the dataset.
3. The resulting batches can be iterated in a pseudo-shuffled order.
The returned object is a stateful iterator, whose length is either num_batches_per_epoch (if not None) or infinite (otherwise).
- Parameters:
dataset – Data to iterate over.
transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).
batch_size – Number of entries to include in a batch.
stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up on (CPU, GPU).
num_batches_per_epoch – Length of the iterator. If None, then the iterator is endless.
shuffle_buffer_length – Size of the buffer used for shuffling. Default: None, in which case no shuffling occurs.
- Returns:
An iterator of batches.
- Return type:
Iterator[DataBatch]
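A usage sketch; in a real training setup transform would typically be an instance-splitting transformation producing fixed-length windows, but here the default identity transform and equal-length toy series keep the example self-contained (the stack_fn is the same hand-rolled one as in the inference sketch above):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import TrainDataLoader

dataset = ListDataset(
    [
        {"start": "2021-01-01", "target": [float(i) for i in range(8)]},
        {"start": "2021-01-01", "target": [float(i) for i in range(8, 16)]},
    ],
    freq="D",
)

def stack_fn(entries):
    # Stack each field across the entries of one batch; keep
    # fields that cannot be stacked as plain lists.
    batch = {}
    for key in entries[0]:
        values = [entry[key] for entry in entries]
        try:
            batch[key] = np.stack(values)
        except (TypeError, ValueError):
            batch[key] = values
    return batch

loader = TrainDataLoader(
    dataset,
    batch_size=4,
    stack_fn=stack_fn,
    num_batches_per_epoch=10,   # the dataset is cycled as needed
    shuffle_buffer_length=50,   # optional pseudo-shuffling
)

for epoch in range(2):
    for batch in loader:        # the stateful iterator yields 10 batches per pass
        pass                    # a training step would consume `batch` here
```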
- gluonts.dataset.loader.ValidationDataLoader(dataset: ~gluonts.dataset.Dataset, *, transform: ~gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: ~typing.Callable)[source]#
Construct an iterator of batches for validation purposes.
- Parameters:
dataset – Data to iterate over.
transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).
batch_size – Number of entries to include in a batch.
stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up on (CPU, GPU).
- Returns:
An iterable sequence of batches.
- Return type:
Iterable[DataBatch]
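Usage mirrors InferenceDataLoader; the behavioural difference noted above is that the transformation runs in training mode. A brief self-contained sketch, with the same hand-rolled stack_fn idea as in the inference example (simplified, since the toy entries here stack without a fallback):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import ValidationDataLoader

dataset = ListDataset(
    [{"start": "2021-01-01", "target": [1.0, 2.0, 3.0, 4.0]}],
    freq="D",
)

def stack_fn(entries):
    # Stack each field of the batch's entries into an array.
    return {
        key: np.array([entry[key] for entry in entries])
        for key in entries[0]
    }

val_loader = ValidationDataLoader(dataset, batch_size=1, stack_fn=stack_fn)
val_batches = list(val_loader)  # a finite pass over the dataset
```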
- gluonts.dataset.loader.as_stacked_batches(dataset: Dataset, *, batch_size: int, output_type: Optional[Callable] = None, num_batches_per_epoch: Optional[int] = None, shuffle_buffer_length: Optional[int] = None, field_names: Optional[list] = None)[source]#
Prepare data in batches to be passed to a network.
Input data is collected into batches of size batch_size and then columns are stacked on top of each other. In addition, the result is wrapped in output_type if provided.
If num_batches_per_epoch is provided, only that number of batches is effectively returned. This is especially useful for training when providing a cyclic dataset.
To pseudo-shuffle data, shuffle_buffer_length can be set to collect inputs into a buffer first, from which we then randomly sample.
Setting field_names will only consider those columns in the input data and discard all other values.
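A short sketch of the happy path; the toy dataset is illustrative, and with the default output_type the stacked columns are assumed to come back as NumPy arrays (a tensor constructor could be passed as output_type to wrap the result differently):

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import as_stacked_batches

dataset = ListDataset(
    [
        {"start": "2021-01-01", "target": [1.0, 2.0, 3.0, 4.0]},
        {"start": "2021-01-01", "target": [5.0, 6.0, 7.0, 8.0]},
    ],
    freq="D",
)

batches = as_stacked_batches(
    dataset,
    batch_size=2,
    field_names=["target"],  # keep only the "target" column
)

for batch in batches:
    print(np.shape(batch["target"]))  # (2, 4) for the toy data above
```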