gluonts.dataset.loader module#

class gluonts.dataset.loader.Batch(*, batch_size: int)[source]#

Bases: Transformation, BaseModel

batch_size: int#
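
Batch carries no docstring here; judging from its use in this module, it groups consecutive data entries into lists of at most batch_size elements. A minimal sketch over toy entries (the field name "x" is illustrative):

```python
from gluonts.dataset.loader import Batch

batch = Batch(batch_size=3)

# Applying the transformation groups entries into lists; the final
# list may be shorter than batch_size.
groups = list(batch([{"x": i} for i in range(7)], is_train=False))
print([len(g) for g in groups])  # [3, 3, 1]
```
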
gluonts.dataset.loader.InferenceDataLoader(dataset: gluonts.dataset.Dataset, *, transform: gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: typing.Callable)[source]#

Construct an iterator of batches for inference purposes.

Parameters:
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “inference mode” (is_train=False).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function used to stack data entries into batches. This can be used to select a specific array type or the computing device (CPU, GPU) that the arrays should end up on.

Returns:

An iterable sequence of batches.

Return type:

Iterable[DataBatch]
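
A minimal usage sketch follows. The stack_entries helper and the toy dataset are illustrative, not part of the gluonts API; production code would typically pass a framework-specific stacking function instead.

```python
import numpy as np

from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import InferenceDataLoader


def stack_entries(entries):
    # Illustrative stack_fn: stack array fields across the batch and
    # pass every other field through as a plain list.
    batch = {}
    for field in entries[0]:
        values = [entry[field] for entry in entries]
        if isinstance(values[0], np.ndarray):
            batch[field] = np.stack(values)  # assumes matching shapes
        else:
            batch[field] = values
    return batch


# Toy dataset with equal-length targets, so stacking works without
# any slicing transformation.
dataset = ListDataset(
    [{"start": "2021-01-01", "target": np.arange(24.0)} for _ in range(8)],
    freq="H",
)

loader = InferenceDataLoader(dataset, batch_size=4, stack_fn=stack_entries)
for batch in loader:
    print(batch["target"].shape)  # (4, 24)
```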

class gluonts.dataset.loader.Stack[source]#

Bases: Transformation, BaseModel
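
Stack likewise has no docstring; in this module it turns each list of entries into a single batch dict, one stacked array per field. A hedged sketch chaining it with Batch, assuming numpy stacking per field (toy entries, illustrative field name):

```python
import numpy as np

from gluonts.dataset.loader import Batch, Stack

entries = [{"target": np.arange(4.0)} for _ in range(6)]

# Chain the two transformations: group entries, then stack per field.
transform = Batch(batch_size=3) + Stack()
for batch in transform(iter(entries), is_train=False):
    print(batch["target"].shape)  # (3, 4)
```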

gluonts.dataset.loader.TrainDataLoader(dataset: gluonts.dataset.Dataset, *, transform: gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: typing.Callable, num_batches_per_epoch: typing.Optional[int] = None, shuffle_buffer_length: typing.Optional[int] = None)[source]#

Construct an iterator of batches for training purposes.

This function wraps around DataLoader to offer training-specific behaviour and options, as follows:

1. The provided dataset is iterated cyclically, so that one can go over it multiple times in a single epoch.

2. A transformation can be provided, which is lazily applied as the dataset is being iterated; this is useful, e.g., to slice random instances of fixed length out of each time series in the dataset.

3. The resulting batches can be iterated in a pseudo-shuffled order.

The returned object is a stateful iterator, whose length is either num_batches_per_epoch (if not None) or infinite (otherwise). A usage sketch follows the return type below.

Parameters:
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function used to stack data entries into batches. This can be used to select a specific array type or the computing device (CPU, GPU) that the arrays should end up on.

  • num_batches_per_epoch – Length of the iterator. If None, then the iterator is endless.

  • shuffle_buffer_length – Size of the buffer used for shuffling. Default: None, in which case no shuffling occurs.

Returns:

An iterator of batches.

Return type:

Iterator[DataBatch]
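
A hedged sketch, reusing the illustrative stack_entries helper and toy dataset from the InferenceDataLoader example above:

```python
import numpy as np

from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import TrainDataLoader

dataset = ListDataset(
    [{"start": "2021-01-01", "target": np.arange(24.0)} for _ in range(8)],
    freq="H",
)

loader = TrainDataLoader(
    dataset,
    batch_size=4,
    stack_fn=stack_entries,    # illustrative numpy helper defined above
    num_batches_per_epoch=10,  # epoch length; the dataset is cycled as needed
    shuffle_buffer_length=16,  # pseudo-shuffle through a buffer of 16 entries
)

batches = list(loader)  # the iterator stops after exactly 10 batches
print(len(batches))     # 10
```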

gluonts.dataset.loader.ValidationDataLoader(dataset: gluonts.dataset.Dataset, *, transform: gluonts.transform._base.Transformation = <gluonts.transform._base.Identity object>, batch_size: int, stack_fn: typing.Callable)[source]#

Construct an iterator of batches for validation purposes.

Parameters:
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function used to stack data entries into batches. This can be used to select a specific array type or the computing device (CPU, GPU) that the arrays should end up on.

Returns:

An iterable sequence of batches.

Return type:

Iterable[DataBatch]
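
Usage mirrors InferenceDataLoader, except that the transformation runs with is_train=True. For instance, reusing the illustrative dataset and stack_entries helper from the examples above:

```python
from gluonts.dataset.loader import ValidationDataLoader

loader = ValidationDataLoader(dataset, batch_size=4, stack_fn=stack_entries)
```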

gluonts.dataset.loader.as_stacked_batches(dataset: Dataset, *, batch_size: int, output_type: Optional[Callable] = None, num_batches_per_epoch: Optional[int] = None, shuffle_buffer_length: Optional[int] = None, field_names: Optional[list] = None)[source]#

Prepare data in batches to be passed to a network.

Input data is collected into batches of size batch_size, and the values of each column are stacked on top of each other. In addition, the result is wrapped in output_type, if one is provided.

If num_batches_per_epoch is provided, only that many batches are returned. This is especially useful for training, when providing a cyclic dataset.

To pseudo-shuffle the data, shuffle_buffer_length can be set; inputs are then first collected into a buffer of that length, from which elements are sampled at random.

If field_names is set, only those columns of the input data are kept; all other values are discarded.
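
A hedged end-to-end sketch, using a toy dataset with equal-length targets so that plain stacking works; passing output_type (e.g. a tensor constructor) would additionally wrap each stacked column:

```python
import numpy as np

from gluonts.dataset.common import ListDataset
from gluonts.dataset.loader import as_stacked_batches

dataset = ListDataset(
    [{"start": "2021-01-01", "target": np.arange(24.0)} for _ in range(8)],
    freq="H",
)

batches = as_stacked_batches(
    dataset,
    batch_size=4,
    field_names=["target"],  # drop every column except "target"
)
for batch in batches:
    print(batch.keys(), batch["target"].shape)  # dict_keys(['target']) (4, 24)
```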