gluonts.dataset.loader module

class gluonts.dataset.loader.DataLoader(data_iterable: Iterable[Dict[str, Any]], *, batch_size: int, stack_fn: Callable, num_workers: Optional[int] = None, num_prefetch: Optional[int] = None, decode_fn: Callable = <function DataLoader.<lambda>>)[source]

Bases: collections.abc.Iterable, typing.Generic

Iterate a datasets and stack its entries into batches of a given size.

The object can be configured to use multiple processes to iterate the entries, which increases throughput in case the data entries are lazily transformed.

Parameters
  • data_iterable – Data to construct batches from.

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up onto (CPU, GPU).

  • num_workers – Number of worker processes to use. Default: None.

  • num_prefetch – Sets the length of the queue of batches being produced by worker processes. (Only meaningful when num_workers is not None).

  • decode_fn – A function called on each batch after it’s been taken out of the queue. (Only meaningful when num_workers is not None).

gluonts.dataset.loader.InferenceDataLoader(dataset: Iterable[Dict[str, Any]], *, transform: gluonts.transform._base.Transformation, batch_size: int, stack_fn: Callable)[source]

Construct an iterator of batches for inference purposes.

Parameters
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “inference mode” (is_train=False).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up onto (CPU, GPU).

Returns

An iterable sequence of batches.

Return type

Iterable[DataBatch]

class gluonts.dataset.loader.MultiProcessBatcher(dataset: Iterable[Dict[str, Any]], batch_size: int, stack_fn: Callable, num_workers: int, max_queue_size: Optional[int] = None, decode_fn: Callable = <function MultiProcessBatcher.<lambda>>)[source]

Bases: collections.abc.Iterator, typing.Generic

static worker_fn(worker_id: int, num_workers: int, dataset, batch_size: int, stack_fn: Callable, batch_queue: multiprocessing.context.BaseContext.Queue, terminate_event, exhausted_event)[source]
gluonts.dataset.loader.TrainDataLoader(dataset: Iterable[Dict[str, Any]], *, transform: gluonts.transform._base.Transformation, batch_size: int, stack_fn: Callable, num_batches_per_epoch: Optional[int] = None, num_workers: Optional[int] = None, num_prefetch: Optional[int] = None, shuffle_buffer_length: Optional[int] = None, decode_fn: Callable = <function <lambda>>)[source]

Construct an iterator of batches for training purposes.

This function wraps around DataLoader to offer training-specific behaviour and options, as follows:

1. The provided dataset is iterated cyclically, so that one can go over it multiple times in a single epoch. 2. A transformation must be provided, that is lazily applied as the dataset is being iterated; this is useful e.g. to slice random instances of fixed length out of each time series in the dataset. 3. The resulting batches can be iterated in a pseudo-shuffled order.

The returned object is a stateful iterator, whose length is either num_batches_per_epoch (if not None) or infinite (otherwise).

Parameters
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up onto (CPU, GPU).

  • num_batches_per_epoch – Length of the iterator. If None, then the iterator is endless.

  • num_workers – Number of worker processes to use. Default: None.

  • num_prefetch – Sets the length of the queue of batches being produced by worker processes. (Only meaningful when num_workers is not None).

  • shuffle_buffer_length – Size of the buffer used for shuffling. Default: None, in which case no shuffling occurs.

  • decode_fn – A function called on each batch after it’s been taken out of the queue. (Only meaningful when num_workers is not None).

Returns

An iterator of batches.

Return type

Iterator[DataBatch]

gluonts.dataset.loader.ValidationDataLoader(dataset: Iterable[Dict[str, Any]], *, transform: gluonts.transform._base.Transformation, batch_size: int, stack_fn: Callable)[source]

Construct an iterator of batches for validation purposes.

Parameters
  • dataset – Data to iterate over.

  • transform – Transformation to be lazily applied as data is being iterated. The transformation is applied in “training mode” (is_train=True).

  • batch_size – Number of entries to include in a batch.

  • stack_fn – Function to use to stack data entries into batches. This can be used to set a specific array type or computing device the arrays should end up onto (CPU, GPU).

Returns

An iterable sequence of batches.

Return type

Iterable[DataBatch]

gluonts.dataset.loader.win32_guard(num_workers: Optional[int]) → Optional[int][source]