gluonts.dataset.arrow package#
Arrow Dataset#
Fast and efficient datasets using pyarrow.
This module provides three file types:
ArrowFile
(arrow random-access binary format)
ArrowStreamFile
(arrow streaming binary format)
ParquetFile
(parquet columnar format)
- class gluonts.dataset.arrow.ArrowFile(path: pathlib.Path, _start: int = 0, _take: Optional[int] = None)[source]#
Bases:
File
- property batch_offsets#
- decoder: ArrowDecoder#
- path: Path#
- reader: RecordBatchFileReader#
- property schema#
- class gluonts.dataset.arrow.ArrowStreamFile(path: pathlib.Path, _start: int = 0, _take: Optional[int] = None)[source]#
Bases:
File
- path: Path#
- class gluonts.dataset.arrow.ArrowWriter(stream: bool = False, suffix: str = '.feather', compression: Union[Literal['lz4'], Literal['zstd'], NoneType] = None, flatten_arrays: bool = True, metadata: Optional[dict] = None)[source]#
Bases:
DatasetWriter
- compression: Optional[Union[Literal['lz4'], Literal['zstd']]] = None#
- flatten_arrays: bool = True#
- metadata: Optional[dict] = None#
- stream: bool = False#
- suffix: str = '.feather'#
- class gluonts.dataset.arrow.File[source]#
Bases:
object
- SUFFIXES = {'.arrow', '.feather', '.parquet'}#
- static infer(path: Path) → Union[ArrowFile, ArrowStreamFile, ParquetFile][source]#
Return ArrowFile, ArrowStreamFile, or ParquetFile by inspecting the provided path.
Arrow’s random-access format starts with the magic bytes ARROW1, so we peek at the start of the file to detect it.
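That peek can be sketched in plain Python. This is a simplified stand-in for `File.infer` (the name `sniff_format`, the string labels, and the suffix-free fallback to the streaming format are our assumptions, not the library's exact logic):

```python
import os
import tempfile
from pathlib import Path

ARROW1_MAGIC = b"ARROW1"  # header of Arrow's random-access (file) format
PARQUET_MAGIC = b"PAR1"   # parquet files begin (and end) with PAR1

def sniff_format(path: Path) -> str:
    """Classify a file by its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(ARROW1_MAGIC):
        return "arrow-file"
    if head.startswith(PARQUET_MAGIC):
        return "parquet"
    # No known magic number: assume the arrow streaming format.
    return "arrow-stream"

# Demo: write a fake ARROW1 header and sniff it.
fd, tmp = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ARROW1\x00\x00")
result = sniff_format(Path(tmp))
os.unlink(tmp)
print(result)  # arrow-file
```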
- class gluonts.dataset.arrow.ParquetFile(path: pathlib.Path, _start: int = 0, _take: Optional[int] = None, _row_group_sizes: List[int] = <factory>)[source]#
Bases:
File
- path: Path#
- reader: ParquetFile#
- class gluonts.dataset.arrow.ParquetWriter(suffix: str = '.parquet', flatten_arrays: bool = True, metadata: Optional[dict] = None)[source]#
Bases:
DatasetWriter
- flatten_arrays: bool = True#
- metadata: Optional[dict] = None#
- suffix: str = '.parquet'#
- gluonts.dataset.arrow.write_dataset(Writer, dataset, path, metadata=None, batch_size=1024, flatten_arrays=True)[source]#