Metadata-Version: 2.3
Name: padocc
Version: 1.4.5
Summary: Pipeline to Aggregate Data for Optimised Cloud Capabilities
License: BSD 3
Author: Daniel Westwood
Author-email: daniel.westwood@stfc.ac.uk
Requires-Python: >=3.11,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: aiohttp (>=3.10.10,<4.0.0)
Requires-Dist: binpacking (>=1.5.2,<2.0.0)
Requires-Dist: cfapyx (>=2025.12.9)
Requires-Dist: cfgrib (==0.9.14.1)
Requires-Dist: dask (>=2025.11.0,<2026)
Requires-Dist: distributed (>=2025.11.0,<2026)
Requires-Dist: elasticsearch (>=8.0.0,<9.0.0)
Requires-Dist: fsspec (>=2025.7,<2026.0)
Requires-Dist: h5py (>=3.11.0,<4.0.0)
Requires-Dist: kerchunk (>=0.2.9,<0.3.0)
Requires-Dist: matplotlib (==3.9.2)
Requires-Dist: myst-nb (>=1.1.2,<2.0.0)
Requires-Dist: netcdf4 (>=1.7.2,<2.0.0)
Requires-Dist: numpy (<=3.0.0)
Requires-Dist: pytest (>=8.3.5,<9.0.0)
Requires-Dist: rechunker (==0.5.2)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: s3fs (>=2025.7.0,<2026.0.0)
Requires-Dist: scipy (>=1.12.0,<2.0.0)
Requires-Dist: tifffile (>=2024.9.20,<2025.0.0)
Requires-Dist: types-pyyaml (>=6.0.12.20240917,<7.0.0.0)
Requires-Dist: virtualizarr (>=2.0.1)
Requires-Dist: xarray (>=2024,<2026)
Requires-Dist: zarr (>=2.18.4)
Description-Content-Type: text/markdown

# PADOCC Package

[![PyPI version](https://badge.fury.io/py/padocc.svg)](https://pypi.python.org/pypi/padocc/)

PADOCC (Pipeline to Aggregate Data for Optimised Cloud Capabilities) is a data aggregation pipeline that creates Kerchunk (or alternative) files to represent datasets stored in a variety of original formats.
The pipeline currently supports writing JSON or Parquet Kerchunk files for input NetCDF/HDF files. Planned developments will add support for GeoTIFF, GRIB and possibly Met Office (.pp) files, and will use the Pangeo [Rechunker](https://rechunker.readthedocs.io/en/latest/) tool to create Zarr stores for Kerchunk-incompatible datasets.
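
For readers new to Kerchunk, the sketch below shows roughly what a JSON Kerchunk file contains: a `refs` mapping from Zarr-style keys to either inline metadata strings or `[url, offset, length]` byte ranges in the original files. The reference set here is hand-written for illustration; the variable name `tas`, the file paths, the byte offsets, the abbreviated `.zarray` metadata and the `summarise_refs` helper are all assumptions, not output or API of PADOCC.

```python
import json

# Hypothetical, hand-written reference set in the Kerchunk version 1 layout.
# Paths, offsets and lengths are made up; the .zarray metadata is abbreviated.
example_refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "tas/.zarray": '{"shape": [2, 4], "chunks": [1, 4], "dtype": "<f4"}',
        "tas/0.0": ["/data/example_2000.nc", 8192, 16],
        "tas/1.0": ["/data/example_2001.nc", 8192, 16],
    },
}


def summarise_refs(kerchunk: dict) -> dict:
    """Count inline entries and byte-range references (illustrative helper)."""
    inline = 0
    chunks = 0
    sources = set()
    for value in kerchunk["refs"].values():
        if isinstance(value, str):
            inline += 1  # metadata (or small data) stored inline as a string
        else:
            chunks += 1  # [url] or [url, offset, length] pointing at a file
            sources.add(value[0])
    return {"inline": inline, "chunks": chunks, "source_files": sorted(sources)}


# Round-trip through JSON to mimic reading a reference file from disk.
summary = summarise_refs(json.loads(json.dumps(example_refs)))
print(summary)
```

In this toy case two source files are aggregated into one logical dataset without copying any data, which is the general idea behind a Kerchunk reference file.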

[Example Notebooks at this link](https://mybinder.org/v2/gh/cedadev/padocc.git/main?filepath=showcase/notebooks)

[Documentation hosted at this link](https://cedadev.github.io/kerchunk-builder/)

![Kerchunk Pipeline](docs/source/_images/pipeline.png)

## Release 1.4.4

Release date: 22nd January 2026

See the [release notes](https://github.com/cedadev/padocc/releases/tag/v1.4.4) for details.

This package acknowledges contributions by [Matt Brown](mailto:matbro@ceh.ac.uk) as a pre-release tester.

## Installation

To install this package, clone the repository with `git clone`, then run the commands below from the repository root. They create a virtual environment and use Poetry to install the package together with its dependencies.

```
python -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
```
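
After installation, one way to confirm that the package is visible in the active environment is to query its installed metadata. This is a minimal sketch using only the standard library; the `installed_version` helper is a hypothetical name introduced here and is not part of PADOCC.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("padocc")
if found is None:
    print("padocc is not installed in this environment")
else:
    print(f"padocc version: {found}")
```

If the check reports that the package is missing, the virtual environment is probably not activated in the current shell.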

## Usage

Please refer to the documentation linked above for full details on how to use PADOCC.

