Metadata-Version: 2.4
Name: b2h5py
Version: 0.5.0
Summary: Transparent optimized reading of n-dimensional Blosc2 slices for h5py
Author-email: Blosc Development Team <blosc@blosc.org>
License: Copyright (c) 2023 The Blosc Development Team <blosc@blosc.org>
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its contributors
           may be used to endorse or promote products derived from this software
           without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
Project-URL: Homepage, https://github.com/Blosc/b2h5py
Project-URL: Issues, https://github.com/Blosc/b2h5py/issues
Keywords: h5py,HDF5,Blosc2
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.3
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: blosc2>=2.3.0
Requires-Dist: numpy
Requires-Dist: h5py>=3.0
Requires-Dist: hdf5plugin>=4.3.0
Provides-Extra: h5py-test
Requires-Dist: pytest; extra == "h5py-test"
Dynamic: license-file

b2h5py
======

``b2h5py`` provides h5py_ with transparent, automatic optimized reading of n-dimensional slices of Blosc2_-compressed datasets. This optimized slicing leverages direct chunk access (skipping the slow HDF5 filter pipeline) and 2-level partitioning into chunks and then smaller blocks (so that less data is actually decompressed).

.. _h5py: https://www.h5py.org/
.. _Blosc2: https://www.blosc.org/

Benchmarks of this technique show 2x-5x speed-ups compared with normal filter-based access. Comparable results are obtained with a similar technique in PyTables, see `Optimized Hyper-slicing in PyTables with Blosc2 NDim`_.

.. image:: doc/benchmark.png

.. _Optimized Hyper-slicing in PyTables with Blosc2 NDim: https://www.blosc.org/posts/pytables-b2nd-slicing/

Usage
-----

This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the ``h5py.Dataset`` class to extend the slicing operation. The easiest way to do this is::

    import b2h5py.auto

After that, optimization will be attempted for any slicing of a dataset (of the form ``dataset[...]`` or ``dataset.__getitem__(...)``). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin_ to support Blosc2).

.. _hdf5plugin: https://github.com/silx-kit/hdf5plugin

You may instead just ``import b2h5py`` and explicitly enable the optimization globally by calling ``b2h5py.enable_fast_slicing()``, and disable it again with ``b2h5py.disable_fast_slicing()``. You may also enable it temporarily by using a context manager::

    with b2h5py.fast_slicing():
        # ... code that will use Blosc2 optimized slicing ...

Finally, you may explicitly enable optimizations for a given h5py dataset by wrapping it in a ``B2Dataset`` instance::

    b2dset = b2h5py.B2Dataset(dset)
    # ... slicing ``b2dset`` will use Blosc2 optimization ...

Building
--------

Just install PyPA build (e.g. ``pip install build``), enter the source code directory and run ``pyproject-build`` to get a source tarball and a wheel under the ``dist`` directory.

Installing
----------

To install as a wheel from PyPI, run ``pip install b2h5py``.

You may also install the wheel that you built in the previous section, or enter the source code directory and run ``pip install .`` from there.

Running tests
-------------

If you have installed ``b2h5py``, just run ``python -m unittest discover b2h5py.tests``.

Otherwise, just enter its source code directory and run ``python -m unittest``.

You can also run the h5py tests with the patched ``Dataset`` class to check that patching does not break anything. You may install the ``h5py-test`` extra (e.g. ``pip install b2h5py[h5py-test]`` and run ``python -m b2h5py.tests.test_patched_h5py``.
