Metadata-Version: 2.1
Name: arcae
Version: 0.1.0
Summary: Example package description
Home-page: https://github.com/ratt-ru/arcae
Author: Simon Perkins
Author-email: simon.perkins@gmail.com
License: BSD 3-Clause License
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: appdirs
Requires-Dist: click
Requires-Dist: rich
Requires-Dist: pyarrow (==12.0.0)
Provides-Extra: dev
Requires-Dist: black (==22.1.0) ; extra == 'dev'
Requires-Dist: flake8 (==4.0.1) ; extra == 'dev'
Requires-Dist: tbump ; extra == 'dev'
Provides-Extra: test
Requires-Dist: duckdb ; extra == 'test'
Requires-Dist: pytest (>=7.0.0) ; extra == 'test'
Requires-Dist: python-casacore (>=3.5.0) ; extra == 'test'
Requires-Dist: requests ; extra == 'test'

C++ and Python Arrow Bindings for casacore
==========================================


Rationale
---------

* The structure of Apache Arrow Tables is highly similar to that of CASA Tables.
* Arrow Tables are easy to exchange between many different languages.
* Once in Apache Arrow format, data is easy to store in modern, cloud-native disk formats such as Parquet and ORC.
* Converting CASA Tables to Arrow in the C++ layer avoids the GIL.
* Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.
* It also allows us to write astrometric routines in C++, potentially side-stepping thread-safety
  and GIL issues with the CASA Measures server.
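
The single-thread constraint can be illustrated with a minimal Python sketch.
This is only an analogue of the pattern built on ``concurrent.futures``; it is not
arcae's C++ implementation, and ``UnsafeTable`` is a hypothetical stand-in for a
non-thread-safe CASA Table.

.. code-block:: python

  import threading
  from concurrent.futures import ThreadPoolExecutor


  class UnsafeTable:
      """Hypothetical stand-in for an object that is not thread-safe."""

      def __init__(self):
          self.rows_read = 0
          self.seen_threads = set()

      def read(self, nrow):
          # Record which thread touched the object, then mutate shared state
          self.seen_threads.add(threading.get_ident())
          self.rows_read += nrow
          return self.rows_read


  # A pool with exactly one worker serialises every call made through it
  table_pool = ThreadPoolExecutor(max_workers=1)
  table = UnsafeTable()

  # Callers submit work to the pool instead of touching the table directly
  futures = [table_pool.submit(table.read, 10) for _ in range(5)]
  results = [f.result() for f in futures]
  table_pool.shutdown()

Because every call runs on the pool's only worker, the table is never accessed
concurrently, whichever thread submits the work.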


Build Wheel Locally
-------------------

In the user environment or, even better, a virtual environment:

.. code-block:: bash

  $ pip install -U pip cibuildwheel
  $ bash scripts/run_cbuildwheel.sh -p 3.8

.. warning::
  Only Linux wheels are currently supported.

Local Development
-----------------

In the directory containing the source, set up your development environment as follows:

.. code-block:: bash

  $ pip install -U pip virtualenv
  $ virtualenv -p python3.8 /venv/arcaedev
  $ . /venv/arcaedev/bin/activate
  (arcaedev) $ export VCPKG_TARGET_TRIPLET=x64-linux-dynamic-cxx17-abi0-dbg  # change the suffix to -rel for release
  (arcaedev) $ pip install -e .[test]
  (arcaedev) $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/vcpkg/installed/$VCPKG_TARGET_TRIPLET/lib
  (arcaedev) $ py.test -s -vvv --pyargs arcae

Usage
-----

Example Usage:

  .. code-block:: python

    import json
    from pprint import pprint

    import arcae
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Obtain (partial) Apache Arrow Table from a CASA Table
    casa_table = arcae.table("/path/to/measurementset.ms")
    arrow_table = casa_table.to_arrow()        # read entire table
    arrow_table = casa_table.to_arrow(10, 20)  # startrow, nrow
    assert isinstance(arrow_table, pa.Table)

    # Print JSON-encoded Table and Column keywords
    pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
    pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

    # Extract Arrow Table columns into numpy arrays
    time = arrow_table.column("TIME").to_numpy()
    data = arrow_table.column("DATA").to_numpy()   # currently yields arrays of object arrays: slow and memory hungry
    df = arrow_table.to_pandas()                   # currently slow and memory hungry for the same reason

    # Write Arrow Table to parquet file
    pq.write_table(arrow_table, "measurementset.parquet")


See the test cases for further use cases.


Exporting Measurement Sets to Arrow Parquet Datasets
----------------------------------------------------

An export script is available:

.. code-block:: bash

  $ arcae export /path/to/the.ms --nrow 50000
  $ tree output.arrow/
  output.arrow/
  ├── ANTENNA
  │   └── data0.parquet
  ├── DATA_DESCRIPTION
  │   └── data0.parquet
  ├── FEED
  │   └── data0.parquet
  ├── FIELD
  │   └── data0.parquet
  ├── MAIN
  │   └── FIELD_ID=0
  │       └── PROCESSOR_ID=0
  │           ├── DATA_DESC_ID=0
  │           │   ├── data0.parquet
  │           │   ├── data1.parquet
  │           │   ├── data2.parquet
  │           │   └── data3.parquet
  │           ├── DATA_DESC_ID=1
  │           │   ├── data0.parquet
  │           │   ├── data1.parquet
  │           │   ├── data2.parquet
  │           │   └── data3.parquet
  │           ├── DATA_DESC_ID=2
  │           │   ├── data0.parquet
  │           │   ├── data1.parquet
  │           │   ├── data2.parquet
  │           │   └── data3.parquet
  │           └── DATA_DESC_ID=3
  │               ├── data0.parquet
  │               ├── data1.parquet
  │               ├── data2.parquet
  │               └── data3.parquet
  ├── OBSERVATION
  │   └── data0.parquet


This data can be loaded into an Arrow Dataset:

.. code-block:: python

    >>> import pyarrow as pa
    >>> import pyarrow.dataset as pad
    >>> main_ds = pad.dataset("output.arrow/MAIN")
    >>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

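
The ``MAIN`` directory in the listing above encodes partition values in
``KEY=value`` directory names. As a minimal sketch of how such a path maps to
partition keys, the hypothetical helper ``partition_keys`` below parses them with
the standard library only, assuming (as in the listing) that all values are integers.

.. code-block:: python

  from pathlib import PurePosixPath


  def partition_keys(path):
      """Return the KEY=value directory components of a path as a dict."""
      keys = {}
      for part in PurePosixPath(path).parts:
          key, sep, value = part.partition("=")
          if sep:
              # Assumption: partition values are integers, as in the listing
              keys[key] = int(value)
      return keys


  keys = partition_keys(
      "output.arrow/MAIN/FIELD_ID=0/PROCESSOR_ID=0/DATA_DESC_ID=2/data1.parquet"
  )

In pyarrow, passing ``partitioning="hive"`` to ``pyarrow.dataset.dataset`` is the
usual way to recover keys of this form as columns. Whether that is needed here
depends on whether the export script also stores these columns inside the
parquet files, which is not stated above.
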

Limitations
-----------

Some edge cases have not yet been implemented, but could be with some thought.

* Not yet able to handle columns with unconstrained rank (ndim == -1). Probably simplest to convert these rows to JSON and store them as strings.
* Not yet able to handle TpRecord columns. Probably simplest to convert these rows to JSON and store them as strings.
* Not yet able to handle TpQuantity columns. Possible to represent as a run-time parametric Arrow DataType.
* ``to_numpy()`` conversion of nested lists produces nested numpy arrays instead of tensors.
  This is `possible <daskms_ext_types_>`_ but requires some changes to how
  `C++ Extension Types are exposed in Python <arrow_python_expose_cpp_ext_types_>`_.
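
The JSON fallback suggested for unconstrained-rank and TpRecord columns could look
roughly like the sketch below. It is an illustration only, not something arcae
currently does, and the ``rows`` values are hypothetical stand-ins for column data.

.. code-block:: python

  import json

  # Hypothetical rows from a column whose shape varies from row to row
  rows = [
      [1.0, 2.0, 3.0],                  # rank 1
      [[1.0, 2.0], [3.0, 4.0]],         # rank 2
      {"unit": "Hz", "value": 1.4e9},   # record-like row
  ]

  # Each row becomes one string, so the column fits a plain string array
  encoded = [json.dumps(row) for row in rows]

  # Readers recover the original nested structure on demand
  decoded = [json.loads(s) for s in encoded]

The trade-off is that such a column is opaque to Arrow compute functions until it
is decoded again.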



Etymology
---------

Noun: **arca** f (genitive **arcae**); first declension

A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape)

Pronounced: `ar-ki <arcae_pronounce_>`_.


.. _daskms_ext_types: https://github.com/ratt-ru/dask-ms/blob/1ff73ce3a60ea6479e40fc8cf440fd8d077e3d26/daskms/experimental/arrow/extension_types.py#L120-L152
.. _arrow_python_expose_cpp_ext_types: https://github.com/apache/arrow/issues/33997
.. _arcae_pronounce: https://translate.google.com/?sl=la&tl=en&text=arcae%0A&op=translate
