Metadata-Version: 2.1
Name: apache-gravitino
Version: 0.9.0rc1
Summary: Python lib/client for Apache Gravitino
Home-page: https://github.com/apache/gravitino
Author: Apache Software Foundation
Author-email: dev@gravitino.apache.org
Maintainer: Apache Gravitino Community
Maintainer-email: dev@gravitino.apache.org
License: Apache-2.0
Project-URL: Homepage, https://gravitino.apache.org/
Project-URL: Source Code, https://github.com/apache/gravitino
Project-URL: Documentation, https://gravitino.apache.org/docs/overview
Project-URL: Bug Tracker, https://github.com/apache/gravitino/issues
Project-URL: Slack Chat, https://the-asf.slack.com/archives/C078RESTT19
Keywords: Data,AI,metadata,catalog
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: requests==2.32.3
Requires-Dist: dataclasses-json==0.6.7
Requires-Dist: readerwriterlock==1.0.9
Requires-Dist: fsspec==2024.3.1
Requires-Dist: pyarrow==15.0.2
Requires-Dist: cachetools==5.5.2
Requires-Dist: gcsfs==2024.3.1
Requires-Dist: s3fs==2024.3.1
Requires-Dist: ossfs==2023.12.0
Requires-Dist: adlfs==2023.12.0
Provides-Extra: dev
Requires-Dist: requests==2.32.3; extra == "dev"
Requires-Dist: dataclasses-json==0.6.7; extra == "dev"
Requires-Dist: pylint==3.2.2; extra == "dev"
Requires-Dist: black==24.4.2; extra == "dev"
Requires-Dist: twine==5.1.1; extra == "dev"
Requires-Dist: coverage==7.5.1; extra == "dev"
Requires-Dist: pandas==2.0.3; python_version == "3.8" and extra == "dev"
Requires-Dist: pandas==2.2.3; python_version > "3.8" and extra == "dev"
Requires-Dist: pyarrow==15.0.2; extra == "dev"
Requires-Dist: llama-index==0.11.18; extra == "dev"
Requires-Dist: tenacity==8.3.0; extra == "dev"
Requires-Dist: cachetools==5.5.2; extra == "dev"
Requires-Dist: readerwriterlock==1.0.9; extra == "dev"
Requires-Dist: docker==7.1.0; extra == "dev"
Requires-Dist: pyjwt[crypto]==2.8.0; extra == "dev"
Requires-Dist: jwcrypto==1.5.6; extra == "dev"
Requires-Dist: sphinx==7.1.2; extra == "dev"
Requires-Dist: furo==2024.8.6; extra == "dev"

# Apache Gravitino Python client

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake.
It manages metadata directly in different sources, types, and regions, and provides users
with unified metadata access for data and AI assets.

The Gravitino Python client helps data scientists manage metadata easily from Python.

![gravitino-python-client-introduction](https://github.com/apache/gravitino/blob/main/docs/assets/gravitino-python-client-introduction.png?raw=true)

## Usage Guide

You can use the Gravitino Python client library with Spark, PyTorch, TensorFlow, Ray, and plain Python environments.

First, you need a Gravitino server that is set up and running. Refer to
[How to install Gravitino](https://gravitino.apache.org/docs/latest/how-to-install/) to build the Gravitino server from source code and
install it locally.
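
Before running client code, it can help to confirm that something is listening where you expect the server to be. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and the default of `localhost:8090` is an assumption about your server configuration, so adjust it to match your setup. It only checks that a TCP connection can be opened, not that the listener is actually a Gravitino server.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8090, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    The defaults are assumptions for illustration; use the host and port
    your Gravitino server is configured with.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:8090.")
    else:
        print("Nothing is reachable on localhost:8090; is the server running?")
```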

### Apache Gravitino Python client API

```shell
pip install apache-gravitino
```

1. [Manage metalake using Gravitino Python API](https://gravitino.apache.org/docs/latest/manage-metalake-using-gravitino/?language=python)
2. [Manage fileset metadata using Gravitino Python API](https://gravitino.apache.org/docs/latest/manage-fileset-metadata-using-gravitino/?language=python)
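
After running `pip install`, you may want to confirm which version of the package ended up in your environment. This is a small sketch based on the standard-library `importlib.metadata` module (available since Python 3.8); the function name `installed_version` is hypothetical and is not part of the Gravitino client.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "apache-gravitino") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("apache-gravitino is not installed; run `pip install apache-gravitino`.")
    else:
        print(f"apache-gravitino {version} is installed.")
```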

### Apache Gravitino Fileset Example

We offer a playground environment to help you quickly understand how to use the Gravitino Python
client to manage non-tabular data on HDFS via Fileset in Gravitino. Refer to the
document [How to use the playground](https://gravitino.apache.org/docs/latest/how-to-use-the-playground/)
to launch a Gravitino server, HDFS, and a Jupyter notebook environment in your local Docker environment.

Once the playground Docker environment has started, open
`http://localhost:18888/lab/tree/gravitino-fileset-example.ipynb` in the browser and run the example.

The [gravitino-fileset-example](https://github.com/apache/gravitino-playground/blob/main/init/jupyter/gravitino-fileset-example.ipynb)
contains the following code snippets:

1. Install the HDFS Python client.
2. Create an HDFS client to connect to HDFS and run some test operations.
3. Install the Gravitino Python client.
4. Initialize the Gravitino admin client and create a Gravitino metalake.
5. Initialize the Gravitino client and list metalakes.
6. Create a Gravitino `Catalog` whose `type` is `Catalog.Type.FILESET` and whose `provider` is
   [hadoop](https://gravitino.apache.org/docs/latest/hadoop-catalog/).
7. Create a Gravitino `Schema` with its `location` pointing to an HDFS path, and use the `hdfs client` to
   check that the schema location was successfully created in HDFS.
8. Create a `Fileset` whose `type` is [Fileset.Type.MANAGED](https://gravitino.apache.org/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations),
   and use the `hdfs client` to check that the fileset location was successfully created in HDFS.
9. Drop this `Fileset.Type.MANAGED` fileset and check that the fileset location was
   successfully deleted in HDFS.
10. Create a `Fileset` whose `type` is [Fileset.Type.EXTERNAL](https://gravitino.apache.org/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations)
    and whose `location` points to an existing HDFS path.
11. Drop this `Fileset.Type.EXTERNAL` fileset and check that the fileset location was
    not deleted in HDFS.
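
The difference between the managed and external steps above comes down to who owns the storage location when the fileset is dropped. The following toy model illustrates that behavior with a local temporary directory standing in for HDFS. It does not use the Gravitino API: `ToyFilesetType`, `ToyFileset`, `create_fileset`, `drop_fileset`, and `demo` are hypothetical names introduced only for this sketch.

```python
import shutil
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class ToyFilesetType(Enum):
    """Toy stand-in for the two fileset types described in the steps above."""
    MANAGED = "managed"
    EXTERNAL = "external"


@dataclass
class ToyFileset:
    name: str
    fileset_type: ToyFilesetType
    location: Path


def create_fileset(name: str, fileset_type: ToyFilesetType, location: Path) -> ToyFileset:
    # In this toy model, a managed fileset creates its own storage location;
    # an external fileset only records a location that the caller already owns.
    if fileset_type is ToyFilesetType.MANAGED:
        location.mkdir(parents=True, exist_ok=True)
    return ToyFileset(name, fileset_type, location)


def drop_fileset(fileset: ToyFileset) -> None:
    # Dropping a managed fileset also deletes its storage location;
    # dropping an external fileset leaves the storage untouched.
    if fileset.fileset_type is ToyFilesetType.MANAGED:
        shutil.rmtree(fileset.location, ignore_errors=True)


def demo() -> dict:
    """Create and drop one fileset of each type, reporting what is left on disk."""
    with tempfile.TemporaryDirectory() as root:
        managed = create_fileset("managed_fs", ToyFilesetType.MANAGED, Path(root) / "managed_fs")
        existing = Path(root) / "existing_data"
        existing.mkdir()
        external = create_fileset("external_fs", ToyFilesetType.EXTERNAL, existing)
        managed_created = managed.location.exists()
        drop_fileset(managed)
        drop_fileset(external)
        return {
            "managed_created": managed_created,
            "managed_exists_after_drop": managed.location.exists(),
            "external_exists_after_drop": external.location.exists(),
        }


if __name__ == "__main__":
    print(demo())
```

In the notebook, the equivalent checks are made with the HDFS client against real HDFS paths rather than local directories.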

## How to develop the Apache Gravitino Python client

You can use any IDE to develop the Gravitino Python client. Open the client-python module project directly in the IDE.

### Prerequisites

+ Python 3.8+
+ Refer to [How to build Gravitino](https://gravitino.apache.org/docs/latest/how-to-build/#prerequisites) to prepare the
  necessary build environment.

### Build and testing

1. Clone the Gravitino project.

    ```shell
    git clone git@github.com:apache/gravitino.git
    ```

2. Build the Gravitino Python client module

    ```shell
    ./gradlew :clients:client-python:build
    ```

3. Run unit tests

    ```shell
    ./gradlew :clients:client-python:test -PskipITs
    ```

4. Run integration tests

   Because the Python client connects to a Gravitino server to run integration tests,
   the integration test task automatically runs `./gradlew compileDistribution -x test` to compile the
   Gravitino project into the `distribution` directory. When you run integration tests via a Gradle
   command or an IDE, the Gravitino integration test framework (`integration_test_env.py`)
   starts and stops the Gravitino server automatically.

    ```shell
    ./gradlew :clients:client-python:test
    ```

5. Distribute the Gravitino Python client module

    ```shell
    ./gradlew :clients:client-python:distribution
    ```

6. Deploy the Gravitino Python client to https://pypi.org/project/apache-gravitino/

    ```shell
    ./gradlew :clients:client-python:deploy
    ```
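
If you prefer to drive the Gradle tasks above from a Python script, for example in a local automation helper, a thin wrapper around `subprocess` is enough. The sketch below is not part of the Gravitino build: `gradle_command` and `run_task` are hypothetical helpers, and it assumes a POSIX shell environment where `./gradlew` sits in the repository root you pass in.

```python
import subprocess
from typing import List

MODULE = ":clients:client-python"
# Task names taken from the Gradle commands listed in the steps above.
TASKS = ("build", "test", "distribution", "deploy")


def gradle_command(task: str, skip_its: bool = False) -> List[str]:
    """Build the argument list for one of the client-python Gradle tasks."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    command = ["./gradlew", f"{MODULE}:{task}"]
    if skip_its:
        # -PskipITs is the flag the unit-test step uses to skip integration tests.
        command.append("-PskipITs")
    return command


def run_task(task: str, skip_its: bool = False, repo_root: str = ".") -> int:
    """Run the task from repo_root; raises CalledProcessError if Gradle fails."""
    completed = subprocess.run(gradle_command(task, skip_its), cwd=repo_root, check=True)
    return completed.returncode


if __name__ == "__main__":
    # Print the unit-test command without running it.
    print(" ".join(gradle_command("test", skip_its=True)))
```

Calling `run_task("test", skip_its=True, repo_root="/path/to/gravitino")` would then run the unit tests only, where the path is a placeholder for your own checkout.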

## Resources

+ Official website: https://gravitino.apache.org/
+ Project home on GitHub: https://github.com/apache/gravitino/
+ Playground with Docker: https://github.com/apache/gravitino-playground
+ User documentation: https://gravitino.apache.org/docs/
+ Slack Community: [https://the-asf.slack.com#gravitino](https://the-asf.slack.com/archives/C078RESTT19)

## License

Gravitino is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/apache/gravitino/blob/main/LICENSE) for details.

## ASF Incubator disclaimer

Apache Gravitino is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. 
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, 
and decision making process have stabilized in a manner consistent with other successful ASF projects. 
While incubation status is not necessarily a reflection of the completeness or stability of the code, 
it does indicate that the project has yet to be fully endorsed by the ASF.
