Metadata-Version: 2.4
Name: ankaflow
Version: 0.1.0
Summary: AnkaFlow pipeline runner for server and browser
Author: Madis Udam
License: MIT
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: duckdb==1.1.1
Requires-Dist: pandas==2.2.3
Requires-Dist: pyarrow==17.0.0
Requires-Dist: pyyaml==6.0.2
Requires-Dist: typing-extensions==4.13.0
Requires-Dist: arrow==1.2.1
Requires-Dist: sqlglot==25.24.4
Requires-Dist: pydantic==2.10.6
Requires-Dist: jinja2==3.1.2
Requires-Dist: pypika-fork==0.49.0
Requires-Dist: shortuuid==1.0.13
Requires-Dist: fsspec==2025.3.2
Requires-Dist: jmespath==1.0.1
Requires-Dist: psutil==7.0.0
Provides-Extra: server
Requires-Dist: boto3==1.36.4; extra == "server"
Requires-Dist: clickhouse-driver==0.2.9; extra == "server"
Requires-Dist: google-cloud-storage==2.11.0; extra == "server"
Requires-Dist: httpx==0.21.1; extra == "server"
Requires-Dist: deltalake==0.25.4; extra == "server"
Requires-Dist: google-cloud-bigquery==3.26.0; extra == "server"
Requires-Dist: google-cloud-bigquery-storage==2.13.1; extra == "server"
Provides-Extra: browser
Provides-Extra: dev
Requires-Dist: black>=25.1.0; extra == "dev"
Requires-Dist: ruff>=0.9.4; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: mkdocs>=1.6.1; extra == "dev"
Requires-Dist: mkdocs-exclude>=1.0.2; extra == "dev"
Requires-Dist: mkdocs-material>=9.6.12; extra == "dev"
Requires-Dist: mkdocstrings>=0.29.1; extra == "dev"
Requires-Dist: mkdocstrings-python>=1.16.10; extra == "dev"
Requires-Dist: pyodide-py>=0.27.4; extra == "dev"

# AnkaFlow

**Run your data pipelines in Python or the browser.**  
AnkaFlow is a YAML + SQL-powered data pipeline engine that works in local Python, JupyterLite, or fully in-browser via Pyodide.

## 🚀 Features

- Run pipelines using DuckDB with SQL and optional Python
- Supports Parquet, REST APIs, BigQuery, and ClickHouse (server only)
- Browser-compatible: works in JupyterLite, GitHub Pages, VS Code Web and more

## 📦 Install

```bash
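# Server (quotes keep shells such as zsh from expanding the brackets)
pip install "ankaflow[server]"

# Dev (editable install from a source checkout)
pip install -e ".[dev,server]"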
```

## 🛠 Usage

```bash
ankaflow /path/to/stages.yaml
```

```python
from ankaflow import (
    ConnectionConfiguration,
    Stages,
    Flow,
)

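# Connection settings; the defaults are used here
connections = ConnectionConfiguration()

# Parse the pipeline definition from YAML, then build and run the flow
stages = Stages.load("path/to/stages.yaml")
flow = Flow(stages, connections)
flow.run()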
```

## 🔁 What is `Stages`?

`Stages` is the object that holds your pipeline definition parsed from a YAML file.  
Each stage is one of: `tap`, `transform`, or `sink`.
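
As a rough illustration of that rule, the sketch below checks that every stage in an already-parsed definition declares one of the three kinds. It works on plain dictionaries. `validate_kinds` is a hypothetical helper, not part of the AnkaFlow API, and the `name` and `kind` keys are assumed from the YAML layout.

```python
# Hypothetical helper, not part of AnkaFlow: checks stage kinds on plain dicts.
ALLOWED_KINDS = {"tap", "transform", "sink"}


def validate_kinds(stages: list[dict]) -> list[str]:
    """Return one error message per stage whose kind is missing or unknown."""
    errors = []
    for index, stage in enumerate(stages):
        kind = stage.get("kind")
        if kind not in ALLOWED_KINDS:
            # Fall back to the position when a stage has no name
            name = stage.get("name", f"<stage {index}>")
            errors.append(f"{name}: unsupported kind {kind!r}")
    return errors


stages = [
    {"name": "Extract Data", "kind": "tap"},
    {"name": "Transform Data", "kind": "transform"},
    {"name": "Publish Data", "kind": "publish"},  # deliberately invalid
]
print(validate_kinds(stages))
```

A check like this could run before handing the file to `Stages.load`, although the library may well perform its own validation.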

### Example

```yaml
- name: Extract Data
  kind: tap
  connection:
    kind: Parquet
    locator: input.parquet

- name: Transform Data
  kind: transform
  query: SELECT * FROM "Extract Data" WHERE "amount" > 100

- name: Load Data
  kind: sink
  connection:
    kind: Parquet
    locator: output.parquet
```
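
The transform query reads from `"Extract Data"`, which suggests that a stage's output can be queried under the stage's name as a quoted identifier. The sketch below mimics that idea with the standard-library `sqlite3` module standing in for DuckDB. The sample rows and in-memory tables are made up for illustration, and AnkaFlow's internals may differ.

```python
import sqlite3

# sqlite3 stands in for DuckDB so the sketch runs with the standard library only.
con = sqlite3.connect(":memory:")

# "tap": materialize made-up input rows under the stage name.
con.execute('CREATE TABLE "Extract Data" (id INTEGER, amount REAL)')
con.executemany(
    'INSERT INTO "Extract Data" VALUES (?, ?)',
    [(1, 50.0), (2, 150.0), (3, 300.0)],
)

# "transform": run the stage query and store the result under this stage's name.
con.execute(
    'CREATE TABLE "Transform Data" AS '
    'SELECT * FROM "Extract Data" WHERE "amount" > 100'
)

# "sink": read the transformed rows back (a real sink would write output.parquet).
rows = con.execute('SELECT id, amount FROM "Transform Data" ORDER BY id').fetchall()
print(rows)
con.close()
```

Because stage names contain spaces, they have to be double-quoted wherever a query refers to them, as the example pipeline does.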
