Metadata-Version: 2.4
Name: agora-etl-plugins
Version: 0.3.0
Summary: Official plugins for agora-etl — Kafka, PostgreSQL, Redis, cron scheduling, and distributed coordination.
Project-URL: Homepage, https://www.agora.my-working.com/
Project-URL: Documentation, https://www.agora.my-working.com/plugins/
Project-URL: Repository, https://github.com/thanhtham010891/agora-etl-plugins
Project-URL: BugTracker, https://github.com/thanhtham010891/agora-etl-plugins/issues
Author: Tham Tra
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: async,cron,data-engineering,distributed,etl,kafka,pipeline,postgres,redis
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: agora-etl<1,>=0.2.1
Provides-Extra: all
Requires-Dist: aiokafka<1,>=0.11; extra == 'all'
Requires-Dist: croniter<7,>=6.0; extra == 'all'
Requires-Dist: fastavro<2,>=1.9; extra == 'all'
Requires-Dist: psycopg[binary]<4,>=3.1; extra == 'all'
Requires-Dist: redis<8,>=7.0; extra == 'all'
Provides-Extra: cron
Requires-Dist: croniter<7,>=6.0; extra == 'cron'
Provides-Extra: dev
Requires-Dist: agora-etl[dev]; extra == 'dev'
Provides-Extra: distributed
Requires-Dist: redis<8,>=7.0; extra == 'distributed'
Provides-Extra: kafka
Requires-Dist: aiokafka<1,>=0.11; extra == 'kafka'
Requires-Dist: fastavro<2,>=1.9; extra == 'kafka'
Provides-Extra: postgres
Requires-Dist: psycopg[binary]<4,>=3.1; extra == 'postgres'
Provides-Extra: redis
Requires-Dist: redis<8,>=7.0; extra == 'redis'
Description-Content-Type: text/markdown

# Agora ETL Plugins

**Official plugin collection for [agora-etl](https://pypi.org/project/agora-etl/) — Redis, cron scheduling, distributed coordination, Kafka, and PostgreSQL.**

[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)
![Python](https://img.shields.io/badge/python-3.11%2B-blue)
[![PyPI](https://img.shields.io/pypi/v/agora-etl-plugins)](https://pypi.org/project/agora-etl-plugins/)

---

## Overview

`agora-etl-plugins` extends [agora-etl](https://pypi.org/project/agora-etl/) with production-ready integrations. Plugins are discovered through Python entry points: install the package and they register themselves, with no manual wiring.

Canonical ecosystem docs live on the Agora docs site:

- Plugin ecosystem overview: `https://agora.my-working.com/plugins/`
- Core docs home: `https://agora.my-working.com/`

This README stays focused on package-specific quickstart information.

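```python
import asyncio

from agora import DeliveryConfig, Pipeline
from agora_plugins.redis.sinks import RedisSink
from agora_plugins.redis.sources import RedisStreamSource


async def main() -> None:
    # Read from the "events" stream as consumer "worker-1" in group "my-group".
    source = RedisStreamSource(
        url="redis://localhost:6379",
        stream="events",
        group="my-group",
        consumer="worker-1",
    )
    # key_fn derives the Redis key for each record from its "id" field.
    sink = RedisSink(url="redis://localhost:6379", key_fn=lambda r: r["id"])

    summary = await Pipeline(source).build(sink, config=DeliveryConfig(batch_size=100)).run()
    print(f"written={summary.records_written}  errors={summary.records_errored}")


# `await` is only valid inside a coroutine, so run the pipeline via asyncio.run().
asyncio.run(main())
```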

---

## Install

```bash
pip install "agora-etl-plugins[redis]"        # Redis source, sink, state, DLQ, dedup, AI cache
pip install "agora-etl-plugins[cron]"         # Cron schedule support for ScheduledPipeline
pip install "agora-etl-plugins[distributed]"  # Redis-backed distributed worker coordination
pip install "agora-etl-plugins[kafka]"        # Kafka source and sink
pip install "agora-etl-plugins[postgres]"     # PostgreSQL source, sink, DLQ, schema adapter
pip install "agora-etl-plugins[all]"          # Everything in one install
```
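
After installing, a quick way to confirm that an extra's third-party dependencies are importable is to probe for them with the standard library. The sketch below maps each extra to the import names of its dependencies as declared in this package's metadata; `EXTRA_MODULES` and `missing_modules` are illustrative names, and the check covers importability only, not version constraints.

```python
from importlib.util import find_spec

# Import names required by each extra, per this package's Requires-Dist metadata.
EXTRA_MODULES: dict[str, list[str]] = {
    "redis": ["redis"],
    "cron": ["croniter"],
    "distributed": ["redis"],
    "kafka": ["aiokafka", "fastavro"],
    "postgres": ["psycopg"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the dependencies of `extra` that cannot be found in this environment."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"[{extra}] {status}")
```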

This package requires `agora-etl>=0.2.1,<1`.

For plugin sources such as Redis Streams, Kafka, and PostgreSQL, `agora-etl 0.2.1`
adds two core improvements worth adopting by default:

- `DeliveryConfig(batch_size=100)` improves throughput on the linear lane.
- `BatchMiddleware` now works correctly even when the source emits one record at a time.
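
To illustrate why batching helps when a source emits one record at a time, here is a self-contained sketch of the general technique. It is not agora-etl's implementation: `one_at_a_time`, `batched`, and `count_writes` are hypothetical helpers that only show how grouping records reduces the number of sink writes.

```python
import asyncio
from collections.abc import AsyncIterator


async def one_at_a_time(n: int) -> AsyncIterator[dict]:
    """Stand-in for a source that yields a single record per iteration."""
    for i in range(n):
        yield {"id": i}


async def batched(records: AsyncIterator[dict], batch_size: int) -> AsyncIterator[list[dict]]:
    """Group single records into lists of at most `batch_size` items."""
    batch: list[dict] = []
    async for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


async def count_writes(n: int, batch_size: int) -> int:
    """Count how many sink writes would be needed for `n` records."""
    writes = 0
    async for _ in batched(one_at_a_time(n), batch_size):
        writes += 1
    return writes


print(asyncio.run(count_writes(250, 1)))
print(asyncio.run(count_writes(250, 100)))
```

With 250 records, a batch size of 1 needs 250 writes while a batch size of 100 needs 3, which is where the throughput gain comes from when each write carries a fixed round-trip cost.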

If your pipelines checkpoint frequently, you can also enable the Rust checkpoint hot path
from the core package:

```bash
pip install "agora-etl[rs]" "agora-etl-plugins[redis]"
```

---

## Available plugins

### Redis `[redis]`

Full Redis integration — streaming ingestion, writes, dead-letter queue, state, deduplication, and LLM response caching.

| Component | Type | Description |
|---|---|---|
| `RedisStreamSource` | Source | Consume records from a Redis Stream via XREADGROUP |
| `RedisSink` | Sink | Write records to Redis (SET / LPUSH / RPUSH / XADD) |
| `RedisDLQSink` | Sink | Route failed records to a Redis-backed dead-letter queue |
| `RedisDLQSource` | Source | Replay failed records from the Redis DLQ |
| `RedisBackend` | State | Redis-backed state backend with TTL and membership support |
| `RedisStore` | Dedup | Exact-match deduplication via Redis SET NX |
| `RedisEmbeddingStore` | Dedup | Semantic deduplication using cosine similarity (up to ~10k entries) |
| `RedisLLMCache` | AI Cache | Distributed LLM response cache backed by Redis |
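
For readers unfamiliar with semantic deduplication, the sketch below shows the underlying idea in plain Python: a record counts as a duplicate when its embedding is close enough, by cosine similarity, to one already seen. This is an in-memory illustration only, not how `RedisEmbeddingStore` is implemented; `InMemoryEmbeddingDedup` and the `0.95` threshold are assumptions chosen for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryEmbeddingDedup:
    """Hypothetical stand-in that keeps embeddings in a list and scans it linearly."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._seen: list[list[float]] = []

    def is_duplicate(self, embedding: list[float]) -> bool:
        for stored in self._seen:
            if cosine_similarity(stored, embedding) >= self.threshold:
                return True
        # Not similar to anything seen so far: remember it for later checks.
        self._seen.append(list(embedding))
        return False


dedup = InMemoryEmbeddingDedup(threshold=0.95)
print(dedup.is_duplicate([1.0, 0.0]))    # first embedding seen
print(dedup.is_duplicate([0.99, 0.05]))  # nearly the same direction
print(dedup.is_duplicate([0.0, 1.0]))    # orthogonal to the first
```

A linear scan like this costs one comparison per stored entry, so lookup time grows with the size of the store. That may be one reason for a bound on the number of entries, though the plugin's actual storage and search strategy is not described here.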

---

## License

Apache 2.0 — see [LICENSE](LICENSE).
