Metadata-Version: 2.4
Name: ava-protocol
Version: 0.1.3
Summary: AI Visibility Anonymizer - Privacy-preserving middleware for LLMs
Project-URL: Homepage, https://github.com/ava-protocol/ava-protocol
Project-URL: Documentation, https://ava-protocol.readthedocs.io
Project-URL: Repository, https://github.com/ava-protocol/ava-protocol
Project-URL: Bug Tracker, https://github.com/ava-protocol/ava-protocol/issues
Author-email: Gerald Enrique Nelson Mc Kenzie <lordxmen2k@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,anonymization,data-protection,gdpr,hipaa,llm,pii,presidio,privacy,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: all
Requires-Dist: boto3>=1.28.0; extra == 'all'
Requires-Dist: presidio-analyzer>=2.2.0; extra == 'all'
Requires-Dist: presidio-anonymizer>=2.2.0; extra == 'all'
Requires-Dist: spacy>=3.7.0; extra == 'all'
Provides-Extra: aws
Requires-Dist: boto3>=1.28.0; extra == 'aws'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: local
Requires-Dist: presidio-analyzer>=2.2.0; extra == 'local'
Requires-Dist: presidio-anonymizer>=2.2.0; extra == 'local'
Requires-Dist: spacy>=3.7.0; extra == 'local'
Description-Content-Type: text/markdown

# AVA Protocol

**AI Visibility Anonymizer** - Privacy-preserving middleware for LLM interactions with reversible tokenization.

[![PyPI](https://img.shields.io/pypi/v/ava-protocol)](https://pypi.org/project/ava-protocol/)
[![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)

---

## What is AVA?

AVA Protocol sanitizes sensitive data (PII/PHI) before it reaches AI systems, maintains cryptographically signed audit trails, and restores the original values in AI outputs.

**Key Innovation:** Reversible tokenization preserves both privacy AND data utility.

```python
import ava
import openai  # legacy SDK (openai<1.0), which still exposes ChatCompletion

client = ava.Client(engine="presidio", policy="healthcare_strict")
text = "Patient John Smith, SSN 123-45-6789"

with client.session(reversibility=True) as session:
    safe = session.sanitize(text)
    # Sanitized: "Patient AVA_PERS_xK9mP2nQ, SSN AVA_SSN_fG5hI6jK"

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": safe}]
    )

    # Restore the original values in the model's reply text
    final = session.restore(response['choices'][0]['message']['content'])
```
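
To make the idea of reversible tokenization concrete, here is a minimal standalone sketch. It is not AVA's implementation: `ToySession`, the two regular expressions, and the token format are assumptions made for illustration, and the "model reply" is a plain string that echoes the tokens.

```python
import re
import secrets

# Deliberately simple patterns; a real engine detects far more than this
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ToySession:
    """Illustrative only: maps detected values to random tokens and back."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def _token_for(self, kind, value):
        # Reuse the same token when a value appears more than once
        if value not in self._value_to_token:
            token = f"AVA_{kind}_{secrets.token_hex(4)}"
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def sanitize(self, text):
        text = SSN_RE.sub(lambda m: self._token_for("SSN", m.group()), text)
        text = EMAIL_RE.sub(lambda m: self._token_for("EMAI", m.group()), text)
        return text

    def restore(self, text):
        # Swap every known token back to its original value
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


session = ToySession()
original = "SSN 123-45-6789, email john@example.com"
safe = session.sanitize(original)
reply = f"Noted: {safe}"  # stand-in for a model response that echoes the tokens
restored = session.restore(reply)
```

The mapping lives only inside the session object, so the text sent to the model carries tokens and nothing else, while `restore` works on any later text that still contains them.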

---

## Installation

```bash
# Gateway mode (lightweight, ~50KB)
pip install ava-protocol

# Embedded mode (local ML, ~500MB)
pip install "ava-protocol[local]"    # quotes stop shells such as zsh from globbing the brackets

# Cloud integrations
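pip install "ava-protocol[aws]"      # AWS Macie
pip install "ava-protocol[azure]"    # Azure PII (extra not declared in the 0.1.3 package metadata)
pip install "ava-protocol[gcp]"      # Google DLP (extra not declared in the 0.1.3 package metadata)
pip install "ava-protocol[all]"      # boto3, Presidio, and spaCy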
```

---

## Quick Start

### 1. Gateway Mode (Recommended for Most Users)

Connect to a remote AVA Gateway server:

```python
import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="your-api-key",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    clean = session.sanitize("Contact john@example.com")
    print(clean)  # Contact AVA_EMAI_UfhwZS_2
```

### 2. Embedded Mode (Self-Contained)

Run everything locally with Presidio:

```python
import ava

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="memory"
)

with client.session(reversibility=True) as session:
    medical_text = "Patient: Sarah Johnson, DOB: 1985-03-15"
    safe = session.sanitize(medical_text)
    print(safe)  # Patient: AVA_PERS_xK9mP2nQ, DOB: AVA_DATE_aB3cD4eF
```

### 3. Mock Engine (Testing/CI)

No ML dependencies - perfect for unit tests:

```python
import ava

client = ava.Client(engine="mock", policy="general_moderate")

with client.session() as session:
    result = session.sanitize("Email: test@example.com")
    assert "AVA_EMAI_" in result
```

---

## Operating Modes

### Mode 1: Embedded (Local Presidio)

Self-contained deployment for air-gapped environments.

```python
import os

import ava
import openai  # legacy SDK (openai<1.0), which still exposes ChatCompletion

client = ava.Client(
    engine="presidio",
    policy="healthcare_strict",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"]
    }
)

with client.session(reversibility=True, ttl=3600) as session:
    medical_record = """
    Patient: Maria Gonzalez
    DOB: 1985-03-15
    SSN: 123-45-6789
    Email: maria.g@healthmail.com
    """

    sanitized = session.sanitize(medical_record)

    # Send to OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )

    # Restore original values
    final = session.restore(response['choices'][0]['message']['content'])
```

### Mode 2: Gateway (Remote Client)

Thin client connecting to remote AVA Gateway.

```python
import ava

client = ava.Client(
    gateway_url="https://ava-gateway.company.com",
    api_key="ava_sk_live_abc123xyz789",
    policy="general_moderate"
)

with client.session(reversibility=True) as session:
    customer_email = """
    Hi, this is Robert Chen from Acme Corp.
    My credit card ending in 4532 was charged twice.
    """

    safe_text = session.sanitize(customer_email)
    response = support_ai.process(safe_text)
    readable = session.restore(response)
```

**Environment-based config:**

```bash
# .env
AVA_GATEWAY_URL=https://ava.internal.company.com
AVA_API_KEY=ava_sk_live_xxx
AVA_POLICY=healthcare_strict
```

```python
client = ava.Client.from_env()  # Auto-loads from environment
```
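
As a rough picture of what environment-based configuration involves, the sketch below reads the three variables shown above into a settings object. `GatewaySettings` and `load_gateway_settings` are hypothetical names, not part of AVA, and the fallback policy is an assumption. Note that a `.env` file is not loaded into the process environment on its own; whether `from_env()` parses it is not stated here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    gateway_url: str
    api_key: str
    policy: str


def load_gateway_settings(environ=None):
    """Read the AVA_* variables; fail early if the URL or key is missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in ("AVA_GATEWAY_URL", "AVA_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return GatewaySettings(
        gateway_url=env["AVA_GATEWAY_URL"],
        api_key=env["AVA_API_KEY"],
        # Falling back to general_moderate is an assumption of this sketch
        policy=env.get("AVA_POLICY", "general_moderate"),
    )


# A plain dict stands in for os.environ so the example has no side effects
settings = load_gateway_settings({
    "AVA_GATEWAY_URL": "https://ava.internal.company.com",
    "AVA_API_KEY": "ava_sk_live_xxx",
    "AVA_POLICY": "healthcare_strict",
})
```

Failing at startup when a required variable is absent keeps a misconfigured deployment from sending unsanitized traffic later.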

### Mode 3: Mock Engine (Testing)

Regex-based detection for CI/CD.

```python
import ava
import pytest

@pytest.fixture
def mock_client():
    return ava.Client(engine="mock", policy="general_moderate")

def test_email_detection(mock_client):
    with mock_client.session() as session:
        text = "Contact us at support@example.com"
        result = session.sanitize(text)
        assert "AVA_EMAI_" in result

def test_reversibility(mock_client):
    with mock_client.session(reversibility=True) as session:
        original = "Patient: John Doe"
        sanitized = session.sanitize(original)
        restored = session.restore(sanitized)
        assert restored == original
```

### Mode 4: AWS Macie Adapter

```python
import ava

client = ava.Client(
    engine="aws_macie",
    policy="financial_paranoid",
    engine_config={
        "region": "us-east-1",
        "custom_data_identifiers": ["employee-id-pattern"]
    }
)

with client.session(reversibility=True) as session:
    with open("customer_data.csv", "r") as f:
        content = f.read()

    sanitized = session.sanitize(content)
    insights = sagemaker_model.analyze(sanitized)
    report = session.restore(insights)
```

### Mode 5: Azure PII Adapter

```python
import ava

client = ava.Client(
    engine="azure_pii",
    policy="healthcare_strict",
    engine_config={
        "endpoint": "https://ava-pii.cognitiveservices.azure.com",
        "domain_filter": "phi"
    }
)

with client.session(reversibility=True) as session:
    clinical_notes = "Dr. Sarah Johnson examined patient Michael Brown."
    sanitized = session.sanitize(clinical_notes)
    response = azure_openai.ChatCompletion.create(
        deployment_id="gpt-4",
        messages=[{"role": "user", "content": sanitized}]
    )
    final = session.restore(response['choices'][0]['message']['content'])
```

### Mode 6: Google DLP Adapter

```python
import ava

client = ava.Client(
    engine="google_dlp",
    policy="legal_confidential",
    engine_config={
        "project_id": "my-gcp-project",
        "min_likelihood": "LIKELY"
    }
)

with client.session(reversibility=True) as session:
    legal_doc = "ATTORNEY-CLIENT PRIVILEGED From: attorney@lawfirm.com"
    sanitized = session.sanitize(legal_doc)
    summary = legal_ai.summarize(sanitized)
    privileged = session.restore(summary)
```

---

## Vault Types

### Memory Vault (Default)

```python
client = ava.Client(engine="presidio", vault_type="memory")
# In-process storage, never touches disk
# Auto-purged on session exit
```
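
The behavior described in the comments above (in-process storage, purge on exit) combined with a session `ttl` could look roughly like the following. `ToyMemoryVault` is a hypothetical class written for this illustration, and the injectable clock exists only so expiry can be shown without waiting.

```python
import time


class ToyMemoryVault:
    """Illustrative in-memory token store with per-entry expiry."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # token -> (value, expires_at)

    def put(self, token, value):
        self._entries[token] = (value, self._clock() + self._ttl)

    def get(self, token):
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]  # expired entries are dropped on access
            return None
        return value

    def purge(self):
        self._entries.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.purge()  # nothing survives the end of the with-block
        return False


now = [0.0]  # fake clock, advanced by hand
with ToyMemoryVault(ttl=3600, clock=lambda: now[0]) as vault:
    vault.put("AVA_PERS_xK9mP2nQ", "Sarah Johnson")
    fresh = vault.get("AVA_PERS_xK9mP2nQ")
    now[0] = 3600.0
    expired = vault.get("AVA_PERS_xK9mP2nQ")

after_exit = vault.get("AVA_PERS_xK9mP2nQ")
```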

### SQLite Vault (Persistent)

```python
import os

import ava

client = ava.Client(
    engine="presidio",
    vault_type="sqlite",
    vault_config={
        "db_path": "/secure/ava_vault.db",
        "encryption_key": os.environ["VAULT_KEY"],
        "journal_mode": "WAL"
    }
)
# AES-256 encryption
# Survives process restart
```
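
For intuition about the persistence side only, here is a small sketch built on the standard-library `sqlite3` module. The table layout and helper names are assumptions, and the sketch stores values in plaintext: the AES-256 encryption mentioned above is not reproduced here, since `sqlite3` has no built-in encryption.

```python
import os
import sqlite3
import tempfile


def open_vault(db_path):
    """Open (or create) a toy token table in WAL journal mode."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens (token TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def put(conn, token, value):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO tokens (token, value) VALUES (?, ?)",
            (token, value),
        )


def get(conn, token):
    row = conn.execute("SELECT value FROM tokens WHERE token = ?", (token,)).fetchone()
    return row[0] if row else None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_vault.db")

    conn = open_vault(path)
    put(conn, "AVA_SSN_fG5hI6jK", "123-45-6789")
    conn.close()

    # Reopening the file stands in for a process restart
    conn = open_vault(path)
    recovered = get(conn, "AVA_SSN_fG5hI6jK")
    conn.close()
```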

### Redis Vault (Distributed)

```python
import os

import ava

client = ava.Client(
    engine="presidio",
    vault_type="redis",
    vault_config={
        "host": "redis.company.com",
        "port": 6379,
        "password": os.environ["REDIS_PASSWORD"],
        "ssl": True
    }
)
# Cross-machine session sharing
# Microservices support
```

---

## Policies

### Built-in Policies

```python
# HIPAA-compliant healthcare
client = ava.Client(policy="healthcare_strict")

# PCI-DSS level 1 financial
client = ava.Client(policy="financial_paranoid")

# Attorney-client privilege
client = ava.Client(policy="legal_confidential")

# Balanced business use
client = ava.Client(policy="general_moderate")

# Scientific data (irreversible)
client = ava.Client(policy="research_anonymized")
```

### Custom Policy (YAML)

```yaml
# policies/enterprise.yaml
name: enterprise_gdpr
entity_sensitivity:
  PERS: 5  # Always protected
  EMAI: 5
  PHON: 4
  DATE: 2
thresholds:
  min_confidence: 0.85
retention:
  session_ttl: 3600
  audit_retention: 90d
```

```python
client = ava.Client(policy="/path/to/policies/enterprise.yaml")
```
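
How AVA evaluates these fields is not spelled out here, so the following is only one plausible reading, written as a sketch. The policy is mirrored as a Python dict to avoid a YAML dependency; `parse_retention`, `should_protect`, and the `min_sensitivity` cutoff are hypothetical and do not appear in the policy file.

```python
from datetime import timedelta

# Same content as policies/enterprise.yaml, written as a dict
POLICY = {
    "name": "enterprise_gdpr",
    "entity_sensitivity": {"PERS": 5, "EMAI": 5, "PHON": 4, "DATE": 2},
    "thresholds": {"min_confidence": 0.85},
    "retention": {"session_ttl": 3600, "audit_retention": "90d"},
}


def parse_retention(value):
    """Accept seconds as an int, or a string such as '90d' (days only in this sketch)."""
    if isinstance(value, int):
        return timedelta(seconds=value)
    if isinstance(value, str) and value.endswith("d") and value[:-1].isdigit():
        return timedelta(days=int(value[:-1]))
    raise ValueError(f"Unsupported duration: {value!r}")


def should_protect(policy, entity_type, confidence, min_sensitivity=3):
    """Assumed rule: the entity is sensitive enough and detected confidently enough."""
    sensitivity = policy["entity_sensitivity"].get(entity_type, 0)
    return (
        sensitivity >= min_sensitivity
        and confidence >= policy["thresholds"]["min_confidence"]
    )


session_ttl = parse_retention(POLICY["retention"]["session_ttl"])
audit_retention = parse_retention(POLICY["retention"]["audit_retention"])
protect_email = should_protect(POLICY, "EMAI", 0.90)
protect_date = should_protect(POLICY, "DATE", 0.99)
```

Under this reading, a confidently detected email is tokenized while a date is left alone because its sensitivity of 2 falls below the assumed cutoff.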

---

## Async API

```python
import asyncio
import ava

async def process_documents():
    client = ava.AsyncClient(engine="presidio", policy="general_moderate")
    documents = ["Doc 1...", "Doc 2...", "Doc 3..."]

    async with client.session() as session:
        # Process all concurrently
        sanitized = await asyncio.gather(*[
            session.sanitize(doc) for doc in documents
        ])

        # Send to AI concurrently (call_llm is your own async LLM wrapper)
        responses = await asyncio.gather(*[
            call_llm(doc) for doc in sanitized
        ])

        # Restore all concurrently
        final = await asyncio.gather(*[
            session.restore(r) for r in responses
        ])

    return final

asyncio.run(process_documents())
```
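
The pattern above relies on `asyncio.gather` returning results in the same order as its inputs, which is what lets each response be matched to its document. The standalone sketch below shows that property with stand-in coroutines; `fake_sanitize`, `fake_llm`, and the semaphore limit are assumptions for illustration, not AVA calls.

```python
import asyncio


async def fake_sanitize(doc):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return doc.replace("Alice", "AVA_PERS_0001")


async def fake_llm(prompt, limit):
    async with limit:  # cap the number of requests in flight
        await asyncio.sleep(0)
        return f"Summary of: {prompt}"


async def pipeline(documents, max_in_flight=2):
    limit = asyncio.Semaphore(max_in_flight)
    sanitized = await asyncio.gather(*(fake_sanitize(d) for d in documents))
    return await asyncio.gather(*(fake_llm(s, limit) for s in sanitized))


results = asyncio.run(pipeline(["Alice doc 1", "Bob doc 2", "Alice doc 3"]))
```

Bounding concurrency with a semaphore is optional, but it is a common way to stay under an LLM provider's rate limits when the document list grows.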

---

## Production Workflow: Healthcare API

```python
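import ava
import openai  # legacy SDK (openai<1.0), which still exposes ChatCompletion
from fastapi import FastAPI

# ehr_system and audit_log stand for your own application services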

app = FastAPI()
client = ava.Client(engine="presidio", policy="healthcare_strict")

@app.post("/summarize-record")
def summarize(record_id: str):  # sync handler: FastAPI runs it in a threadpool, so the blocking calls below do not stall the event loop
    record = ehr_system.get_record(record_id)

    with client.session(reversibility=True, ttl=1800) as session:
        # 1. Sanitize before AI
        safe = session.sanitize(record)

        # 2. Send to OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": safe}]
        )

        # 3. Restore PHI
        summary = session.restore(
            response['choices'][0]['message']['content']
        )

        # 4. Audit
        audit_log.store(session.manifest)

    return {"summary": summary, "manifest_id": session.manifest.id}
```
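
The workflow stores `session.manifest` as its audit record. The manifest format and signing scheme are not described here, so the sketch below only illustrates the general idea of a tamper-evident record, using HMAC-SHA256 from the standard library. The manifest fields, the key, and both helper functions are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(manifest):
    # Sorted keys and fixed separators give a stable byte representation
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest, key):
    signature = hmac.new(key, _canonical(manifest), hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}


def verify_manifest(record, key):
    expected = hmac.new(key, _canonical(record["manifest"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(expected, record["signature"])


key = b"demo-key-not-for-production"
record = sign_manifest(
    {"id": "m-001", "policy": "healthcare_strict", "entities": {"PERS": 1, "SSN": 1}},
    key,
)
ok = verify_manifest(record, key)

# Any later change to the stored manifest invalidates the signature
record["manifest"]["entities"]["SSN"] = 0
tampered_ok = verify_manifest(record, key)
```

An HMAC proves integrity only to holders of the shared key; an audit trail that third parties must verify would need an asymmetric signature instead.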

---

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Your App   │────▶│ AVA Client  │────▶│   Engine    │
│             │     │  (Embedded  │     │  (Presidio, │
│             │◀────│   or        │◀────│   AWS, etc) │
│             │     │   Gateway)  │     │             │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Token Vault │
                    │ (Memory/    │
                    │  SQLite/    │
                    │  Redis)     │
                    └─────────────┘
```

---

## License

MIT License - see [LICENSE](LICENSE)

---

**Author:** Gerald Enrique Nelson Mc Kenzie  
**DOI:** 10.5281/zenodo.19111004  
**Version:** 0.1.3 | March 2026
