Metadata-Version: 2.4
Name: cachefuse
Version: 0.1.0
Summary: Enterprise-grade caching framework for LLM responses and embeddings
Author-email: Yasser Elhaddar <yasser.elhaddar@example.com>
License: MIT
Keywords: cache,llm,embeddings,ai,openai,sqlite,redis,caching,performance
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.12
Requires-Dist: pydantic>=2
Requires-Dist: filelock>=3.12
Requires-Dist: orjson>=3.9; platform_python_implementation != "PyPy"
Provides-Extra: redis
Requires-Dist: redis>=5; extra == "redis"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: fakeredis>=2.21.0; extra == "dev"
Requires-Dist: build>=1.0.3; extra == "dev"
Requires-Dist: openai>=1.30.0; extra == "dev"
Requires-Dist: python-dotenv>=1.0.1; extra == "dev"
Requires-Dist: ipykernel>=6.29.0; extra == "dev"
Requires-Dist: sentence-transformers>=3.0.0; extra == "dev"
Dynamic: license-file

<div align="center">
  <img src="./assets/logo.png" alt="CacheFuse Logo" width="100"/>
  
  # CacheFuse
  
  **Enterprise-grade caching framework for LLM responses and embeddings**
  
  [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![PyPI version](https://badge.fury.io/py/cachefuse.svg)](https://badge.fury.io/py/cachefuse)
  
  *Dramatically reduce LLM API costs and latency with intelligent caching*
  
</div>

---

## 🚀 Why CacheFuse?

CacheFuse caches LLM responses and embeddings so that repeated calls skip the provider API, which cuts both latency and cost.

### 💰 **Massive Cost Savings**
- **60-90% API cost reduction** in typical applications
- **Hundreds of times faster responses** for cached queries (< 3ms vs 2-5 seconds)
- **Smart invalidation** prevents stale results

### ⚡ **Enterprise-Ready Features**
- **Deterministic cache keys** - Same inputs always produce same cache keys
- **Stampede protection** - Concurrent requests for the same key trigger a single provider call
- **Multi-backend support** - SQLite (local) or Redis (distributed)
- **Privacy-compliant** - Hash-only mode with optional redaction hooks
- **Production monitoring** - Hit rates, latency metrics, and CLI tools

### 🔧 **Developer-First Design**
- **Drop-in decorators** - Add `@llm` or `@embed` to existing functions
- **Zero configuration** - Works out of the box with sensible defaults
- **Flexible invalidation** - TTL, tags, and template versioning
- **Thread-safe** - Handles concurrency without race conditions

## 📦 Installation

### Production
```bash
pip install cachefuse
```

### Development
```bash
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
```

### Optional Dependencies
```bash
pip install "cachefuse[redis]"  # For Redis backend support (quotes keep zsh from expanding the brackets)
```

## ⚡ Quickstart

### Basic LLM Caching
```python
from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
import openai

# Initialize cache (works out of the box)
cache = Cache.from_env()

@llm(cache=cache, ttl="7d", tag="summarize-v1", template_version="1")
def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content

# First call: API request (slow + costs money)
summary1 = summarize("CacheFuse speeds up LLM applications")  # ~2-5 seconds

# Second call: Cache hit (fast + free)  
summary2 = summarize("CacheFuse speeds up LLM applications")  # ~3ms

print(f"Results identical: {summary1 == summary2}")  # True
print(f"Cache stats: {cache.stats()}")  # Hit rate, latency, savings
```

### Embedding Caching
```python
import openai
from cachefuse.api.decorators import embed

# Reuses the `cache` object created in the previous example
@embed(cache=cache, ttl="30d", tag="embeddings-v1")
def get_embeddings(texts: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    response = openai.embeddings.create(
        model=model,
        input=texts
    )
    return [embedding.embedding for embedding in response.data]

# Expensive embedding calls cached automatically
vectors = get_embeddings(["Hello world", "Goodbye world"])
```

### CLI Management
```bash
# View cache statistics
cachefuse stats

# Clear specific tags  
cachefuse purge --tag summarize-v1

# Compact SQLite database
cachefuse vacuum

# View help
cachefuse --help
```

### Real-World Example
```python
# RAG application with caching
@llm(cache=cache, ttl="1h", tag="rag-v1", template_version="2")
def answer_question(question: str, context: str, model: str = "gpt-4") -> str:
    return openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer based on the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    ).choices[0].message.content

# Same questions with same context = instant responses + no API costs
answer = answer_question("What is CacheFuse?", "CacheFuse is a caching framework...")
```

## 🏗️ Architecture

CacheFuse is organized in three layers: decorators, a cache facade, and pluggable storage backends:

```
┌─────────────────────────────────────────────────────────┐
│                   @llm / @embed                         │
│                   Decorators                            │
└─────────────────┬───────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────┐
│              Cache Facade                               │
│  • Deterministic fingerprinting                        │
│  • Stampede protection (per-key locks)                 │
│  • Metrics collection                                   │
│  • Privacy mode handling                               │
└─────────────────┬───────────────────────────────────────┘
                  │
    ┌─────────────▼──────────────┐
    │        Backends            │
    ├────────────┬───────────────┤
    │   SQLite   │     Redis     │
    │  (local)   │ (distributed) │
    └────────────┴───────────────┘
```

### Key Components

- **Decorators** - Simple `@llm` and `@embed` decorators for drop-in caching
- **Cache Facade** - Intelligent cache management with fingerprinting and concurrency control
- **Multi-Backend** - SQLite for local development, Redis for production scale
- **Metrics System** - Real-time performance tracking and cost analysis

## ⚙️ Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CF_BACKEND` | `sqlite` | Backend type (`sqlite` or `redis`) |
| `CF_SQLITE_PATH` | `~/.cache/cachefuse/cache.db` | SQLite database file path |
| `CF_REDIS_URL` | - | Redis connection string (e.g., `redis://localhost:6379/0`) |
| `CF_MODE` | `normal` | Privacy mode (`normal` or `hash_only`) |
| `CF_LOCK_TIMEOUT` | `30` | Per-key lock timeout in seconds |
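
As a rough illustration of how the variables in this table might be read, here is a minimal sketch that applies the documented defaults. `EnvSettings` and `load_settings` are hypothetical names used only for this example; they are not part of CacheFuse, and `Cache.from_env()` may validate or interpret the values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container for the CF_* settings (not CacheFuse's own class)."""
    backend: str
    sqlite_path: str
    redis_url: Optional[str]
    mode: str
    lock_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read the CF_* variables, falling back to the defaults from the table."""
    backend = env.get("CF_BACKEND", "sqlite")
    if backend not in ("sqlite", "redis"):
        raise ValueError(f"unsupported CF_BACKEND: {backend!r}")

    mode = env.get("CF_MODE", "normal")
    if mode not in ("normal", "hash_only"):
        raise ValueError(f"unsupported CF_MODE: {mode!r}")

    return EnvSettings(
        backend=backend,
        # Expand "~" so the default path points into the user's home directory.
        sqlite_path=os.path.expanduser(
            env.get("CF_SQLITE_PATH", "~/.cache/cachefuse/cache.db")
        ),
        redis_url=env.get("CF_REDIS_URL"),  # None when unset
        mode=mode,
        lock_timeout=float(env.get("CF_LOCK_TIMEOUT", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` (for example `load_settings({"CF_BACKEND": "redis"})`) makes this kind of loader easy to exercise in tests.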

### Configuration Methods

```python
# Method 1: Environment-based (recommended)
from cachefuse.api.cache import Cache
cache = Cache.from_env()

# Method 2: Explicit configuration
from cachefuse.config import CacheConfig
config = CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0",
    mode="hash_only"
)
cache = Cache.from_config(config)
```

## 🗄️ Storage Backends

### SQLite Backend (Default)
Perfect for local development, single-machine deployments, and applications requiring file-based persistence.

**Features:**
- Single database file, with WAL mode so readers do not block the writer
- Built-in ACID transactions
- Automatic schema migration
- Vacuum support for space reclamation
- Zero external dependencies

```python
# Automatic (default)
cache = Cache.from_env()

# Explicit configuration
cache = Cache.from_config(CacheConfig(
    backend="sqlite",
    sqlite_path="/custom/path/cache.db"
))
```

### Redis Backend
Ideal for distributed applications, horizontal scaling, and shared cache scenarios.

**Features:**
- Distributed caching across multiple instances
- Built-in TTL expiration
- Atomic operations with Redis transactions
- Tag-based bulk operations using sets
- High availability and clustering support

```python
cache = Cache.from_config(CacheConfig(
    backend="redis", 
    redis_url="redis://localhost:6379/0"
))
```

**Redis Key Layout:**
- `cf:entry:<key>` - Cache entry data
- `cf:tag:<tag>` - Set of keys with specific tag

## 🎛️ Advanced Features

### TTL (Time-To-Live)
Flexible expiration control with human-readable formats:

```python
@llm(cache=cache, ttl="7d")      # 7 days
@llm(cache=cache, ttl="2h")      # 2 hours  
@llm(cache=cache, ttl="30m")     # 30 minutes
@llm(cache=cache, ttl="300s")    # 300 seconds
@llm(cache=cache, ttl=0)         # No expiration
```
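
For reference, strings like these can be converted to seconds with a small parser. The sketch below is an illustration rather than CacheFuse's own parsing code: `parse_ttl` is a hypothetical helper, and treating a bare integer as a number of seconds is an assumption.

```python
import re
from typing import Optional, Union

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_ttl(ttl: Union[str, int]) -> Optional[int]:
    """Return the TTL in seconds, or None when the value means 'no expiration'."""
    if isinstance(ttl, int):
        if ttl < 0:
            raise ValueError(f"TTL must not be negative: {ttl!r}")
        return ttl or None  # assumption: integers are seconds, 0 disables expiry

    match = _TTL_PATTERN.fullmatch(ttl.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized TTL: {ttl!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return seconds or None


print(parse_ttl("7d"))   # 604800
print(parse_ttl("30m"))  # 1800
print(parse_ttl(0))      # None
```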

### Tags & Bulk Invalidation
Group related cache entries for easy management:

```python
# Tag entries by version, feature, or use case
@llm(cache=cache, ttl="1h", tag="summarize-v2")
def summarize_v2(text: str) -> str: ...

@llm(cache=cache, ttl="1h", tags=["rag", "qa-v1"])  
def answer_question(question: str, context: str) -> str: ...

# Bulk invalidation
cache.purge_tag("summarize-v2")  # Clear all v2 summaries
```

```bash
# CLI bulk operations
cachefuse purge --tag rag          # Clear all RAG cache entries
cachefuse purge --tag qa-v1        # Clear v1 Q&A entries
```

### Template Versioning
Bump `template_version` whenever a prompt changes; entries cached under the old version no longer match:

```python
# Version 1
@llm(cache=cache, ttl="1d", template_version="1")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment: {text}"

# Version 2 - automatically uses different cache keys
@llm(cache=cache, ttl="1d", template_version="2") 
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment with context: {text}"
```

### Deterministic Cache Keys
Cache keys are generated from:
- **Function type** (`llm` or `embed`)
- **Model parameters** (model name, temperature, etc.)
- **Template version** 
- **Input hash** (SHA256 of processed input)
- **Provider info** (optional)
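
One way such a key could be derived is sketched below: hash the input, collect the remaining fields in a dictionary, serialize it canonically, and hash the result. The `fingerprint` function and its field names are assumptions made for illustration; the exact fields and encoding CacheFuse uses may differ.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def fingerprint(
    kind: str,
    text: str,
    model_params: Mapping[str, Any],
    template_version: str,
    provider: Optional[str] = None,
) -> str:
    """Build a deterministic key from the ingredients listed above."""
    input_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    payload = {
        "kind": kind,                      # "llm" or "embed"
        "params": dict(model_params),      # model name, temperature, ...
        "template_version": template_version,
        "input_hash": input_hash,
        "provider": provider,
    }
    # sort_keys plus fixed separators gives one canonical JSON string,
    # so the order in which parameters were supplied does not matter.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = fingerprint("llm", "hello", {"model": "gpt-4o-mini", "temperature": 0}, "1")
key_b = fingerprint("llm", "hello", {"temperature": 0, "model": "gpt-4o-mini"}, "1")
key_c = fingerprint("llm", "hello", {"model": "gpt-4o-mini", "temperature": 0}, "2")
print(key_a == key_b)  # True: same inputs, different parameter order
print(key_a == key_c)  # False: template version changed
```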

## 🔒 Privacy & Security

### Hash-Only Mode
For privacy-sensitive applications, store only hashes instead of raw content:

```python
from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
from cachefuse.config import CacheConfig

# Enable privacy mode
config = CacheConfig(backend="sqlite", mode="hash_only")
cache = Cache.from_config(config)

@llm(cache=cache, ttl="1h")
def process_sensitive_data(user_input: str) -> str:
    # The raw input is never stored; only hash-based cache keys are kept.
    # `llm_provider_call` is a placeholder for your own provider call.
    return llm_provider_call(user_input)
```

### Content Redaction
Automatically redact sensitive information before hashing:

```python
def redactor(text: str) -> str:
    # Custom redaction logic
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")

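cache = Cache(backend=cache._backend, config=config, redactor=redactor)

@llm(cache=cache, ttl="1h")
def process_data(text: str) -> str:
    return llm_provider_call(text)  # placeholder for your own provider call

# Both calls hit the same cache entry (the inputs are identical after redaction)
result1 = process_data("User SECRET_TOKEN abc123")
result2 = process_data("User [REDACTED] abc123")     # Cache hit!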
```
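
A pattern-based redactor is often more practical than fixed string replacement. The sketch below is a hypothetical example: the two regular expressions (e-mail addresses and `sk-` style tokens) are illustrative assumptions, not patterns shipped with CacheFuse. It shows why inputs that differ only in redacted details end up with the same hash.

```python
import hashlib
import re

# Illustrative patterns only; adapt them to the data you actually handle.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Replace e-mail addresses and sk- style tokens with fixed placeholders."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _API_KEY.sub("[API_KEY]", text)


def input_hash(text: str) -> str:
    """Hash the redacted text, so redacted details never influence the key."""
    return hashlib.sha256(redact(text).encode("utf-8")).hexdigest()


a = input_hash("Contact alice@example.com about the invoice")
b = input_hash("Contact bob@example.org about the invoice")
print(a == b)  # True: both inputs redact to "Contact [EMAIL] about the invoice"
```

Note the trade-off: two users asking the same question share one cache entry, so redact only details that should not change the answer.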

### Security Features
- **No sensitive data storage** in hash-only mode
- **Deterministic redaction** ensures consistent cache hits
- **Configurable redaction functions** for custom privacy needs
- **Thread-safe operations** prevent race conditions

## 📊 Performance Monitoring

### Real-Time Metrics
Track cache performance and cost savings:

```python
stats = cache.stats()
print(f"""
Cache Performance:
  Entries: {stats['entries']}
  Total Calls: {stats['total_calls']}
  Cache Hits: {stats['hits']}
  Hit Rate: {stats['hit_rate']:.2%}
  Avg Latency: {stats['avg_latency_ms']:.1f}ms
  Cost Saved: ${stats['cost_saved']:.2f}
""")
```

### CLI Monitoring
```bash
# Detailed performance stats
cachefuse stats

# Output:
# entries: 150
# total_calls: 1000  
# hits: 850
# hit_rate: 0.85
# avg_latency_ms: 2.3
# cost_saved: 127.50
```
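
The derived fields in this output follow from a few running counters. The sketch below shows one plausible way to compute them; the `Counters` class is hypothetical, and how CacheFuse actually attributes `cost_saved` (here, a per-call cost credited on every hit) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counters:
    """Hypothetical running totals from which the stats fields are derived."""
    total_calls: int = 0
    hits: int = 0
    total_latency_ms: float = 0.0
    cost_saved: float = 0.0

    def record(self, hit: bool, latency_ms: float, cost_per_call: float = 0.0) -> None:
        self.total_calls += 1
        self.total_latency_ms += latency_ms
        if hit:
            self.hits += 1
            # Assumption: each hit saves the cost of one provider call.
            self.cost_saved += cost_per_call

    def snapshot(self) -> Dict[str, float]:
        calls = self.total_calls
        return {
            "total_calls": calls,
            "hits": self.hits,
            "hit_rate": self.hits / calls if calls else 0.0,
            "avg_latency_ms": self.total_latency_ms / calls if calls else 0.0,
            "cost_saved": self.cost_saved,
        }


counters = Counters()
counters.record(hit=False, latency_ms=10.0)
for _ in range(3):
    counters.record(hit=True, latency_ms=2.0, cost_per_call=0.5)
print(counters.snapshot())
```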

### Production Monitoring
```python
# Log metrics for monitoring systems
import logging
logger = logging.getLogger("cachefuse.metrics")

stats = cache.stats()
logger.info("cache_metrics", extra={
    "hit_rate": stats["hit_rate"],
    "avg_latency": stats["avg_latency_ms"], 
    "cost_saved": stats["cost_saved"]
})
```

## 🔄 Concurrency & Reliability  

### Stampede Protection
Prevents duplicate expensive operations when multiple requests arrive simultaneously:

```python
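from concurrent.futures import ThreadPoolExecutor

# 100 concurrent requests for the same uncached item.
# Result: only 1 API call; the other 99 requests are served from the cache.
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(summarize, ["same input"] * 100))
# All results are identical, at the cost and latency of a single call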
```
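
The idea behind stampede protection can be sketched with a per-key lock and a second cache check after the lock is acquired. This is an in-process illustration using `threading.Lock`; CacheFuse itself describes per-key file locks, so the real mechanism differs. `SingleFlightCache` and `run_demo` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class SingleFlightCache:
    """In-memory cache where only one caller per key runs the expensive function."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard the lock registry so every thread sees the same lock per key.
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        cached = self._values.get(key)
        if cached is not None:            # fast path: no locking on a hit
            return cached
        with self._lock_for(key):
            cached = self._values.get(key)
            if cached is not None:        # filled while we waited for the lock
                return cached
            value = compute()
            self._values[key] = value
            return value


def run_demo(n_requests: int = 20) -> Tuple[List[str], int]:
    """Fire n concurrent requests for one key and count provider calls."""
    cache = SingleFlightCache()
    provider_calls: List[int] = []

    def slow_provider() -> str:
        provider_calls.append(1)
        time.sleep(0.05)                  # simulate a slow API call
        return "summary"

    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        results = list(
            pool.map(
                lambda _: cache.get_or_compute("same input", slow_provider),
                range(n_requests),
            )
        )
    return results, len(provider_calls)
```

The second lookup inside the lock is what turns N waiting callers into N-1 cache hits: by the time a waiter acquires the lock, the first caller has already stored the value.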

### Thread Safety
- **Per-key file locks** prevent race conditions
- **ACID transactions** ensure data consistency  
- **Atomic operations** for concurrent access
- **Lock timeout handling** prevents deadlocks

### Reliability Features
- **Graceful degradation** when cache unavailable
- **Automatic retry logic** for transient failures
- **Connection pooling** for Redis backend
- **WAL mode** for SQLite performance
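
Graceful degradation usually means that a cache failure falls back to calling the provider directly instead of failing the request. The sketch below illustrates that pattern under stated assumptions: the `Backend` protocol and `cached_call` are hypothetical and do not describe CacheFuse's internal error handling.

```python
import logging
from typing import Callable, Optional, Protocol

logger = logging.getLogger("cachefuse.sketch")


class Backend(Protocol):
    """Minimal interface assumed for this example."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


def cached_call(backend: Backend, key: str, compute: Callable[[], str]) -> str:
    """Serve from the cache when possible; never let a cache error fail the call."""
    try:
        hit = backend.get(key)
    except Exception:  # broad on purpose: any backend failure degrades to a miss
        logger.warning("cache read failed, calling provider", exc_info=True)
        return compute()

    if hit is not None:
        return hit

    value = compute()
    try:
        backend.set(key, value)
    except Exception:
        # The result is still returned; it just is not cached this time.
        logger.warning("cache write failed, result not stored", exc_info=True)
    return value
```

With this shape, an unreachable Redis instance costs extra provider calls but does not surface as an application error.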

## 🧪 Testing & Development

### Running Tests
```bash
# Install development dependencies
uv pip install -e ".[dev]"

# Run unit tests (fast)
uv run pytest -q -m "not integration" --cov=cachefuse

# Run integration tests (requires Redis for some tests)
uv run pytest -q -m integration

# Run all tests
uv run pytest --cov=cachefuse
```

### Performance Benchmarks
- **Cache hit latency**: < 3ms (SQLite), < 1ms (Redis)  
- **Stampede protection**: 1 provider call regardless of concurrency
- **Memory overhead**: ~50MB typical usage
- **Storage efficiency**: Configurable compression and cleanup

### Examples & Demos
```bash
# RAG application demo
uv run python -m cachefuse.examples.rag_demo

# Embedding caching demo  
uv run python -m cachefuse.examples.embed_demo
```

## 🗺️ Roadmap

### v0.2.0 - Advanced Caching
- [ ] Semantic similarity caching
- [ ] Batch operations API
- [ ] Enhanced metrics (p95/p99 latencies)

### v0.3.0 - Enterprise Features  
- [ ] Prometheus metrics export
- [ ] Distributed locking with Redis
- [ ] Advanced compression algorithms

### v0.4.0 - Provider Integration
- [ ] Native OpenAI SDK integration
- [ ] Anthropic Claude SDK support
- [ ] Automatic cost tracking by provider

### Future Releases
- [ ] Web dashboard for cache management
- [ ] Circuit breaker patterns
- [ ] Multi-tier caching strategies

## 📈 Performance Comparison

| Scenario | Without CacheFuse | With CacheFuse | Improvement |
|----------|------------------|----------------|-------------|
| Repeated queries | 2-5 seconds | < 3ms | **Over 500x faster** |
| API costs | $0.02 per call | $0.00 (cached) | **100% on cache hits** (60-90% overall in typical applications) |
| Concurrency | N × API calls | 1 API call | **Perfect deduplication** |
| Memory usage | Negligible | ~50MB | **Minimal overhead** |


## 🤝 Contributing

### Development Setup
```bash
# Clone the repository
git clone https://github.com/Yasserelhaddar/CacheFuse.git
cd CacheFuse

# Set up development environment
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
uv run pytest
```

### Areas for Contribution
- 🐛 Bug fixes and stability improvements
- ⚡ Performance optimizations
- 📚 Documentation and examples
- 🔌 New backend implementations
- 🧪 Test coverage improvements

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

---

<div align="center">
  <p>
    <strong>Built with ❤️ for the AI community</strong>
  </p>
  <p>
    <em>Star ⭐ this repo if CacheFuse helps you build better LLM applications!</em>
  </p>
</div>
