Metadata-Version: 2.4
Name: cachellm-py
Version: 0.2.0
Summary: Auto-optimize LLM prompt caching. Save 60-90% on Claude, GPT & Gemini API costs.
Author: sahilempire
License-Expression: MIT
Project-URL: Homepage, https://github.com/sahilempire/cachellm
Project-URL: Repository, https://github.com/sahilempire/cachellm
Project-URL: Issues, https://github.com/sahilempire/cachellm/issues
Keywords: llm,prompt-caching,anthropic,openai,claude,gpt,gemini,cost-optimization,ai,cache
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.30.0; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=1.40.0; extra == "openai"
Provides-Extra: google
Requires-Dist: google-generativeai>=0.8.0; extra == "google"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"

# cachellm

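Auto-optimize LLM prompt caching. One line of code wraps your existing client; savings reach up to 90% with Claude and up to 50% with GPT, depending on how much of each prompt is cacheable.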

## Install

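```bash
pip install cachellm-py

# Optional: pull in a provider SDK via an extra (anthropic, openai, or google)
pip install "cachellm-py[anthropic]"
```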

## Quick Start

### Anthropic (Claude) — saves up to 90%

```python
from anthropic import Anthropic
from cachellm import optimize_anthropic

client = optimize_anthropic(Anthropic())

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful cooking assistant...",
    messages=[{"role": "user", "content": "How do I make biryani?"}],
)

client.print_stats()
```
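
For context, the Anthropic Messages API caches a prompt prefix when a content block carries a `cache_control` marker. The sketch below shows what adding that marker by hand could look like for a plain-string system prompt. It is an illustration of the request shape only, not cachellm's actual implementation, and the helper name `mark_system_cacheable` is hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def mark_system_cacheable(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` whose system prompt carries a cache breakpoint.

    Hypothetical helper for illustration; not part of cachellm.
    """
    out = deepcopy(params)
    system = out.get("system")
    if isinstance(system, str):
        # The Messages API also accepts `system` as a list of text blocks;
        # `cache_control` on a block marks the prefix ending there as cacheable.
        out["system"] = [
            {"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}
        ]
    return out


params = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "system": "You are a helpful cooking assistant...",
    "messages": [{"role": "user", "content": "How do I make biryani?"}],
}

# The original dict is left untouched; only the copy gains the marker.
cached_params = mark_system_cacheable(params)
```

`cached_params` can be passed to `client.messages.create(**cached_params)` on an unwrapped `Anthropic()` client. Note that the API only caches prefixes above a model-specific minimum length, so a prompt as short as this one would be processed without caching.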

### OpenAI (GPT) — saves up to 50%

```python
from openai import OpenAI
from cachellm import optimize_openai

client = optimize_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Hello"},
    ],
)

client.print_stats()
```

## Configuration

```python
from anthropic import Anthropic
from cachellm import optimize_anthropic
from cachellm.types import AnthropicCacheOptions

client = optimize_anthropic(Anthropic(), AnthropicCacheOptions(
    strategy="auto",
    max_breakpoints=4,
    ttl="5m",
    min_tokens=1024,
    debug=False,
))
```

## Standalone Analysis

```python
from cachellm import PromptAnalyzer

analyzer = PromptAnalyzer()
analysis = analyzer.analyze_anthropic_params({
    "system": "Your long system prompt here...",
    "tools": [{"name": "search", "description": "Search the web", "input_schema": {"type": "object"}}],
    "messages": [{"role": "user", "content": "Hello"}],
})

print(f"Cacheable: {analysis.cacheable_tokens} tokens")
print(f"Estimated savings: ~{analysis.estimated_savings_percent}%")
```
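
To make a savings percentage like the one above concrete, here is a rough back-of-the-envelope sketch. It assumes every request after the first hits the cache, ignores any surcharge for the initial cache write, and covers input tokens only. The function name and the price multipliers (0.10 for Claude, 0.50 for GPT, chosen to match the "up to 90%" and "up to 50%" figures above) are assumptions for illustration, not the formula `PromptAnalyzer` uses.

```python
def estimate_input_savings_percent(
    cacheable_tokens: int,
    total_input_tokens: int,
    cached_read_multiplier: float,
) -> float:
    """Steady-state input-cost savings when the cacheable prefix always hits the cache.

    `cached_read_multiplier` is the price of a cached token relative to a
    normal input token (an assumed value; check your provider's pricing).
    """
    if total_input_tokens <= 0:
        return 0.0
    # Fraction of the prompt that is billed at the discounted rate.
    cached_fraction = min(cacheable_tokens, total_input_tokens) / total_input_tokens
    return round(cached_fraction * (1.0 - cached_read_multiplier) * 100, 1)


# 9,000 of 10,000 input tokens sit in a stable, cacheable prefix.
anthropic_estimate = estimate_input_savings_percent(9_000, 10_000, 0.10)
openai_estimate = estimate_input_savings_percent(9_000, 10_000, 0.50)

print(f"Claude: ~{anthropic_estimate}%  GPT: ~{openai_estimate}%")
```

Under these assumptions the headline figures are upper bounds: the closer the cacheable share of the prompt gets to 100%, the closer the savings on input tokens get to the provider's cache discount.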

## Requirements

- Python >= 3.9
- No required dependencies; provider SDKs are optional extras (`anthropic`, `openai`, `google`)

## License

[MIT](../LICENSE)
