Metadata-Version: 2.4
Name: agent-context-manager
Version: 0.1.0
Summary: Lightweight context window management for AI agents
Author-email: Korah Stone <korahcomm@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/KorahStone/agent-context-manager
Project-URL: Repository, https://github.com/KorahStone/agent-context-manager
Project-URL: Issues, https://github.com/KorahStone/agent-context-manager/issues
Keywords: ai,agent,llm,context,memory,token,compression,chatbot,openai,anthropic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5.0; extra == "tiktoken"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file

# agent-context-manager

A lightweight Python library for managing LLM context windows in AI agents. Prevents context overflow, reduces token costs, and maintains conversation coherence.

## The Problem

AI agents face a critical challenge: context windows fill up fast. When they overflow:
- **Costs explode** - every turn resends the full history, so cumulative token usage grows roughly quadratically with conversation length
- **Performance degrades** - LLMs struggle with long contexts ("lost in the middle")
- **Coherence breaks** - agents forget important context while keeping noise

Current solutions are either too complex (require LLM calls for summarization) or too naive (just truncate old messages).
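
To make the cost point concrete, here is a small back-of-the-envelope calculation. It assumes every message is the same size and that the whole history is resent as input on each turn with no compression; the function name and the figure of 100 tokens per message are illustrative only.

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens sent over a conversation when the whole
    history is resent on every turn (no compression)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # the new user message joins the history
        total += history               # the full history is sent as input
        history += tokens_per_message  # the assistant reply joins the history
    return total


for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(turns, tokens_per_message=100))
# 10 -> 10000, 20 -> 40000, 40 -> 160000
```

Doubling the number of turns roughly quadruples the total input tokens, which is why capping or compressing the history pays off quickly.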

## The Solution

`agent-context-manager` provides intelligent context compression without requiring additional LLM calls:

- **Token-aware management** - Track usage, warn before overflow
- **Multiple compression strategies** - Choose what fits your use case
- **Framework agnostic** - Works with any LLM provider
- **Zero LLM dependencies** - No API calls needed for compression

## Installation

```bash
pip install agent-context-manager
```

## Quick Start

```python
from agent_context_manager import ContextManager, SlidingWindowStrategy

# Create a context manager with 8K token limit
manager = ContextManager(
    max_tokens=8000,
    strategy=SlidingWindowStrategy(keep_system=True, keep_recent=10)
)

# Add messages as your agent works
manager.add_message({"role": "system", "content": "You are a helpful assistant."})
manager.add_message({"role": "user", "content": "Hello!"})
manager.add_message({"role": "assistant", "content": "Hi there!"})

# Get compressed context when needed
context = manager.get_context()

# Check token usage
print(f"Tokens used: {manager.token_count}/{manager.max_tokens}")
```

## Compression Strategies

### 1. Sliding Window (Default)
Keeps the most recent N messages and, with `keep_system=True`, preserves system messages regardless of age.

```python
from agent_context_manager import SlidingWindowStrategy

strategy = SlidingWindowStrategy(
    keep_system=True,      # Always keep system messages
    keep_recent=20,        # Keep last 20 messages
    keep_first_user=True   # Keep the original user request
)
```
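
For intuition, the idea behind a sliding window can be sketched in a few lines of plain Python. This is not the library's implementation: the `sliding_window` function below is a hypothetical stand-in that only mirrors the three parameters shown above, and it assumes pinned system messages do not count against the window.

```python
from typing import Dict, List

Message = Dict[str, str]


def sliding_window(
    messages: List[Message],
    keep_recent: int = 20,
    keep_system: bool = True,
    keep_first_user: bool = True,
) -> List[Message]:
    """Return pinned messages plus the last `keep_recent` others, in original order."""
    pinned = set()
    if keep_system:
        pinned.update(i for i, m in enumerate(messages) if m["role"] == "system")
    if keep_first_user:
        first_user = next(
            (i for i, m in enumerate(messages) if m["role"] == "user"), None
        )
        if first_user is not None:
            pinned.add(first_user)
    # Assumption: pinned system messages do not use up window slots.
    candidates = [
        i for i, m in enumerate(messages)
        if not (keep_system and m["role"] == "system")
    ]
    recent = candidates[-keep_recent:] if keep_recent > 0 else []
    return [messages[i] for i in sorted(pinned.union(recent))]


history = [{"role": "system", "content": "You are a helpful assistant."}]
for n in range(5):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

trimmed = sliding_window(history, keep_recent=2)
print([m["content"] for m in trimmed])
# ['You are a helpful assistant.', 'question 0', 'question 4', 'answer 4']
```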

### 2. Importance Scoring
Scores messages by relevance and keeps the most important ones.

```python
from agent_context_manager import ImportanceStrategy

strategy = ImportanceStrategy(
    system_weight=1.0,     # System messages always kept
    user_weight=0.8,       # User messages high priority
    assistant_weight=0.6,  # Assistant messages medium priority
    tool_weight=0.4,       # Tool results lower priority
    recency_decay=0.95     # Recent messages score higher
)
```
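
One plausible way to combine role weights with a recency decay is to multiply the weight by `recency_decay ** age`. The sketch below assumes exactly that formula, exempts system messages from decay so they stay on top, and uses an arbitrary default weight of 0.5 for unknown roles; the library's actual scoring may differ, and `score_messages` and `keep_top` are hypothetical names.

```python
ROLE_WEIGHTS = {"system": 1.0, "user": 0.8, "assistant": 0.6, "tool": 0.4}


def score_messages(messages, recency_decay=0.95):
    """Score = role weight * recency_decay ** age, where age 0 is the newest message."""
    newest = len(messages) - 1
    scores = []
    for i, m in enumerate(messages):
        weight = ROLE_WEIGHTS.get(m["role"], 0.5)  # 0.5 is an arbitrary fallback
        # System messages skip the decay, so with these weights they always rank first.
        decay = 1.0 if m["role"] == "system" else recency_decay ** (newest - i)
        scores.append(weight * decay)
    return scores


def keep_top(messages, budget, recency_decay=0.95):
    """Keep the `budget` highest-scoring messages, preserving conversation order."""
    scores = score_messages(messages, recency_decay)
    ranked = sorted(range(len(messages)), key=lambda i: scores[i], reverse=True)
    return [messages[i] for i in sorted(ranked[:budget])]


history = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "tool", "content": "ls output"},
    {"role": "user", "content": "Fix the failing test."},
    {"role": "assistant", "content": "Looking at the test now."},
    {"role": "tool", "content": "pytest output"},
    {"role": "user", "content": "Any progress?"},
]
print([m["content"] for m in keep_top(history, budget=3)])
# ['You are a coding assistant.', 'Fix the failing test.', 'Any progress?']
```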

### 3. Semantic Deduplication
Removes near-duplicate messages to reduce redundancy.

```python
from agent_context_manager import DeduplicationStrategy

strategy = DeduplicationStrategy(
    similarity_threshold=0.85,  # Remove if >85% similar
    keep_latest=True            # Keep the most recent of duplicates
)
```
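
The README does not say which similarity measure the strategy uses. As an illustration only, the sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in and compares messages of the same role; the `deduplicate` function is hypothetical.

```python
from difflib import SequenceMatcher


def deduplicate(messages, similarity_threshold=0.85, keep_latest=True):
    """Drop a message when its content is more than `similarity_threshold`
    similar to an already-kept message with the same role."""
    # Walking newest-first means the latest copy of each duplicate is kept.
    ordered = list(reversed(messages)) if keep_latest else list(messages)
    kept = []
    for msg in ordered:
        is_duplicate = any(
            msg["role"] == other["role"]
            and SequenceMatcher(None, msg["content"], other["content"]).ratio()
            > similarity_threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(msg)
    # Restore chronological order.
    return list(reversed(kept)) if keep_latest else kept


history = [
    {"role": "user", "content": "Run the job"},
    {"role": "tool", "content": "Error: connection timed out after 30s (attempt 1)"},
    {"role": "tool", "content": "Error: connection timed out after 30s (attempt 2)"},
]
for m in deduplicate(history):
    print(m["role"], "|", m["content"])
```

Here the two tool errors differ by a single character, so only the later one survives. Note that this pairwise comparison is quadratic in the number of messages, which is fine for a sketch but worth keeping in mind for long histories.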

### 4. Hybrid (Recommended for Production)
Combines multiple strategies for best results.

```python
from agent_context_manager import (
    DeduplicationStrategy,
    HybridStrategy,
    ImportanceStrategy,
    SlidingWindowStrategy,
)

strategy = HybridStrategy([
    DeduplicationStrategy(similarity_threshold=0.9),
    ImportanceStrategy(recency_decay=0.95),
    SlidingWindowStrategy(keep_recent=50)
])
```
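
Assuming the strategies in the list run in the order given (the README does not state this explicitly), the ordering matters. The sketch below uses two simplified, hypothetical steps, `drop_exact_duplicates` and `last_n`, to show why deduplicating before windowing tends to retain more distinct messages.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Step = Callable[[List[Message]], List[Message]]


def run_pipeline(messages: List[Message], steps: List[Step]) -> List[Message]:
    """Apply each step to the output of the previous one."""
    for step in steps:
        messages = step(messages)
    return messages


def drop_exact_duplicates(messages: List[Message]) -> List[Message]:
    """Keep the first occurrence of each (role, content) pair."""
    seen = set()
    out = []
    for m in messages:
        key = (m["role"], m["content"])
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out


def last_n(n: int) -> Step:
    """Build a step that keeps only the last n messages."""
    def step(messages: List[Message]) -> List[Message]:
        return messages[-n:] if n > 0 else []
    return step


history = [{"role": "user", "content": c} for c in ["a", "b", "b", "c", "c", "d"]]

dedup_first = run_pipeline(history, [drop_exact_duplicates, last_n(3)])
window_first = run_pipeline(history, [last_n(3), drop_exact_duplicates])

print([m["content"] for m in dedup_first])   # ['b', 'c', 'd']
print([m["content"] for m in window_first])  # ['c', 'd']
```

Running deduplication first means window slots are not spent on repeated messages.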

## Token Counting

Built-in token counting for popular models:

```python
from agent_context_manager import ContextManager

# Auto-detect tokenizer based on model
manager = ContextManager(max_tokens=8000, model="gpt-4")
manager = ContextManager(max_tokens=100000, model="claude-3")

# Or use a custom tokenizer (my_custom_tokenizer is a placeholder for one you define)
manager = ContextManager(
    max_tokens=8000,
    tokenizer=my_custom_tokenizer
)
```
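
When no model-specific tokenizer is available, a rough character-based estimate is a common fallback. The sketch below uses the rule of thumb of about four characters per token for English text plus a fixed per-message allowance; both numbers are assumptions, and the README does not specify what interface the `tokenizer=` argument expects, so check the library before passing a function like this in.

```python
import math


def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / chars_per_token))


def count_message_tokens(messages, per_message_overhead: int = 4) -> int:
    """Sum the content estimates plus a fixed allowance per message
    for role and formatting tokens (the allowance is an assumption)."""
    return sum(
        approx_token_count(m["content"]) + per_message_overhead for m in messages
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(count_message_tokens(messages))
```

Estimates like this can be off by a wide margin for code or non-English text, so leave headroom below the model's real limit.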

## Overflow Handling

```python
from agent_context_manager import ContextManager, OverflowPolicy

manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.COMPRESS,  # Auto-compress when near limit
    overflow_threshold=0.9  # Compress at 90% capacity
)

# Or get warnings instead
manager = ContextManager(
    max_tokens=8000,
    overflow_policy=OverflowPolicy.WARN
)

# Check status
if manager.is_near_overflow():
    print(f"Warning: {manager.usage_percent}% of context used")
```
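
The threshold arithmetic is simple enough to check by hand. The standalone functions below are hypothetical equivalents of `usage_percent` and `is_near_overflow`, and they assume the check is inclusive (`>=`), which the README does not confirm.

```python
def usage_percent(token_count: int, max_tokens: int) -> float:
    """Share of the context window in use, as a percentage."""
    return 100.0 * token_count / max_tokens


def is_near_overflow(
    token_count: int, max_tokens: int, overflow_threshold: float = 0.9
) -> bool:
    """True once usage reaches the threshold fraction of the limit."""
    return token_count / max_tokens >= overflow_threshold


# With max_tokens=8000 and overflow_threshold=0.9, the trigger point is 7200 tokens.
for count in (4000, 7199, 7200):
    print(count, usage_percent(count, 8000), is_near_overflow(count, 8000))
```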

## Memory Blocks (Structured Context)

Organize context into logical blocks with size limits:

```python
from agent_context_manager import ContextManager, MemoryBlock, SlidingWindowStrategy

manager = ContextManager(max_tokens=8000)

# Define memory blocks
manager.add_block(MemoryBlock(
    name="system",
    max_tokens=500,
    priority=1.0,  # Highest priority, never compressed
    content="You are a helpful coding assistant."
))

manager.add_block(MemoryBlock(
    name="user_profile",
    max_tokens=200,
    priority=0.9,
    content="User prefers Python, uses VS Code."
))

manager.add_block(MemoryBlock(
    name="conversation",
    max_tokens=7000,
    priority=0.5,  # Can be compressed if needed
    strategy=SlidingWindowStrategy(keep_recent=30)
))

# Update blocks as needed
manager.update_block("user_profile", "User prefers Python, uses VS Code, timezone: PST")
```
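
How the manager divides the overall budget between blocks is not described here. One simple policy, shown below purely as an illustration, is to grant tokens in descending priority so that low-priority blocks absorb any shortfall. The `Block` dataclass and `allocate` function are hypothetical and are not the library's `MemoryBlock` API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:  # hypothetical stand-in, not the library's MemoryBlock
    name: str
    tokens: int      # tokens the block currently wants to use
    priority: float  # higher means served first


def allocate(blocks: List[Block], max_tokens: int) -> Dict[str, int]:
    """Grant tokens to blocks in descending priority until the budget runs out."""
    remaining = max_tokens
    grants = {}
    for block in sorted(blocks, key=lambda b: b.priority, reverse=True):
        grants[block.name] = min(block.tokens, remaining)
        remaining -= grants[block.name]
    return grants


blocks = [
    Block("conversation", tokens=9000, priority=0.5),
    Block("system", tokens=500, priority=1.0),
    Block("user_profile", tokens=200, priority=0.9),
]
print(allocate(blocks, max_tokens=8000))
# {'system': 500, 'user_profile': 200, 'conversation': 7300}
```

Under this policy the conversation block would need to be compressed from 9000 to 7300 tokens, while the two higher-priority blocks are left untouched.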

## Integration Examples

### With OpenAI

```python
from openai import OpenAI
from agent_context_manager import ContextManager

client = OpenAI()
manager = ContextManager(max_tokens=8000, model="gpt-4")

manager.add_message({"role": "system", "content": "You are helpful."})

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:  # Leave the loop cleanly
        break
    manager.add_message({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4",
        messages=manager.get_context()  # Auto-compressed if needed
    )

    assistant_message = response.choices[0].message.content
    manager.add_message({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")
```

### With Anthropic

```python
from anthropic import Anthropic
from agent_context_manager import ContextManager

client = Anthropic()
manager = ContextManager(max_tokens=100000, model="claude-3")

# Same pattern works with any provider
```
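
One difference to keep in mind: Anthropic's Messages API takes the system prompt as a separate `system` argument rather than as a message with role `"system"`. Assuming `get_context()` returns a list that includes system messages (as the OpenAI example suggests), a small helper along these lines can split it first. `split_system` is a hypothetical helper, not part of the library, and the API call is left as a comment so the snippet runs without credentials.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_system(messages: List[Message]) -> Tuple[str, List[Message]]:
    """Separate system messages from the rest, joining system texts with blank lines."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), chat


# Stand-in for manager.get_context()
context = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
]
system_text, chat_messages = split_system(context)

# With a real client, the pieces would be passed along these lines:
# response = client.messages.create(
#     model=MODEL_NAME,      # placeholder: substitute a current Claude model ID
#     max_tokens=1024,       # required: upper bound on generated tokens
#     system=system_text,
#     messages=chat_messages,
# )
# reply = response.content[0].text
```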

### With LangChain

```python
from langchain.memory import ConversationBufferMemory
from agent_context_manager import ContextManager, LangChainAdapter

manager = ContextManager(max_tokens=8000)
memory = LangChainAdapter(manager)  # Drop-in replacement
```

## API Reference

### ContextManager

| Member | Description |
|--------|-------------|
| `add_message(msg)` | Add a message to the context |
| `get_context()` | Get the compressed context as a message list |
| `token_count` | Current token count (attribute, accessed without parentheses) |
| `usage_percent` | Percentage of the context in use (attribute, accessed without parentheses) |
| `is_near_overflow()` | Check whether usage is approaching the limit |
| `compress()` | Manually trigger compression |
| `clear()` | Clear all messages |

### Strategies

| Strategy | Best For |
|----------|----------|
| `SlidingWindowStrategy` | Simple agents, chatbots |
| `ImportanceStrategy` | Complex agents with tool use |
| `DeduplicationStrategy` | Repetitive workflows |
| `HybridStrategy` | Production systems |

## Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

## License

MIT License - see LICENSE file.
