Metadata-Version: 2.4
Name: zvec-memory
Version: 0.2.1
Summary: Local-first vector memory for AI agents using Zvec and Ollama
Project-URL: Homepage, https://github.com/ereid7/zvec-memory
Project-URL: Repository, https://github.com/ereid7/zvec-memory
Project-URL: Documentation, https://github.com/ereid7/zvec-memory/blob/main/docs
Project-URL: Issues, https://github.com/ereid7/zvec-memory/issues
Author: ereid7
License-Expression: MIT
License-File: LICENSE
Keywords: agents,ai,embeddings,local-first,memory,ollama,vector-database,zvec
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: <3.13,>=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: requests>=2.28.0
Requires-Dist: zvec>=0.2.0
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: httpx>=0.25.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Requires-Dist: types-requests>=2.32.0; extra == 'dev'
Provides-Extra: server
Requires-Dist: fastapi>=0.104.0; extra == 'server'
Requires-Dist: uvicorn[standard]>=0.24.0; extra == 'server'
Description-Content-Type: text/markdown

<div align="center">

# 🧠 zvec-memory

**Local-first vector memory for AI agents.**

[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Zvec](https://img.shields.io/badge/powered%20by-Zvec-green)](https://zvec.org/)
[![Ollama](https://img.shields.io/badge/embeddings-Ollama-orange)](https://ollama.ai/)

*In-process vector database with hybrid semantic + keyword search. No cloud. No API keys. Just local embeddings and fast recall.*

</div>

---

## Quick Start

```bash
# Install
pip install zvec-memory

# Ensure Ollama is running with an embedding model
ollama pull nomic-embed-text

# Store a memory
zvec-memory add "Alice prefers local-first architecture" \
    --type semantic --importance 8 --tags preferences,architecture

# Search memories
zvec-memory search "what are the architecture preferences"
```

---

## What Makes This Different

Most agent memory systems fall into one of three camps:
- **Cloud-hosted** (privacy concerns, latency, cost)
- **File-based only** (no semantic search, just grep)
- **Dependent on an external vector DB** (Qdrant, Chroma, etc., which means more infrastructure)

**zvec-memory** is different:

| Feature | zvec-memory | Alternatives |
|---------|-------------|--------------|
| **Architecture** | In-process (Zvec) | Client-server |
| **Embeddings** | Local (Ollama) | Cloud APIs |
| **Hybrid search** | Dense + Sparse + Filters | Dense only |
| **Setup** | `pip install` | Docker + config |
| **Privacy** | 100% local with the default Ollama provider | Data sent to third parties |
| **Latency** | <5ms search | Network roundtrip |

---

## Architecture

```
┌─────────────────────────────────────────┐
│           Your AI Agent                 │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│      zvec-memory Engine                 │
│  ┌─────────────────────────┐            │
│  │  Hybrid Search Layer    │            │
│  │  • Dense (semantic)     │            │
│  │  • Sparse (keywords)    │            │
│  │  • Metadata filters     │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Embedding Layer        │            │
│  │  • nomic-embed-text     │            │
│  │  • BM25 sparse vectors  │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Cognitive Layer        │            │
│  │  • Decay scoring        │            │
│  │  • Access reinforcement │            │
│  │  • Deduplication        │            │
│  │  • Contradiction detect │            │
│  └─────────────────────────┘            │
└──────┬──────────────────┬───────────────┘
       │                  │
┌──────▼──────┐   ┌──────▼───────────────┐
│  Zvec       │   │  SQLite (WAL)        │
│  • HNSW     │   │  • Graph edges       │
│  • Dense    │   │  • Embedding cache   │
│  • Sparse   │   │  • FTS5 full-text    │
│  • Filters  │   │  • Version chains    │
└─────────────┘   └──────────────────────┘
```

---

## Memory Taxonomy

zvec-memory uses five cognitive-inspired memory types:

| Type | Purpose | Example |
|------|---------|---------|
| **episodic** | Events, conversations | "[2026-02-15] Discussed Zvec integration" |
| **semantic** | Facts, preferences | "Alice prefers local-first architecture" |
| **procedural** | How-to patterns | "To check weather: use weather skill" |
| **entity** | People, projects, things | "Bob: full access to example-project repo" |
| **core** | Identity, values | "Agent values transparency and privacy" |

---

## Installation

### Requirements

- Python 3.10, 3.11, or 3.12 (Zvec requirement)
- macOS or Linux
- [Ollama](https://ollama.ai/) running locally

### Install Ollama

```bash
# macOS
brew install ollama
ollama serve

# Linux
curl -fsSL https://ollama.com/install.sh | sh
```

### Install zvec-memory

```bash
pip install zvec-memory

# Or from source
git clone https://github.com/ereid7/zvec-memory.git
cd zvec-memory
pip install -e ".[dev]"
```

### Pull Embedding Model

```bash
ollama pull nomic-embed-text
```

---

## Usage

### CLI

```bash
# Store a memory
zvec-memory add "Memory text" --type semantic --importance 7

# Search (hybrid semantic + keyword)
zvec-memory search "query text" --topk 10

# Extract memories from text (requires llama3.2)
zvec-memory extract "conversation text" --source telegram

# Maintenance: run decay and optimize
zvec-memory maintain

# Stats
zvec-memory stats

# Reindex from files
zvec-memory reindex --source all

# Start REST API server
zvec-memory serve --port 8400
```

### Python API

```python
from zvec_memory import MemoryEngine

# Initialize
engine = MemoryEngine()

# Store
engine.store(
    text="Alice prefers local-first architecture",
    memory_type="semantic",
    importance=8.0,
    tags=["preferences", "architecture"]
)

# Search
results = engine.recall(
    query="cloud vs local",
    topk=5,
    memory_types=["semantic", "episodic"]
)

for r in results:
    print(f"[{r['score']:.3f}] {r['fields']['text']}")

# Context for prompts
from zvec_memory.context import get_context

context = get_context("user message here", max_tokens=500)
```

### REST API

Start the server:

```bash
zvec-memory serve --port 8400
```

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/memories` | List memories |
| `POST` | `/memories` | Store a memory |
| `POST` | `/memories/extract` | Extract facts from text |
| `POST` | `/memories/ingest` | Ingest a document |
| `GET` | `/search` | Search memories |
| `GET` | `/memories/{id}` | Get a memory by ID |
| `GET` | `/memories/{id}/graph` | Get graph edges for a memory |
| `DELETE` | `/memories/{id}` | Forget a memory |
| `POST` | `/memories/{id}/restore` | Restore a forgotten memory |
| `POST` | `/maintain` | Run maintenance (decay + cleanup) |
| `GET` | `/stats` | Memory statistics |
| `GET` | `/graph/stats` | Graph edge statistics |
| `GET` | `/graph/export` | Export graph as JSON |
| `GET` | `/config` | Current embedding config |

> **📖 Interactive API docs** available at `http://localhost:8400/docs` (Swagger UI) and `http://localhost:8400/redoc` (ReDoc) when the server is running.

---

## Embedding Models

zvec-memory works with any embedding model via two providers:

### Ollama (default, local)
```bash
# Default: nomic-embed-text (768-dim, 8K context)
ollama pull nomic-embed-text

# Recommended upgrade: qwen3-embedding (1024-dim, 32K context, #1 MTEB)
ollama pull qwen3-embedding:0.6b
export EMBED_MODEL=qwen3-embedding:0.6b
```

### OpenAI-compatible APIs
```bash
# OpenAI
export EMBED_PROVIDER=openai
export EMBED_MODEL=text-embedding-3-small
export EMBED_API_KEY=sk-...

# Any compatible API (Voyage, Cohere, Together, vLLM, etc.)
export EMBED_PROVIDER=openai
export EMBED_URL=https://api.voyageai.com/v1
export EMBED_MODEL=voyage-3
export EMBED_API_KEY=pa-...
```

### Switching Models
Vectors produced by different models are not comparable, so changing the embedding model requires rebuilding the vector index:
```bash
zvec-memory reindex --source all
```
Dimensions are auto-detected, so no manual configuration is needed.

---

## Configuration

> **📖 Full reference:** See [`docs/CONFIG.md`](docs/CONFIG.md) for all 30+ environment variables and internal constants.

Key environment variables:

```bash
export ZVEC_MEMORY_PATH="~/.zvec-memory"
export OLLAMA_URL="http://127.0.0.1:11434"
export EMBED_MODEL="nomic-embed-text"
export EMBED_PROVIDER="ollama"        # or "openai" or "none"
export EMBED_URL=""                   # for openai provider
export EMBED_API_KEY=""               # for openai provider
export EMBED_DIM="0"                  # 0 = auto-detect
```
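
A script that needs the same settings can read these variables with the standard library. The sketch below is illustrative only: `EmbedConfig` and `load_config` are hypothetical names, not part of the zvec-memory API, and the fallback values simply mirror the ones shown above rather than the library's confirmed defaults.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EmbedConfig:
    """Hypothetical container for the settings above (not a zvec-memory class)."""
    memory_path: Path
    ollama_url: str
    provider: str
    model: str
    url: str
    api_key: str
    dim: int  # 0 means "auto-detect"


def load_config(env=None) -> EmbedConfig:
    """Read settings from an environment mapping, falling back to the values shown above."""
    env = os.environ if env is None else env
    provider = env.get("EMBED_PROVIDER", "ollama")
    if provider not in {"ollama", "openai", "none"}:
        raise ValueError(f"unsupported EMBED_PROVIDER: {provider!r}")
    raw_path = env.get("ZVEC_MEMORY_PATH", "~/.zvec-memory")
    return EmbedConfig(
        memory_path=Path(os.path.expanduser(raw_path)),
        ollama_url=env.get("OLLAMA_URL", "http://127.0.0.1:11434"),
        provider=provider,
        model=env.get("EMBED_MODEL", "nomic-embed-text"),
        url=env.get("EMBED_URL", ""),
        api_key=env.get("EMBED_API_KEY", ""),
        dim=int(env.get("EMBED_DIM", "0")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.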

---

## How It Works

### 1. Hybrid Search

Every query uses **three** search strategies simultaneously:

- **Dense vectors** (768-dim with the default model): Semantic meaning via nomic-embed-text
- **Sparse vectors** (BM25): Keyword matching for exact terms
- **Metadata filters**: Memory type, source, participants, time ranges

Results are merged with weighted reranking (dense weighted 1.2×, sparse 0.8×).
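
The exact fusion formula is not spelled out here, so the following is only one plausible reading of weighted reranking: each result list is min-max normalized, then the dense and sparse scores are combined with the 1.2 and 0.8 weights. The `merge_weighted` name and the normalization step are assumptions for illustration, not the library's implementation.

```python
def _normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores to [0, 1] so the two lists are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def merge_weighted(
    dense: dict[str, float],
    sparse: dict[str, float],
    dense_weight: float = 1.2,
    sparse_weight: float = 0.8,
    topk: int = 5,
) -> list[tuple[str, float]]:
    """Combine per-document dense and sparse scores into one ranked list."""
    d, s = _normalize(dense), _normalize(sparse)
    combined = {
        doc_id: dense_weight * d.get(doc_id, 0.0) + sparse_weight * s.get(doc_id, 0.0)
        for doc_id in d.keys() | s.keys()
    }
    # Highest combined score first; ties broken by id for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:topk]


if __name__ == "__main__":
    dense_hits = {"a": 1.0, "b": 0.5, "c": 0.0}   # e.g. cosine similarities
    sparse_hits = {"b": 12.0, "d": 3.0}           # e.g. BM25 scores
    for doc_id, score in merge_weighted(dense_hits, sparse_hits):
        print(doc_id, round(score, 3))
```

In this example, document `b` ranks first because it appears in both lists, even though `a` has the best dense score.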

### 2. Cognitive Decay

Each memory carries a decay score that combines importance, recency, and access frequency:

```
decay_score = (importance/10) × recency × access_factor

recency = exp(-λ × days_since_last_access)  # λ = 0.03, half-life ~23 days
access_factor = log2(access_count + 1) / 5
```

Unused memories fade. Frequently-accessed memories stay sharp.
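
The formula above translates directly into a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical and any clamping or floors the library may apply are not shown.

```python
import math

DECAY_LAMBDA = 0.03  # per-day decay rate from the formula above


def decay_score(
    importance: float,
    days_since_last_access: float,
    access_count: int,
    lam: float = DECAY_LAMBDA,
) -> float:
    """Compute (importance/10) * recency * access_factor as written above."""
    recency = math.exp(-lam * days_since_last_access)
    access_factor = math.log2(access_count + 1) / 5
    return (importance / 10) * recency * access_factor


def half_life_days(lam: float = DECAY_LAMBDA) -> float:
    """Days until the recency term halves: ln(2) / lambda."""
    return math.log(2) / lam


if __name__ == "__main__":
    fresh = decay_score(importance=8, days_since_last_access=0, access_count=3)
    aged = decay_score(importance=8, days_since_last_access=half_life_days(), access_count=3)
    print(f"half-life: {half_life_days():.1f} days")
    print(f"fresh: {fresh:.3f}, after one half-life: {aged:.3f}")
```

Taken literally, the formula yields a score of 0 when `access_count` is 0, because `log2(1) = 0`; whether the engine applies a minimum value in that case is not stated here.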

### 3. Deduplication

Before storing, we check for existing similar memories (>92% similarity). If found:
- Reinforce the existing memory (bump access_count, update last_accessed)
- Skip creating a duplicate
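
The store-or-reinforce step can be sketched as follows. The similarity metric is not named above, so cosine similarity is an assumption here, and the in-memory `Record` list and `store_or_reinforce` helper are hypothetical stand-ins for the real index.

```python
import math
import time
from dataclasses import dataclass, field

SIMILARITY_THRESHOLD = 0.92  # ">92% similarity" from the text above


@dataclass
class Record:
    """Hypothetical stored memory with the fields the dedup step touches."""
    text: str
    embedding: list[float]
    access_count: int = 0
    last_accessed: float = field(default_factory=time.time)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def store_or_reinforce(
    records: list[Record], text: str, embedding: list[float]
) -> tuple[Record, bool]:
    """Return (record, created). Reinforce the nearest record if it is similar enough."""
    best = max(records, key=lambda r: cosine(r.embedding, embedding), default=None)
    if best is not None and cosine(best.embedding, embedding) > SIMILARITY_THRESHOLD:
        best.access_count += 1
        best.last_accessed = time.time()
        return best, False
    record = Record(text=text, embedding=embedding)
    records.append(record)
    return record, True
```

A real index would use an approximate nearest-neighbor lookup rather than the linear scan shown here; the scan just keeps the example self-contained.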

---

## Integration Examples

### Custom Agent

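```python
from typing import Callable

from zvec_memory import MemoryEngine


def format_memories(memories: list[dict]) -> str:
    # One bullet per recalled memory; each result exposes its text under ["fields"]["text"]
    return "\n".join(f"- {m['fields']['text']}" for m in memories)


class AgentWithMemory:
    def __init__(self, generate: Callable[[str], str]):
        # `generate` is any prompt -> completion callable wrapping your LLM client
        self.memory = MemoryEngine()
        self.generate = generate

    def chat(self, message: str) -> str:
        # Get relevant context
        context = self.memory.recall(query=message, topk=5)

        # Build prompt with context
        prompt = f"""Relevant memories:
{format_memories(context)}

User: {message}
Assistant:"""

        # Get LLM response
        response = self.generate(prompt)

        # Store this exchange
        self.memory.store(
            text=f"User asked: {message}\nAssistant responded: {response}",
            memory_type="episodic",
            importance=5
        )

        return response
```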
