Metadata-Version: 2.4
Name: memorymesh-mcp
Version: 0.2.0
Summary: Local MCP hub for personal data — private, cross-platform, agent-ready.
Project-URL: Homepage, https://github.com/kilhubprojects/memory-mesh
Project-URL: Repository, https://github.com/kilhubprojects/memory-mesh
Project-URL: Issues, https://github.com/kilhubprojects/memory-mesh/issues
Project-URL: Changelog, https://github.com/kilhubprojects/memory-mesh/blob/main/CHANGELOG.md
Author-email: Carlos Coelho <kilhub.projects@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: embeddings,local-first,mcp,personal-data,rag,search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.11
Requires-Dist: chromadb>=0.5
Requires-Dist: loguru>=0.7
Requires-Dist: markdown-it-py>=3.0
Requires-Dist: mcp>=1.0
Requires-Dist: psutil>=5.9
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pypdf>=4.0
Requires-Dist: python-docx>=1.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: rank-bm25>=0.2
Requires-Dist: rich>=13.0
Requires-Dist: sentence-transformers>=2.7
Requires-Dist: starlette>=0.37
Requires-Dist: tree-sitter-languages>=1.10
Requires-Dist: tree-sitter>=0.21
Requires-Dist: typer>=0.12
Requires-Dist: uvicorn>=0.29
Requires-Dist: watchdog>=4.0
Description-Content-Type: text/markdown

# MemoryMesh

> Universal MCP hub for personal data. Local-first, private by default, designed to be the memory layer of the agents you'll build next.

[![CI](https://img.shields.io/github/actions/workflow/status/kilhubprojects/memory-mesh/ci.yml?label=CI)](https://github.com/kilhubprojects/memory-mesh/actions)
[![PyPI](https://img.shields.io/pypi/v/memorymesh-mcp)](https://pypi.org/project/memorymesh-mcp/)
[![License](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
[![Python](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/)
[![MCP](https://img.shields.io/badge/MCP-stdio%20%7C%20streamable--http-purple)](https://modelcontextprotocol.io/)
[![Tests](https://img.shields.io/badge/tests-172%20passing-brightgreen)]()
[![v0.1.0](https://img.shields.io/badge/version-v0.1.0-orange)](https://github.com/kilhubprojects/memory-mesh/releases/tag/v0.1.0)

MemoryMesh indexes your local files — and in future versions, your emails, calendar, browser history, and chat logs — and exposes them through the [Model Context Protocol](https://modelcontextprotocol.io/). Any MCP-aware client (Claude Desktop, Cursor, Claude Code, or your own agent) can ask semantic questions over the things you actually own, without sending a single byte to the cloud.

It is a **hub**, not a single-purpose RAG. The transport, embedding model, parser, and chunking strategy are all swappable behind clean interfaces — so the same hub can grow from "search my notes" to "remember everything for my Agent OS."

---

## Why this exists

Personal data is fragmented across dozens of apps, and no AI agent can reach all of it in a unified, private way. Anthropic's MCP defined the protocol; MemoryMesh supplies the missing hub that wires everything up locally, with privacy as a precondition rather than a setting.

---

## How it works

```
                   ┌──────────────────────────────┐
  MCP clients ───▶ │         MemoryMesh           │
(Claude Desktop,   │  ┌────────────────────────┐  │
 Cursor, agents)   │  │ MCP Tools (FastMCP):   │  │
                   │  │  search_memory         │  │
                   │  │  list_sources          │  │
                   │  │  get_document          │  │
                   │  │  index_now             │  │
                   │  └──────────┬─────────────┘  │
                   │             ▼                 │
                   │     Search Engine             │
                   │   dense + BM25 → RRF          │
                   │             │                 │
                   │   ┌─────────┴──────────┐      │
                   │   ▼                    ▼      │
                   │ ChromaDB            BM25      │
                   │ (embeddings)     (sparse)     │
                   │   ▲                    ▲      │
                   │   └──────── Indexer ───┘      │
                   │                ▲              │
                   │           Watchdog            │
                   └────────────────┬──────────────┘
                                    ▼
                             Your filesystem
```

**Indexing pipeline:** file watcher detects changes → SHA-256 dedup skips unchanged files → parser (txt/md/pdf/docx/code) → smart chunker (tree-sitter for code, by-heading for markdown, recursive for text) → embeddings via `sentence-transformers` → upsert into ChromaDB + BM25 index.
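
The SHA-256 dedup step can be sketched as follows. This is a minimal illustration, not MemoryMesh's actual code: the names `file_sha256` and `needs_reindex`, and the in-memory `seen` dictionary, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size blocks so large files never sit fully in memory.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_reindex(path: Path, seen: dict[str, str]) -> bool:
    """Return True if the file is new or changed, and record its digest.

    `seen` maps a path to the digest stored at the last indexing run
    (a hypothetical in-memory store used only for this sketch).
    """
    current = file_sha256(path)
    key = str(path)
    if seen.get(key) == current:
        return False  # content unchanged: skip parsing and embedding
    seen[key] = current
    return True
```

Hashing content rather than comparing modification times means a file that is touched or re-saved without changes is skipped before the expensive parsing and embedding steps.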

**Search pipeline:** query → dense search (ChromaDB) + sparse search (BM25) over-fetch → Reciprocal Rank Fusion (k=60) → top-k results with path, preview, score, and metadata.
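
Reciprocal Rank Fusion gives each document a score of `1 / (k + rank)` for every ranked list it appears in and sums those scores. The sketch below shows the technique with `k=60`; the function name and the sample document IDs are illustrative assumptions, not MemoryMesh's internal API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_k: int = 10
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


dense = ["a.md", "b.md", "c.md"]   # hypothetical dense (embedding) ranking
sparse = ["b.md", "d.md", "a.md"]  # hypothetical sparse (BM25) ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Because only ranks are used, the dense and sparse scores never need to be normalized against each other. Here `b.md` ends up first because it ranks high in both lists.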

---

## What makes it different

Most comparable tools optimize for one of the dimensions below. Of the tools compared here, MemoryMesh is the only one that covers all of them:

| Feature | MemoryMesh | LangChain | LlamaIndex | PrivateGPT | AnythingLLM | MemGPT | Haystack |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **MCP native** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Hybrid search (dense + BM25 + RRF)** | ✅ | Partial | Partial | ❌ | ❌ | ❌ | ✅ |
| **Real-time watcher + SHA-256 dedup** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Post-crash reconciliation** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **100% local, zero telemetry** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Cross-platform (Win/Linux/Mac)** | ✅ | ✅ | ✅ | Partial | Partial | ✅ | ✅ |
| **No framework dependency** | ✅ | — | — | ❌ | ❌ | ❌ | — |
| **Designed as infrastructure** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

**MCP native** means it was built for MCP from day one, not bolted on afterwards. The four tools (`search_memory`, `list_sources`, `get_document`, `index_now`) have a stable API that will not break across versions.

**Designed as infrastructure** means the architecture anticipates multi-agent access, per-agent permissions, and hardware agents (ESP32, Arduino) querying the same hub. See [Roadmap](#roadmap).

---

## Status

| Feature | Status |
|---|---|
| Local file indexing (txt, md, code, pdf, docx) | ✅ |
| Hybrid search — dense + BM25 + RRF | ✅ |
| MCP server — 4 tools, stdio + streamable-http | ✅ |
| Real-time incremental indexing (watchdog + debounce) | ✅ |
| Tree-sitter code chunking (Python, JS, TS, Go, Rust…) | ✅ |
| Cross-platform — Windows / Linux / macOS | ✅ |
| Post-crash reconciliation | ✅ |
| Optional OCR for scanned PDFs (Tesseract / EasyOCR) | ✅ |
| Privacy audit log (query hashes only, no cleartext) | ✅ |
| 172 tests — unit + integration | ✅ |
| Parent Document Retriever (`extended_preview`) | 🔜 v0.2 |
| GitHub Actions CI (Ubuntu / Windows / macOS) | 🔜 v0.2 |
| Docker + docker-compose | 🔜 v0.2 |
| Cross-encoder reranker | 🔜 v0.3 |
| Evaluation framework (Precision@k, MRR, NDCG) | 🔜 v0.3 |
| RAG with local LLM (Ollama) | 🔜 v0.4 |
| Email / Calendar / Browser sources | 🔜 v0.4 |
| Per-agent permission layer | 🔜 v0.5 |

---

## Quickstart

> **Prerequisite:** Python 3.11+. The development workflow below also needs [`uv`](https://github.com/astral-sh/uv).

```bash
# Install from PyPI
pip install memorymesh-mcp
```

Or clone for development:

```bash
# Clone and install
git clone https://github.com/kilhubprojects/memory-mesh.git
cd memory-mesh
uv sync

# Initialize state directory and copy example config
uv run memorymesh init

# Edit config.yaml — point it at the folders you want indexed
# (see Configuration section below)

# Index a folder
uv run memorymesh index ~/Documents

# Test a search
uv run memorymesh search "how did I configure the debounce"
```

### Run as daemon (real-time indexing)

```bash
uv run memorymesh start --transport streamable-http --detach
uv run memorymesh status
# edit a file in one of your sources — it gets indexed within ~2s
uv run memorymesh search "the sentence you just typed"
uv run memorymesh stop
```

### Wire it into Claude Desktop

Add to your Claude Desktop config:

- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux:** `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "memorymesh": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/absolute/path/to/memory-mesh",
        "memorymesh", "serve", "--stdio"
      ]
    }
  }
}
```
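
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. A small helper along these lines could do that; the function names are hypothetical, and the path passed as `repo_dir` is the same placeholder used above.

```python
import json
from pathlib import Path


def add_memorymesh_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of `config` with a `memorymesh` entry under `mcpServers`."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep any existing servers
    servers["memorymesh"] = {
        "command": "uv",
        "args": ["run", "--directory", repo_dir, "memorymesh", "serve", "--stdio"],
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: Path, repo_dir: str) -> None:
    """Merge the entry into the JSON file at `config_path`, creating it if absent."""
    if config_path.exists():
        existing = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        existing = {}
    merged = add_memorymesh_server(existing, repo_dir)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```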

Restart Claude Desktop. The four tools appear automatically.

---

## MCP Tools

| Tool | Description |
|---|---|
| `search_memory(query, top_k, mode, source)` | Hybrid search over all indexed content. Returns path, preview, score, file type, and source. |
| `list_sources()` | List all configured sources with file counts and index status. |
| `get_document(path, max_bytes)` | Read the full content of an indexed file (up to 1 MB by default). |
| `index_now(path)` | Force immediate re-index of a file or directory, bypassing the watcher. |

All tools are backwards-compatible. The v0.1 signatures are frozen — adding `extended_preview` in v0.2 is additive, not breaking.
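
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `search_memory` using only the standard library. The argument names come from the table above; the argument values and the helper name `tool_call_request` are assumptions for illustration.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Example values are assumed; only the parameter names appear in the tool table.
request = tool_call_request(
    1,
    "search_memory",
    {"query": "how did I configure the debounce", "top_k": 5, "mode": "hybrid"},
)
```

In practice an MCP client library builds and sends this message for you; the sketch only shows what the tool signature translates to.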

---

## Configuration

Everything lives in `config.yaml`. See [`config.example.yaml`](./config.example.yaml) for a fully commented reference. Key highlights:

```yaml
sources:
  - name: documents
    path: ~/Documents
    recursive: true
    extensions: [.txt, .md, .pdf, .docx]

  - name: projects
    path: ~/Projects
    recursive: true
    extensions: [.py, .js, .ts, .go, .rs, .md]

embeddings:
  model: all-MiniLM-L6-v2   # swap to paraphrase-multilingual-MiniLM-L12-v2 for PT/EN

search:
  mode: hybrid               # hybrid | dense | sparse
  top_k: 10

server:
  transport: stdio           # stdio | streamable-http
```

**Global ignore list** protects sensitive paths by default: `.env`, `*.key`, `id_rsa*`, `secrets/`, `.ssh/`, `.aws/`, `.git/`, `node_modules/`.
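
One plausible way to interpret these patterns, shown as a sketch rather than as MemoryMesh's actual matcher: entries ending in `/` match a directory anywhere in the path, and all other entries are glob patterns matched against the file name. The function name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

IGNORE_PATTERNS = [
    ".env", "*.key", "id_rsa*",
    "secrets/", ".ssh/", ".aws/", ".git/", "node_modules/",
]


def is_ignored(path: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Return True if a POSIX-style file path matches any ignore pattern."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *dirs, name = parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything below a directory of that name.
            if pattern.rstrip("/") in dirs:
                return True
        elif fnmatchcase(name, pattern):
            # File pattern: case-sensitive glob match on the file name only.
            return True
    return False
```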

---

## Benchmarks

> Benchmarks will be published here after v0.2 lands CI across all three platforms. The goal is reproducible numbers — not "fast on my machine."

Scripts are already in `benchmarks/` and runnable locally:

- `bench_indexing.py` — indexing throughput (chunks/s, MB/s) on a synthetic corpus
- `bench_search_latency.py` — p50/p95/p99 search latency across hybrid/dense/sparse modes
- `bench_embedding_models.py` — speed vs. quality comparison across three embedding models
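
For readers unfamiliar with the latency metrics, here is a minimal sketch of how p50/p95/p99 can be computed from timed calls with the standard library. It is not the content of `bench_search_latency.py`; the function names are assumptions.

```python
import statistics
import time
from typing import Callable, Iterable


def time_calls(fn: Callable[[str], object], inputs: Iterable[str]) -> list[float]:
    """Call `fn` once per input and return each duration in milliseconds."""
    samples = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50, p95, and p99 of the samples (needs at least two samples)."""
    # n=100 yields 99 cut points; index i - 1 holds the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Reporting p95 and p99 alongside the median matters because a hybrid search runs two retrievers, and occasional slow queries are invisible in an average.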

---

## Privacy & security

Three hard commitments that do not change across versions:

1. **No data leaves your machine.** No telemetry. No external API calls unless you explicitly opt in — and even then, there is a `WARNING` in the log.
2. **HTTP listener binds to `127.0.0.1` only** by default. Exposing to other interfaces requires an explicit config override.
3. **Logs never contain document content or queries in cleartext.** The audit log records query *hashes*, not queries.
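
The third commitment can be illustrated with a short sketch: the audit entry stores a SHA-256 digest of the query instead of the query text. The record layout and the name `audit_record` are assumptions, not the project's actual log format.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(query: str, result_count: int) -> str:
    """Build one JSON audit line that contains a query hash, never the query."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "query_sha256": query_hash,  # digest only; the cleartext is discarded
        "results": result_count,
    }
    return json.dumps(record)


line = audit_record("where is my passport scan", 3)
```

The same query always yields the same digest, so repeated queries can still be correlated in the log without the text being stored.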

Encryption at rest is on the [roadmap](#roadmap). If your disk is encrypted at the OS level, you are covered for the threat model MemoryMesh is designed against.

---

## Roadmap

| Version | Focus | ETA |
|---|---|---|
| **v0.2** | Security hardening + CI/CD + Parent Document Retriever | soon |
| **v0.3** | Eval framework (Precision@k, MRR) + reranker + query expansion | — |
| **v0.4** | Local LLM via Ollama (full RAG) + email/calendar sources | — |
| **v0.5** | Per-agent permissions + hierarchical memory (hot/warm/cold) | — |
| **v1.0** | Agent OS integration — memory layer for multi-agent systems | ~6 months |
| **v2.0** | Hardware agents — ESP32/Arduino querying the hub over BLE/WiFi | ~12 months |

Full details in [`ROADMAP.md`](./ROADMAP.md).

---

## Troubleshooting

- **`UnicodeDecodeError` on a text file** — MemoryMesh tries UTF-8, UTF-8 BOM, cp1252, latin-1 in order. If a file still fails, it is logged and skipped, not crashed.
- **Watcher doesn't fire on a network drive / WSL mount** — set `watcher.use_polling: true` in `config.yaml`.
- **Tesseract not found** — install it system-wide and ensure it is in `PATH`. Windows: [UB-Mannheim installer](https://github.com/UB-Mannheim/tesseract/wiki).
- **Embedding model mismatch after changing config** — run `memorymesh reindex --all`. The CLI refuses to start if the model ID stored in ChromaDB does not match the config.

---

## About this project

MemoryMesh is a solo project by **Carlos**, a high school student (3rd year, STEM) from Brazil, aiming for mechanical engineering at MIT.

It was built using **vibe coding** (writing code in tight collaboration with LLMs at high speed) with structured architectural reviews at each phase. The process has three steps: the LLM proposes code, the architect reviews it for correctness, design gaps, and spec violations, and the test suite confirms the result. Bugs that slipped through (startup order in the reconciliation system, a BM25 encapsulation violation, wrong constructor kwargs in the CLI) were caught in review before they ever ran in production.

This is what vibe coding looks like when you take the review step seriously: a 172-test suite, a real hybrid search pipeline, a reconciliation system, and an architecture designed to carry forward into an Agent OS — built by one person, in high school, in a few weeks.

Other projects by Carlos: a J.A.R.V.I.S.-style voice assistant, a robot with hybrid AI (PC + Arduino + micro:bit via Bluetooth), and a trading simulator with RandomForest + PyQt5.

---

## Contributing

MemoryMesh is not yet accepting external contributions, because there is no CI or contribution guide in place. This changes in v0.2. Watch the repo or check back then.

---

## License

MIT. See [`LICENSE`](./LICENSE).

---

## Acknowledgements

The architecture was informed by studying [LlamaIndex](https://github.com/run-llama/llama_index), [LangChain](https://github.com/langchain-ai/langchain), [PrivateGPT](https://github.com/zylon-ai/private-gpt), [AnythingLLM](https://github.com/Mintplex-Labs/anything-llm), [MemGPT](https://github.com/cpacker/MemGPT), and [Haystack](https://github.com/deepset-ai/haystack), and by understanding what each does well and what it does not. Thanks also to [chroma-mcp](https://github.com/chroma-core/chroma-mcp) and the [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) for showing what MCP-native looks like in practice.
