Metadata-Version: 2.4
Name: chatstrata
Version: 0.1.0
Summary: A personal, queryable archive of your AI conversations across providers
Project-URL: Homepage, https://github.com/brandonbosch/chatstrata
Project-URL: Repository, https://github.com/brandonbosch/chatstrata
Project-URL: Issues, https://github.com/brandonbosch/chatstrata/issues
Project-URL: Documentation, https://github.com/brandonbosch/chatstrata#readme
Author: chatstrata contributors
License: Apache-2.0
License-File: LICENSE
Keywords: archive,chat,chatgpt,claude,duckdb,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: duckdb>=1.0.0
Requires-Dist: platformdirs>=4.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pytz>=2024.1
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
Requires-Dist: torch>=2.0; extra == 'embeddings'
Provides-Extra: mcp
Requires-Dist: mcp[cli]>=1.12.0; extra == 'mcp'
Provides-Extra: redact
Requires-Dist: presidio-analyzer>=2.2; extra == 'redact'
Requires-Dist: presidio-anonymizer>=2.2; extra == 'redact'
Requires-Dist: spacy>=3.7; extra == 'redact'
Description-Content-Type: text/markdown

# chatstrata

<p align="center">
  <img src="docs/images/chatstrata.png" alt="chatstrata" width="360">
</p>

A personal, queryable archive of your AI conversations across providers.

Every conversation you've had with Claude, ChatGPT, or any other LLM is a record of
how you think, what you're working on, and how that's changed over time. Most of
that record lives scattered across browser exports, hidden JSONL files, and SaaS
dashboards you don't fully control. chatstrata pulls it into one place, normalizes
it, and lets you actually query and analyze it.

The name is from "strata" — layers of conversation deposited over time, with the
deeper layers telling you who you were.

## Why this exists

LLM providers collect rich data about how you interact with their models and use
it (in aggregate) to improve the experience for everyone. chatstrata is the same
idea, but for an audience of one: **you**. Your conversations, on your machine,
queryable on your terms.

Concretely, with chatstrata you can:

- Find every conversation where you discussed a topic, across providers.
- See how your prompting has changed over months or years.
- Audit every bash command you ran through Claude Code, grouped by project.
- Build a corpus that helps you brief a new model on who you are and what you care about.
- Identify abandoned projects, dropped threads, recurring patterns.

## Status

**Early alpha.** v0 includes adapters for Claude Code, claude.ai exports, Codex
CLI, and OpenCode. The architecture is built so that adding another source
(ChatGPT exports, Cursor, etc.) means writing a single adapter; see
[docs/adapter-guide.md](docs/adapter-guide.md).

## Quickstart

Requires Python 3.10+. DuckDB is installed as a Python dependency; you do not
need to install a separate DuckDB server or CLI.

```bash
uv tool install chatstrata
# or: pipx install chatstrata

# Create the local DuckDB archive and show detected sources
chatstrata init

# Ingest your Claude Code transcripts
chatstrata ingest claude_code --incremental

# See what's there
chatstrata stats

# Run a query
chatstrata query "SELECT model, COUNT(*) FROM messages GROUP BY model"
```

The default database lives in a platform-appropriate user data directory
(e.g. `~/.local/share/chatstrata/chatstrata.duckdb` on Linux). Override it with
the `CHATSTRATA_DB` environment variable or the `--db` option. Run
`chatstrata paths` to see the exact paths for your machine.
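
As a rough illustration of that lookup, the sketch below resolves a database path from a `--db` value, then `CHATSTRATA_DB`, then a default. The precedence order, the `resolve_db_path` name, and the hard-coded Linux default are assumptions made for this example, not chatstrata's actual code; `chatstrata paths` remains the authoritative answer.

```python
import os
from pathlib import Path

# Assumed default for illustration only (the Linux example path from above).
ASSUMED_DEFAULT = Path.home() / ".local" / "share" / "chatstrata" / "chatstrata.duckdb"


def resolve_db_path(cli_db=None, env=None, default=ASSUMED_DEFAULT):
    """Return the database path to use.

    Assumed precedence: explicit --db value, then CHATSTRATA_DB, then default.
    """
    env = os.environ if env is None else env
    if cli_db:
        return Path(cli_db).expanduser()
    from_env = env.get("CHATSTRATA_DB")
    if from_env:
        return Path(from_env).expanduser()
    return Path(default)


if __name__ == "__main__":
    print(resolve_db_path())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.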

## MCP server

chatstrata ships an [MCP](https://modelcontextprotocol.io) server that exposes
your archive to MCP-aware clients (Claude Desktop, etc.) through a single
read-only `query` tool plus a `chatstrata://schema` resource. The client can
then write and run SQL against your conversations directly.

### 1. Install with MCP support

```bash
uv tool install "chatstrata[mcp]"
# or: pipx install "chatstrata[mcp]"
```

### 2. Create and populate the archive

The MCP server reads an existing database; make sure you've ingested something
first:

```bash
chatstrata init
chatstrata ingest claude_code --incremental
chatstrata paths               # note the database path for the next step
```

### 3. Point your MCP client at chatstrata

The installed `chatstrata-mcp` executable speaks MCP over stdio. If you use
`uvx`, clients can run the published package without needing the absolute path
to that executable.

For Claude Code, run:

```bash
claude mcp add --transport stdio --scope user chatstrata -- uvx --from "chatstrata[mcp]" chatstrata-mcp
```

Or ask chatstrata to print the command:

```bash
chatstrata mcp config claude-code
```

For Claude Desktop, add an entry to its `mcpServers` config (Settings →
Developer → Edit Config):

```json
{
  "mcpServers": {
    "chatstrata": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "chatstrata[mcp]", "chatstrata-mcp"]
    }
  }
}
```

You can generate that JSON with:

```bash
chatstrata mcp config claude-desktop
```

If `CHATSTRATA_DB` is not set in the server's environment, the server falls back
to the default platform path. To pin the MCP server to a specific database, pass
`--db` when generating the setup snippet:

```bash
chatstrata mcp config claude-desktop --db /absolute/path/to/chatstrata.duckdb
```
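
For clients whose config you edit by hand, one way to pin the database is an `env` entry on the server definition, which Claude Desktop passes to the process it launches. The sketch below builds such a config in Python. Whether `chatstrata mcp config --db` emits exactly this shape is an assumption, and `claude_desktop_config` is a hypothetical helper, so prefer the generated snippet when in doubt.

```python
import json


def claude_desktop_config(db_path=None):
    """Build an mcpServers entry for chatstrata (hypothetical helper).

    When db_path is given, it is passed to the server via CHATSTRATA_DB.
    """
    server = {
        "type": "stdio",
        "command": "uvx",
        "args": ["--from", "chatstrata[mcp]", "chatstrata-mcp"],
    }
    if db_path:
        # Assumption: the server reads its database path from this variable.
        server["env"] = {"CHATSTRATA_DB": db_path}
    return {"mcpServers": {"chatstrata": server}}


if __name__ == "__main__":
    print(json.dumps(claude_desktop_config("/absolute/path/to/chatstrata.duckdb"), indent=2))
```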

Restart the client. The `chatstrata` server should appear with a `query` tool
available. Ask the client something like "what topics have I discussed most this
month?" and it can write SQL and run it against your archive.

## Data model

chatstrata normalizes every conversation into the same shape regardless of source:

- **conversations** — one per session/thread
- **messages** — one per turn (user, assistant, system)
- **content_blocks** — one per content unit within a message (text, tool_use, tool_result, thinking, attachment)
- **tool_calls** — denormalized view of tool_use blocks for easy querying
- **raw_events** — the source data, line-for-line, for re-parsing without re-ingestion

See [docs/schema.md](docs/schema.md) for the full schema.
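
To make the nesting concrete, here is a minimal Python sketch of how conversations, messages, and content blocks relate, and how a `tool_calls`-style view could be derived from `tool_use` blocks. Every class and field name below is an illustrative assumption; the real column names are in [docs/schema.md](docs/schema.md).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentBlock:
    # One of: "text", "tool_use", "tool_result", "thinking", "attachment"
    block_type: str
    text: str = ""
    tool_name: Optional[str] = None  # only meaningful for tool_use blocks


@dataclass
class Message:
    role: str  # "user", "assistant", or "system"
    blocks: List[ContentBlock] = field(default_factory=list)


@dataclass
class Conversation:
    conversation_id: str
    source: str  # e.g. "claude_code"
    messages: List[Message] = field(default_factory=list)


def tool_calls(conversation):
    """Flatten tool_use blocks into (conversation_id, message_index, tool_name) rows."""
    return [
        (conversation.conversation_id, index, block.tool_name)
        for index, message in enumerate(conversation.messages)
        for block in message.blocks
        if block.block_type == "tool_use"
    ]


if __name__ == "__main__":
    convo = Conversation(
        conversation_id="c1",
        source="claude_code",
        messages=[
            Message("user", [ContentBlock("text", "list the files")]),
            Message("assistant", [ContentBlock("tool_use", tool_name="Bash")]),
        ],
    )
    print(tool_calls(convo))
```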

## Adding a source

Each source (Claude Code, ChatGPT export, etc.) is handled by an adapter that
implements a small protocol: `discover()` finds available conversations, and
`parse()` turns them into the canonical record types. See
[docs/adapter-guide.md](docs/adapter-guide.md) for a worked example using Claude Code.

Adapters can be contributed as PRs to this repo or as standalone pip packages
that register via entry points.
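
The two-method shape described above might look roughly like the following. Only the method names `discover()` and `parse()` come from this README; the signatures, the `SourceAdapter` and `ToyJsonlAdapter` names, and the dict-based records are assumptions for illustration, so treat the adapter guide as the reference for the real interface.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, Protocol


class SourceAdapter(Protocol):
    """Assumed shape of an adapter: find sources, then parse each one."""

    name: str

    def discover(self) -> Iterable[Path]: ...

    def parse(self, path: Path) -> Iterator[dict]: ...


class ToyJsonlAdapter:
    """Toy adapter: one JSONL file per conversation, one message per line."""

    name = "toy_jsonl"

    def __init__(self, root):
        self.root = Path(root)

    def discover(self):
        # Every *.jsonl file directly under the root is one conversation.
        return sorted(self.root.glob("*.jsonl"))

    def parse(self, path):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # skip blank lines but keep original numbering
                event = json.loads(line)
                yield {
                    "conversation_id": path.stem,
                    "line_number": line_number,
                    "role": event.get("role", "unknown"),
                    "text": event.get("text", ""),
                }
```

Keeping `parse()` a generator means a large transcript can be processed one line at a time rather than loaded whole.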

## Privacy

Your data stays on your machine. chatstrata makes no network calls during
ingestion or querying. Semantic search can optionally use DuckDB's VSS
extension. Installing a DuckDB extension downloads it from DuckDB's extension
repository, so chatstrata only installs it when
`CHATSTRATA_INSTALL_DUCKDB_VSS=1` is set.
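
An opt-in gate of this kind can be sketched as below. `INSTALL vss` and `LOAD vss` are standard DuckDB statements, and `INSTALL` fetches the extension over the network; the `maybe_install_vss` function and its exact check are assumptions for illustration, not chatstrata's actual implementation.

```python
import os


def maybe_install_vss(connection, env=None):
    """Install and load DuckDB's VSS extension only when explicitly opted in.

    `connection` is anything with an `execute(sql)` method, such as the
    object returned by `duckdb.connect(...)`. Returns True if VSS was set up.
    """
    env = os.environ if env is None else env
    if env.get("CHATSTRATA_INSTALL_DUCKDB_VSS") != "1":
        return False  # no opt-in: issue no statements, so no download happens
    connection.execute("INSTALL vss")  # downloads the extension if missing
    connection.execute("LOAD vss")
    return True
```

With the variable unset, the function never touches the connection, which keeps the default code path fully offline.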

If you want to share queries or notebooks publicly, an optional redaction layer
(`uv tool install "chatstrata[redact]"`) wraps Microsoft Presidio with
chatstrata-specific recognizers for API keys, file paths, and other things that
commonly appear in LLM transcripts. See [docs/redaction.md](docs/redaction.md).

## Contributing

Contributions are welcome, especially new source adapters. See
[CONTRIBUTING.md](CONTRIBUTING.md).

## License

Apache 2.0. See [LICENSE](LICENSE).
