Metadata-Version: 2.4
Name: dev-trajectory
Version: 0.1.1
Summary: Local, single-user analysis of Claude Code session histories — surfaces engineering trajectory over time per domain, with evidence on hover.
Project-URL: Homepage, https://github.com/NuoWenLei/eng-trajectory-analyzer
Project-URL: Repository, https://github.com/NuoWenLei/eng-trajectory-analyzer
Project-URL: Issues, https://github.com/NuoWenLei/eng-trajectory-analyzer/issues
Project-URL: Changelog, https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/CHANGELOG.md
Author: dev-trajectory contributors
License: MIT
License-File: LICENSE
Keywords: claude-code,developer-productivity,engineering-analytics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: fastapi>=0.110
Requires-Dist: platformdirs>=4.2.0
Requires-Dist: plotly>=5.18
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: typer>=0.12.0
Requires-Dist: uvicorn[standard]>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# dev-trajectory

A local, single-user CLI that ingests your Claude Code session histories, scores each session along five engineering capability dimensions using an LLM judge, and renders your trajectory over time as a per-domain or per-language summary with evidence quotes.

> **Status: alpha.** Phases 0–2 of the [project plan](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/docs/PROJECT.md) are complete — discover → parse → sample → score → aggregate, plus a local browser dashboard. Phase 4 (MCP server) and Phase 5 (Claude skill) are not built yet.

## What it does

Claude Code writes a JSONL transcript of every session you run. The transcripts sit in `~/.claude/projects/` and are read by no one. Across hundreds of sessions, patterns emerge: how you frame ambiguous tasks, whether you catch the model's mistakes, how you redirect when an approach is going wrong. `dev-trajectory` summarizes each session, scores it, and aggregates the scores by week so you can see where you're operating like a senior engineer and where you're still learning.

![dev-trajectory dashboard — per-domain scatter with best-fit lines and an evidence panel showing verbatim user-turn quotes from a clicked session](https://raw.githubusercontent.com/NuoWenLei/eng-trajectory-analyzer/main/docs/images/dev-traj-dashboard.png)

*The local dashboard (`devtraj serve`): per-session scatter with per-domain best-fit lines on the left, evidence quotes from the clicked session on the right.*

## Privacy

- **Local-only storage.** Sessions and scores live in a SQLite file under your user data directory. Nothing is uploaded.
- **Your own API key.** Scoring uses your Anthropic API key. No proxying.
- **Zero telemetry in this build.** A consent flag and toggle command exist (`devtraj telemetry on|off|status`) for a planned future opt-in aggregate study, but the uploader is hard-disabled — no endpoint is wired up.
- **No automatic redaction.** Evidence quotes are stored verbatim because they're what makes the dashboard credible. Since data never leaves the machine, this is a deliberate trade-off rather than a leak risk. See [docs/PROJECT.md §10](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/docs/PROJECT.md#10-scope-decisions--non-goals) for the full posture.

## What this can't tell you

These shape the architecture, not just the README — read them before drawing strong conclusions from the output:

- **You only use AI for things you don't already know.** This tool measures your behavior on those topics, which are usually your blind spots. A staff backend engineer learning Rust will look junior in their Rust sessions. The score reflects growth areas, not absolute capability.
- **LLM judges drift.** Single-shot scoring on subjective work has real variance. Median-of-3 voting, forced verbatim evidence quotes, and a fixed anchored rubric narrow the noise but don't eliminate it. Treat individual session scores as soft; trust trends over many weeks.
- **A flat or declining line is real information you didn't ask for.** During a hard quarter the chart will reflect that. The dashboard slices per-domain so a slump in one area doesn't drown out growth elsewhere — but the signal is honest, not encouraging.

See [docs/PROJECT.md §2](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/docs/PROJECT.md#2-honest-caveats-up-front) for the longer treatment.

## Install

```bash
pipx install dev-trajectory
export ANTHROPIC_API_KEY=sk-ant-...
```

Requires Python 3.11+ and an [Anthropic API key](https://console.anthropic.com/settings/keys). To persist the key instead of exporting it each session, run `devtraj init`. Plain `pip install dev-trajectory` inside a venv works too. From source: see [CONTRIBUTING.md](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/CONTRIBUTING.md).

## Quickstart

After installing and setting up your API key:

```bash
devtraj analyze                     # summarize + score new sessions
devtraj report                      # per-week trajectory by domain (text)
devtraj report --slice-by language  # alternate slice
devtraj explain 2026-04-13          # evidence quotes for a given week
devtraj serve                       # interactive dashboard at http://127.0.0.1:8765/
```

By default `analyze` samples up to **3 sessions per ISO week** to control cost. Sampling is deterministic: re-runs pick the same sessions every time, so re-running `analyze` is idempotent. Set `sessions_per_week = 0` in config (or pass `--sessions-per-week 0`) to score every session.
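
One way to get deterministic per-week sampling is to rank sessions by a hash of their id rather than by a random draw. The sketch below is illustrative only: `sample_sessions`, the `(session_id, date)` tuple shape, and the SHA-256 ranking are assumptions made for this example, not the tool's actual implementation.

```python
import hashlib
from collections import defaultdict
from datetime import date


def sample_sessions(sessions: list[tuple[str, date]], per_week: int = 3) -> list[str]:
    """Pick up to `per_week` session ids per ISO week, stably across runs."""
    if per_week == 0:  # 0 disables the cap, mirroring sessions_per_week = 0
        return sorted(sid for sid, _ in sessions)

    # Group session ids by (ISO year, ISO week number).
    by_week: dict[tuple[int, int], list[str]] = defaultdict(list)
    for sid, day in sessions:
        iso = day.isocalendar()
        by_week[(iso.year, iso.week)].append(sid)

    picked: list[str] = []
    for week in sorted(by_week):
        # Hypothetical ranking: order by a hash of the id, so the result
        # depends only on the ids present, never on input order or a seed.
        ranked = sorted(by_week[week], key=lambda s: hashlib.sha256(s.encode()).hexdigest())
        picked.extend(ranked[:per_week])
    return picked
```

Because the ranking depends only on the ids, feeding the same sessions in a different order yields the same selection.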

Sessions are processed **in parallel** (default 10 at a time). Inside each session, median-of-3 scoring runs 3 calls in parallel, so the peak number of in-flight LLM calls is roughly `max_concurrent_sessions * 3`. Override with `--max-concurrent-sessions N`, or set `max_concurrent_sessions` in config.
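
The concurrency model described above can be sketched with an `asyncio.Semaphore` that bounds how many sessions run at once, with three scoring calls fanned out inside each session. This is a minimal sketch, not the tool's code: `process_sessions` and the `score_once` callback are hypothetical names, and the summary step is left out, so this version issues at most 3 calls per active session.

```python
import asyncio


async def process_sessions(session_ids, score_once, max_concurrent_sessions=10):
    """Run sessions under a semaphore; each session fans out 3 scoring calls."""
    gate = asyncio.Semaphore(max_concurrent_sessions)

    async def one(sid):
        async with gate:  # at most max_concurrent_sessions sessions in here
            # Three judge calls run concurrently within a single session.
            votes = await asyncio.gather(*(score_once(sid) for _ in range(3)))
            return sid, sorted(votes)[1]  # middle value = median of 3

    return dict(await asyncio.gather(*(one(s) for s in session_ids)))
```

Lowering `max_concurrent_sessions` is the simplest lever if you run into API rate limits.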

## CLI reference

| Command | What it does |
| --- | --- |
| `devtraj init` | Detect Claude Code session paths, write a default config, verify the rubric loads, and surface the telemetry toggle. |
| `devtraj analyze` | Walk new sessions, summarize each (Haiku 4.5), score 3× with median voting (Sonnet 4.6), persist to SQLite. Skips sessions already scored at the current rubric version. |
| `devtraj analyze --rebuild` | Force re-summary and re-score for every sampled session. |
| `devtraj analyze --sessions-per-week N` | Override the per-week sampling cap for this run (`0` disables). |
| `devtraj analyze --max-concurrent-sessions N` | Override how many sessions are processed in parallel (default 10). |
| `devtraj report [--slice-by domain\|language\|all]` | Print a weekly median table per slice. |
| `devtraj report --min-confidence 0.7` | Drop scores below the given confidence floor. |
| `devtraj explain YYYY-MM-DD` | Show evidence quotes for the ISO week starting on that date. |
| `devtraj rescore` | Re-score sessions whose stored rubric version is older than the current one. |
| `devtraj serve [--port N] [--no-open]` | Launch the local dashboard. Two chart modes: weekly-median lines per domain, and a per-session scatter with per-domain best-fit. Click any point or bucket to load evidence quotes in the side panel; expand a session card to see the full user-turn text. Default `127.0.0.1:8765`. |
| `devtraj telemetry on\|off\|status` | Set or show the local consent flag (uploader is disabled regardless). |
| `devtraj paths` | Print the config and database paths. |
| `devtraj rubric [--prompt]` | Show the active rubric (version, source, bands, dimensions). With `--prompt`, dump the full system prompt sent to the judge. |
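
The weekly bucketing behind `report` and `explain` can be pictured as follows. This is a rough sketch under stated assumptions: the `(day, slice_name, score, confidence)` row shape and both function names are hypothetical, chosen only to show ISO-week grouping, the confidence floor, and the median.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median


def iso_week_start(day: date) -> date:
    """Monday of the ISO week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_medians(rows, min_confidence=0.5):
    """rows: iterable of (day, slice_name, score, confidence) tuples."""
    buckets = defaultdict(list)
    for day, slice_name, score, confidence in rows:
        if confidence < min_confidence:  # drop scores below the floor
            continue
        buckets[(iso_week_start(day), slice_name)].append(score)
    # One median per (week start, slice), in chronological order.
    return {key: median(vals) for key, vals in sorted(buckets.items())}
```

ISO weeks start on Monday, which is why the date passed to `devtraj explain` in the Quickstart (2026-04-13) is a Monday.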

## How scoring works

Five dimensions, each scored 1 (junior) to 5 (principal):

- `problem_decomposition` — how well you break ambiguous tasks into structured sub-problems.
- `systems_thinking` — whether you reason about effects beyond the immediate change.
- `debugging_methodology` — hypothesis quality, use of evidence, willingness to bisect.
- `technical_communication` — precision of language; whether you give the model what it needs to answer well.
- `model_redirection` — whether you catch and correct the model when it's wrong. **Weighted most heavily** because it's the dimension least confounded by topic familiarity.
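
Median-of-3 voting over these dimensions might look like the sketch below. The dimension names come from the list above; `median_of_votes` and the plain-dict vote shape are assumptions for illustration, and the per-dimension weighting is not modeled here.

```python
from statistics import median

DIMENSIONS = (
    "problem_decomposition",
    "systems_thinking",
    "debugging_methodology",
    "technical_communication",
    "model_redirection",
)


def median_of_votes(votes: list[dict[str, int]]) -> dict[str, int]:
    """Combine independent judge votes by taking the per-dimension median."""
    # With an odd number of integer votes the median is one of the votes.
    return {dim: int(median(v[dim] for v in votes)) for dim in DIMENSIONS}
```

Taking the median per dimension means a single outlier vote cannot move the final score for that dimension.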

Each session also gets a `seniority_band` (junior / mid / senior / staff / principal — the primary surfaced signal), an `inferred_ic_level` (IC2–IC6), a `domain_category`, a `language`, and 3–5 verbatim evidence quotes. See [docs/PROJECT.md §4.4](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/docs/PROJECT.md#44-score) for the full schema.

The rubric anchors are paraphrased from CircleCI's published engineering competency matrix. The rubric source lives in [`src/dev_trajectory/rubric/v1.yaml`](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/src/dev_trajectory/rubric/v1.yaml); edit it and bump the version to iterate. Run `devtraj rubric --prompt` to see the full system prompt sent to the judge.

## Configuration

Config lives at `~/.config/dev-trajectory/config.toml` (Linux/macOS) or the platform equivalent. Run `devtraj paths` to see the exact location.

```toml
api_key = "sk-ant-..."          # optional; ANTHROPIC_API_KEY env var takes precedence
rubric_version = "v1"
min_confidence = 0.5            # scores below this are hidden from the report by default
tool_output_token_limit = 2000  # tool outputs longer than this are stripped from the user-turn extraction
sessions_per_week = 3           # 0 = no cap (score every discovered session)
max_concurrent_sessions = 10    # how many sessions to process in parallel
telemetry_enabled = false
custom_session_paths = []       # extra directories to scan for *.jsonl beyond ~/.claude/projects/
```

## Cost

A year of heavy Claude Code use is roughly 500–2000 sessions. With prompt caching on the rubric system prompt, median-of-3 scoring, and the default 3/week sampling cap, total spend on a fresh run is single-digit dollars. Subsequent runs are close to free because they score only sessions from new weeks.
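
For a rough sense of scale, the call volume under the default cap can be worked out directly. The helper below is a hypothetical back-of-the-envelope sketch; it counts calls only (one summary plus three scoring votes per sampled session) and makes no assumption about per-token prices.

```python
def estimate_calls(weeks: int = 52, sessions_per_week: int = 3, votes: int = 3) -> dict[str, int]:
    """Upper bound on LLM calls for a fresh run under the sampling cap."""
    sessions = weeks * sessions_per_week  # at most this many sampled sessions
    return {
        "sessions": sessions,
        "summary_calls": sessions,          # one summary per session
        "scoring_calls": sessions * votes,  # median-of-3 voting
        "total_calls": sessions * (1 + votes),
    }


# One year at the default cap: 156 sessions and 624 calls at most.
print(estimate_calls())
```

The cap, not the raw session count, drives the total: even at 2000 sessions a year, no more than 156 are sampled by default.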

## Roadmap & contributing

See [docs/ROADMAP.md](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/docs/ROADMAP.md) for what's planned and roughly when. Suggestions to reprioritize, add, or drop items are welcome — open an issue or a PR against the roadmap itself.

For everything else (bug reports, rubric tweaks, new tool support, dashboard polish), see [CONTRIBUTING.md](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/CONTRIBUTING.md).

## License

MIT. See [LICENSE](https://github.com/NuoWenLei/eng-trajectory-analyzer/blob/main/LICENSE).
