Metadata-Version: 2.4
Name: dbt-pathfinder
Version: 0.4.1
Summary: CLI and library for exploring dbt manifests as graphs.
Project-URL: Homepage, https://github.com/jon-woodland/dbt-pathfinder
Project-URL: Repository, https://github.com/jon-woodland/dbt-pathfinder
Project-URL: Issues, https://github.com/jon-woodland/dbt-pathfinder/issues
License: MIT
License-File: LICENSE
Keywords: ci,dag,data-engineering,dbt,impact-analysis,lineage,manifest
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.9
Requires-Dist: networkx>=3.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: tomli>=2.0.1; python_version < '3.11'
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# dbt-pathfinder

**dbt-pathfinder** is a Python CLI and library that reads dbt’s `manifest.json`, builds a directed graph of your project, and helps you explore lineage, estimate change impact, and reason about joins between models.

Typical questions it supports:

- What might break if I change this model?
- Which nodes and tests are affected by these changed files?
- How are two models connected in the DAG?
- What joins appear along a path, and what cardinality is suggested?

---

## Motivation

dbt documents lineage in the manifest, but day-to-day work often needs **interactive exploration**: tracing paths, summarizing downstream impact for a PR, and inferring join relationships from SQL and tests. This tool focuses on fast, manifest-driven analysis for developers and CI pipelines.

---

## Commands

### `show`

Inspect one node: metadata, upstream dependencies, and immediate downstream children.

```bash
dbt-pathfinder show --manifest target/manifest.json --model fct_orders
```

---

### `impact`

List everything downstream of a model, grouped by graph distance (depth).

```bash
dbt-pathfinder impact --manifest target/manifest.json --model stg_orders
```

Example:

```
Depth 1:
- fct_orders
- dim_customers

Depth 2:
- mart_customer_ltv
```

Useful for impact review, refactors, and validating scope before merging.
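
The depth grouping can be pictured as a breadth-first walk. The sketch below is illustrative only: it assumes "depth" means the shortest distance from the starting model, uses a plain dictionary instead of the package's graph, and the function name `downstream_by_depth` is hypothetical.

```python
from collections import deque


def downstream_by_depth(children, start):
    """Group every node reachable from `start` by its shortest distance from it."""
    depth_of = {start: 0}
    grouped = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth_of:  # first visit in BFS is the shortest distance
                depth_of[child] = depth_of[node] + 1
                grouped.setdefault(depth_of[child], []).append(child)
                queue.append(child)
    return grouped


# Toy edges shaped like the example output above (not read from a real manifest).
children = {
    "stg_orders": ["fct_orders", "dim_customers"],
    "fct_orders": ["mart_customer_ltv"],
    "dim_customers": ["mart_customer_ltv"],
}

for depth, nodes in sorted(downstream_by_depth(children, "stg_orders").items()):
    print(f"Depth {depth}:")
    for name in nodes:
        print(f"- {name}")
```

A node reachable by several routes is listed once, at the depth of its shortest route.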

---

### `ci-impact`

Given a list of changed project files (paths as in the manifest’s `original_file_path`), report downstream nodes, impacted tests, optional column-level hints, and a suggested `dbt build --select` command. Intended for CI/CD (for example, files touched in a pull request).

```bash
dbt-pathfinder ci-impact \
  --manifest target/manifest.json \
  --files models/staging/stg_orders.sql \
  --files models/marts/dim_customers.sql
```

Example (text):

```
Changed files:
- models/staging/stg_orders.sql
- models/marts/dim_customers.sql

Changed nodes:
- stg_orders
- dim_customers

Impacted downstream:
Depth 1:
- fct_orders
- mart_customer_ltv

Depth 2:
- customer_health_dashboard

Impacted tests:
- unique_fct_orders_order_id
- not_null_dim_customers_customer_id

Suggested command:
dbt build --select stg_orders+ dim_customers+
```

Options:

- `--output text` (default) or `--output json` (JSON uses node `unique_id` values in `impacted_by_depth`; text uses short labels)
- `--ui rich` (default) or `--ui text` when using `--output text` (JSON is always plain text)
- `--include-tests` (default) or `--no-include-tests`
- `--include-columns` / `--no-include-columns` (default off unless set in config): best-effort downstream column impact via SQL/ref parsing

Repeat `--files` once per path. Matching uses each node’s `original_file_path` (`.sql` and `.yml` / `.yaml`).
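
As a rough illustration of the matching step, the sketch below looks up changed paths against each node's `original_file_path` in a toy manifest dictionary and assembles a `--select` string like the one in the example output. The helper names and the path normalization are assumptions made for the example, not the package's code.

```python
def normalize(path):
    """Use forward slashes and drop a leading './' so paths compare equal."""
    return path.replace("\\", "/").removeprefix("./")


def nodes_for_files(manifest, changed_files):
    """Return names of nodes whose original_file_path is one of the changed files."""
    wanted = {normalize(p) for p in changed_files}
    return [
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if normalize(node.get("original_file_path", "")) in wanted
    ]


def suggested_build_command(node_names):
    """Select each changed node plus everything downstream of it."""
    return "dbt build --select " + " ".join(f"{name}+" for name in node_names)


# Toy manifest fragment; a real manifest carries many more fields per node.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "name": "stg_orders",
            "original_file_path": "models/staging/stg_orders.sql",
        },
        "model.shop.dim_customers": {
            "name": "dim_customers",
            "original_file_path": "models/marts/dim_customers.sql",
        },
        "model.shop.fct_orders": {
            "name": "fct_orders",
            "original_file_path": "models/marts/fct_orders.sql",
        },
    }
}

changed = nodes_for_files(
    manifest,
    ["./models/staging/stg_orders.sql", "models/marts/dim_customers.sql"],
)
print(suggested_build_command(changed))
```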

Column impact example:

```bash
dbt-pathfinder ci-impact \
  --manifest target/manifest.json \
  --files models/staging/stg_orders.sql \
  --include-columns \
  --ui text
```

For a **single JSON schema** that covers model, file-list, and git-range inputs, use `impact-report` below.

---

### `impact-report` (unified)

One entrypoint for impact analysis. Supply **exactly one** of:

- `--model` — node name or `unique_id`
- `--files` — repeat per changed path (same semantics as `ci-impact`)
- `--git-diff RANGE` — e.g. `main...HEAD` (dbt-relevant files only)

The result is always an **`ImpactReport`**: changed files/nodes, downstream by depth, optional columns and tests, plus suggested `dbt build` and `dbt test` commands.

```bash
dbt-pathfinder impact-report \
  --manifest target/manifest.json \
  --git-diff main...HEAD \
  --repo-root .

dbt-pathfinder impact-report \
  --manifest target/manifest.json \
  --files models/staging/stg_orders.sql

dbt-pathfinder impact-report \
  --manifest target/manifest.json \
  --model stg_orders
```
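
To make "dbt-relevant files only" concrete, here is a minimal sketch of one way to turn a git range into a filtered path list. It relies on the real `git diff --name-only` command; the extension list and the way ignore rules are applied are assumptions for illustration, and the function names are hypothetical.

```python
import subprocess
from fnmatch import fnmatch

# Assumption: these extensions are what counts as "dbt-relevant".
DBT_EXTENSIONS = (".sql", ".yml", ".yaml")


def changed_paths(diff_range, repo_root="."):
    """List files touched in a git range such as 'main...HEAD'."""
    completed = subprocess.run(
        ["git", "diff", "--name-only", diff_range],
        cwd=repo_root,
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in completed.stdout.splitlines() if line]


def dbt_relevant(paths, ignore_globs=(), ignore_prefixes=()):
    """Keep dbt files, then drop anything matching an ignore rule."""
    kept = []
    for path in paths:
        if not path.endswith(DBT_EXTENSIONS):
            continue
        if any(path.startswith(prefix) for prefix in ignore_prefixes):
            continue
        if any(fnmatch(path, pattern) for pattern in ignore_globs):
            continue
        kept.append(path)
    return kept


# Filtering a literal list, so the example runs without a git repository.
sample = ["models/staging/stg_orders.sql", "README.md", "macros/cents.sql", "models/schema.yml"]
print(dbt_relevant(sample, ignore_globs=["*.md"], ignore_prefixes=["macros/"]))
```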

**`--format`**

| Value | Use case |
| --- | --- |
| `text` (default) | Terminal; pair with `--ui rich` or `--ui text` |
| `json` | Stable `ImpactReport` JSON for automation |
| `github` | Markdown suited for pull request comments |

In GitHub Actions, a common range is `${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }}`, passed to the command below as `$BASE_SHA` and `$HEAD_SHA`:

```bash
dbt-pathfinder impact-report \
  --manifest target/manifest.json \
  --git-diff "$BASE_SHA...$HEAD_SHA" \
  --format github \
  --include-columns \
  > impact-report.md
```

This repository includes [`.github/workflows/pr-impact-report.yml`](.github/workflows/pr-impact-report.yml), which posts a sticky PR comment using [sticky-pull-request-comment](https://github.com/marocchino/sticky-pull-request-comment). The workflow demo uses `tests/fixtures/manifest.json`; in a real dbt repo, set `DBT_MANIFEST_PATH` to `target/manifest.json` and run `dbt parse` or `dbt compile` before the report step.

To gate merges on risk, run `pr-risk --output json` in a separate job and fail that job when the reported `score` reaches your threshold (for example, by checking it with `jq`).

---

### `pr-risk`

Heuristic **risk score** for a change set using the manifest DAG: downstream breadth and depth, exposures depending on impacted nodes, model fan-out, join hints from `path-explain`, columns missing uniqueness tests in the manifest, and optional **critical tags** (from node `tags`). Scoring uses **node-level** lineage; column heuristics belong in `impact-report --include-columns`, not inside `pr-risk`.

```bash
dbt-pathfinder pr-risk \
  --manifest target/manifest.json \
  --git-diff main...HEAD \
  --repo-root .
```

Or with explicit files:

```bash
dbt-pathfinder pr-risk \
  --manifest target/manifest.json \
  --files models/staging/stg_orders.sql
```

Example (text):

```
PR Risk Score: High

Reasons:
- Scope: node-level DAG lineage; column-to-column propagation not analyzed (column uniqueness test gaps are still counted).
- 12 downstream nodes impacted
- Impact reaches depth 3
- 2 exposures depend on nodes in the impacted subgraph
- 1 model in the impacted subgraph has 2+ direct downstream models (DAG fan-out)
- 5 model columns lack a uniqueness test (manifest-derived)
- 1 direct downstream model joins have unknown cardinality (path-explain heuristic)
- affects tag: finance_critical

Suggested commands (impacted subgraph):
dbt build --select stg_orders+
dbt test --select stg_orders+
```

Options:

- `--git-diff RANGE` **or** one or more `--files` (mutually exclusive)
- `--repo-root PATH` with `--git-diff` (default: current directory)
- `--critical-tag NAME` (repeatable); merged with `critical_tags` from config when present
- `--output text|json`, `--ui rich|text` (for text output)

---

### `path`

Shortest path between two nodes.

```bash
dbt-pathfinder path \
  --manifest target/manifest.json \
  --from stg_orders \
  --to mart_customer_ltv
```

- `--mode directed` (default) or `--mode any`
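
The difference between the two modes can be sketched with a small breadth-first search. This assumes `--mode any` means edge direction is ignored; the function and the toy edges are illustrative and stand in for the package's graph code.

```python
from collections import deque


def shortest_path(edges, source, target, mode="directed"):
    """Breadth-first shortest path; mode='any' also walks edges backwards."""
    neighbors = {}
    for parent, child in edges:
        neighbors.setdefault(parent, set()).add(child)
        if mode == "any":
            neighbors.setdefault(child, set()).add(parent)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no route in this mode


edges = [
    ("stg_orders", "fct_orders"),
    ("fct_orders", "mart_customer_ltv"),
    ("stg_customers", "dim_customers"),
    ("dim_customers", "mart_customer_ltv"),
]

print(shortest_path(edges, "stg_orders", "mart_customer_ltv"))
print(shortest_path(edges, "stg_orders", "stg_customers"))
print(shortest_path(edges, "stg_orders", "stg_customers", mode="any"))
```

In this toy graph the two staging models share only a downstream mart, so a route between them exists only when direction is ignored.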

---

### `path-explain`

Explain how two models are connected along a path: inferred join conditions, keys, suggested cardinality (`1:1`, `1:N`, `N:1`, `N:N`), and a confidence level. Inference is **heuristic** and depends on the shape of the SQL and on dbt tests (`unique`, `not_null`, etc.).

```bash
dbt-pathfinder path-explain \
  --manifest target/manifest.json \
  --from stg_orders \
  --to mart_customer_ltv
```
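
One common way to derive a cardinality hint from `unique` tests is sketched below. It is a simplified assumption about how such a heuristic could work, not the rule set `path-explain` actually applies; the function names and the `(model, column)` input shape are hypothetical.

```python
def suggest_cardinality(left_key_unique, right_key_unique):
    """Map key uniqueness on each side of a join to a cardinality label."""
    if left_key_unique and right_key_unique:
        return "1:1"
    if left_key_unique:
        return "1:N"
    if right_key_unique:
        return "N:1"
    return "N:N"  # neither side is known to be unique


def join_cardinality(left, right, unique_columns):
    """`left` and `right` are (model, column) pairs naming the two join keys."""
    return suggest_cardinality(left in unique_columns, right in unique_columns)


# Hypothetical input: (model, column) pairs covered by a `unique` test.
unique_columns = {("dim_customers", "customer_id")}

print(
    join_cardinality(
        ("dim_customers", "customer_id"),
        ("fct_orders", "customer_id"),
        unique_columns,
    )
)
```

A missing `unique` test does not prove a key has duplicates, which is one reason a result like this should be read as a hint.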

---

## Installation

**From PyPI**

```bash
pip install dbt-pathfinder
```

**From source (development)**

```bash
git clone https://github.com/jon-woodland/dbt-pathfinder.git
cd dbt-pathfinder
pip install -e ".[dev]"
```

---

## Quick start

1. Produce a manifest:

```bash
dbt compile
```

2. Run:

```bash
dbt-pathfinder show --manifest target/manifest.json --model my_model
```

---

## CLI reference

```bash
dbt-pathfinder --help
```

Further examples:

```bash
dbt-pathfinder show --manifest target/manifest.json --model fct_orders --verbose

dbt-pathfinder impact --manifest target/manifest.json --model stg_orders

dbt-pathfinder impact --manifest target/manifest.json --model stg_orders --output json

dbt-pathfinder ci-impact \
  --manifest target/manifest.json \
  --files models/staging/stg_orders.sql \
  --files models/marts/dim_customers.sql

dbt-pathfinder ci-impact --manifest target/manifest.json --files models/staging/stg_orders.sql --output json

dbt-pathfinder ci-impact --manifest target/manifest.json --files models/staging/stg_orders.sql --include-columns --ui text

dbt-pathfinder impact-report --manifest target/manifest.json --model stg_orders --format json

dbt-pathfinder impact-report --manifest target/manifest.json --git-diff main...HEAD --format github --include-columns

dbt-pathfinder pr-risk --manifest target/manifest.json --git-diff main...HEAD

dbt-pathfinder doctor --manifest target/manifest.json

dbt-pathfinder path --manifest target/manifest.json --from stg_orders --to mart_customer_ltv

dbt-pathfinder path-explain --manifest target/manifest.json --from stg_orders --to mart_customer_ltv --ui rich

dbt-pathfinder json-schema --which all > schemas/dbt-pathfinder.json
```

---

## Output modes

- `--ui rich` (default): formatted terminal output for commands that support it.
- `--ui text`: plain text.
- `--output json`: machine-readable JSON on `impact`, `path-explain`, `ci-impact`, and `pr-risk`.
- `impact-report --format json|github`: unified schema or PR-comment Markdown.
- `json-schema --which ci-impact|impact-report|pr-risk|all`: [JSON Schema](https://json-schema.org/) for JSON payloads (codegen and validation).

---

## Configuration

### `.dbt-pathfinder.toml`

Optional repository defaults. The CLI searches upward from the current working directory for `.dbt-pathfinder.toml` or `dbt-pathfinder.toml`. Override with `--config PATH` or `DBT_PATHFINDER_CONFIG`.

Settings apply to **`ci-impact`**, **`impact-report`**, and **`pr-risk`** where relevant. CLI flags override the file when both are set.

```toml
critical_tags = ["finance_critical"]

fanout_model_successors_threshold = 2
exposure_points_multiplier = 3
exposure_points_cap = 12

column_mode_default = false

[pathfinder]
column_mode_default = true

git_diff_ignore_prefixes = ["snapshots/"]

[git_diff]
ignore_globs = ["*.md"]
ignore_prefixes = ["macros/"]
```
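
The lookup order described above can be sketched with `pathlib`. The precedence between `--config` and `DBT_PATHFINDER_CONFIG` is an assumption here (the flag is checked first), and `find_config` is a hypothetical helper rather than the package's function.

```python
import os
import tempfile
from pathlib import Path

CONFIG_NAMES = (".dbt-pathfinder.toml", "dbt-pathfinder.toml")


def find_config(start, explicit=None, env=None):
    """Return the config path: explicit flag, then env var, then upward search."""
    env = os.environ if env is None else env
    if explicit:
        return Path(explicit)
    if env.get("DBT_PATHFINDER_CONFIG"):
        return Path(env["DBT_PATHFINDER_CONFIG"])
    start = Path(start).resolve()
    for directory in (start, *start.parents):  # walk towards the filesystem root
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None


# Demo in a throwaway directory: the config sits two levels above the start.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".dbt-pathfinder.toml").write_text('critical_tags = ["finance_critical"]\n')
    nested = root / "models" / "staging"
    nested.mkdir(parents=True)
    found = find_config(nested, env={})
    print(found.name)
```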

### `doctor`

Checks Python, `git`, and the optional config file, and reports manifest and graph statistics.

```bash
dbt-pathfinder doctor --manifest target/manifest.json
```

Without `--manifest`, tries `DBT_MANIFEST_PATH`, then `target/manifest.json`, then `./manifest.json`.
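
A minimal sketch of that fallback order, assuming "tries" means "uses the first candidate that exists on disk"; `resolve_manifest` is a hypothetical name used only for this example.

```python
import os
import tempfile
from pathlib import Path


def resolve_manifest(explicit=None, env=None, cwd="."):
    """Pick the manifest path: flag, then env var, then the two default locations."""
    env = os.environ if env is None else env
    if explicit:
        return Path(explicit)
    candidates = []
    if env.get("DBT_MANIFEST_PATH"):
        candidates.append(Path(env["DBT_MANIFEST_PATH"]))
    candidates.append(Path(cwd) / "target" / "manifest.json")
    candidates.append(Path(cwd) / "manifest.json")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # nothing found in any of the fallback locations


# Demo: only target/manifest.json exists, so it is the one selected.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target"
    target.mkdir()
    (target / "manifest.json").write_text("{}")
    resolved = resolve_manifest(env={}, cwd=tmp)
    print(resolved.parent.name, resolved.name)
```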

### Shell completions

```bash
dbt-pathfinder --install-completion
dbt-pathfinder --show-completion
```

---

## Library usage

```python
from dbt_pathfinder.services.pathfinder_service import PathfinderService
from dbt_pathfinder.services.ci_impact_service import CIImpactService
from dbt_pathfinder.services.pr_risk_service import PRRiskService

service = PathfinderService.from_manifest("target/manifest.json")

print(service.show("fct_orders"))
print(service.impact("stg_orders"))
print(service.path("stg_orders", "mart_customer_ltv", directed=True))
print(service.explain_path("stg_orders", "mart_customer_ltv", directed=True))

ci = CIImpactService.from_manifest("target/manifest.json")
result = ci.build_result(
    ["models/staging/stg_orders.sql"],
    include_tests=True,
    include_columns=True,
)
print(result.model_dump())

report = ci.build_impact_report(
    input_mode="files",
    files=["models/staging/stg_orders.sql"],
    include_tests=True,
    include_columns=True,
)
print(report.model_dump())

risk = PRRiskService.from_manifest("target/manifest.json")
print(risk.analyze_files(["models/staging/stg_orders.sql"]).model_dump())
```

### Exported models and JSON Schema

Stable imports for integrations:

```python
from dbt_pathfinder import (
    CIImpactResult,
    ImpactReport,
    RiskResult,
    SCHEMA_VERSION,
    get_ci_impact_result_schema,
    get_impact_report_schema,
    get_risk_result_schema,
    get_all_schemas_bundle,
    dump_schemas_json,
)
```

- **`SCHEMA_VERSION`** — bump it in `json_schemas.py` whenever JSON field names or shapes change incompatibly, and mention the change in the release notes.
- Schemas use draft 2020-12 and `$id` URNs (`urn:dbt-pathfinder:schema:v1:…`).

```bash
dbt-pathfinder json-schema --which ci-impact
dbt-pathfinder json-schema --which all
```

---

## pre-commit

Hooks are defined in [`.pre-commit-hooks.yaml`](.pre-commit-hooks.yaml):

- **`dbt-pathfinder-ci-impact`** — `ci-impact --output json` on staged `.sql` / `.yml` / `.yaml`
- **`dbt-pathfinder-pr-risk`** — `pr-risk --output json` on the same files

| Variable | Purpose |
| --- | --- |
| `DBT_MANIFEST_PATH` | Path to `manifest.json` (default: `target/manifest.json`) |
| `DBT_PATHFINDER_CONFIG` | Path to `.dbt-pathfinder.toml` (optional; otherwise upward search from hook cwd) |
| `DBT_PATHFINDER_PR_RISK_FAIL_ON` | `high` or `medium` — fail the hook when `pr-risk` JSON `score` is at or above that level |

See [examples/pre-commit-config.yaml](examples/pre-commit-config.yaml); pin `rev` to a release tag in your `.pre-commit-config.yaml`.
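
The "at or above that level" rule amounts to an ordered comparison. The sketch below assumes the `score` field holds one of three levels (`low`, `medium`, `high`, compared case-insensitively); only `medium` and `high` are confirmed by the table above, so treat `low` and the helper name `should_fail` as assumptions.

```python
import json

# Assumed ordering of risk levels; "low" is a guess at the lowest level.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def should_fail(report_json, fail_on):
    """Return True when the report's score is at or above the fail_on level."""
    if not fail_on:
        return False  # no threshold configured, never fail
    score = str(json.loads(report_json)["score"]).lower()
    return LEVELS[score] >= LEVELS[fail_on.lower()]


print(should_fail('{"score": "High"}', "medium"))
print(should_fail('{"score": "medium"}', "high"))
```

The same comparison could back a CI gate: read the `pr-risk --output json` payload, call the function, and exit non-zero when it returns `True`.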

---

## Package layout

The package separates **commands** (thin Typer handlers), **services** (graph and impact logic), **models** (Pydantic DTOs), **renderers** (text / Rich / Markdown), and **parser** / **graph** (manifest to `networkx` `DiGraph`).

```
dbt_pathfinder/
├── cli.py              # Typer entrypoint
├── cli_support.py      # shared --config option + shell completion
├── paths.py            # path normalization + dbt file extensions
├── settings.py         # .dbt-pathfinder.toml
├── git_changes.py      # git diff → changed paths
├── json_schemas.py     # JSON Schema for JSON outputs
├── commands/           # one module per subcommand
├── services/           # PathfinderService, CIImpactService, PRRiskService
├── models/             # manifest + report DTOs
├── renderers/          # text / Rich / GitHub Markdown
├── parser/             # manifest JSON → Manifest
├── graph/              # Manifest → DiGraph
└── hooks/              # pre-commit entry points
```

---

## How it works

- Load and validate dbt `manifest.json`.
- Build a directed graph: upstream → downstream.
- Use graph traversal for lineage and paths.
- Use SQL heuristics and dbt tests for join and cardinality hints.
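
The first three steps can be sketched against the manifest's `depends_on.nodes` lists, which is where dbt records each node's parents. Plain dictionaries stand in for the `networkx` `DiGraph` the package uses, and the helper names and the toy manifest are made up for the example.

```python
def build_children(manifest):
    """Map each unique_id to the nodes that depend on it directly."""
    children = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        children.setdefault(unique_id, [])
        for parent in node.get("depends_on", {}).get("nodes", []):
            children.setdefault(parent, []).append(unique_id)  # edge: parent -> child
    return children


def impacted_tests(manifest, changed_id):
    """Collect test nodes anywhere downstream of the changed node."""
    children = build_children(manifest)
    seen, stack = set(), [changed_id]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    nodes = manifest["nodes"]
    return sorted(uid for uid in seen if nodes[uid].get("resource_type") == "test")


# Toy manifest fragment with simplified unique_ids.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.unique_fct_orders_order_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
    }
}

print(impacted_tests(manifest, "model.shop.stg_orders"))
```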

---

## Limitations

- Join and cardinality inference are best-effort; complex SQL and macros reduce reliability.
- `ci-impact` and `impact-report --include-columns` use heuristic column tracing, not full SQL AST lineage.
- Requires a valid dbt `manifest.json`.
- Ambiguous model names may require a full `unique_id`.

---

## Contributing

Requires Python **3.9+**. Install the dev dependencies and run the tests:

```bash
pip install -e ".[dev]"
python -m pytest tests/
```

Pull requests are welcome; include tests for behavior changes. For bugs and feature ideas, use [GitHub Issues](https://github.com/jon-woodland/dbt-pathfinder/issues).

---

## License

MIT License
