Metadata-Version: 2.4
Name: mbe-tools
Version: 0.4.0
Summary: A practical toolkit for Many-Body Expansion (MBE) workflows: cluster design, MBE input generation, output parsing, and analysis.
Author: Jiarui Wang
License: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.12
Provides-Extra: analysis
Requires-Dist: matplotlib>=3.8; extra == 'analysis'
Requires-Dist: openpyxl>=3.1; extra == 'analysis'
Requires-Dist: pandas>=2.0; extra == 'analysis'
Provides-Extra: cli
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# mbe-tools

`mbe-tools` is a Python toolkit for the **Many-Body Expansion (MBE)** workflow:

- Cluster handling: read `.xyz`, fragment (water heuristic or connectivity + labels), and sample fragments (random/spatial, ion-aware).
- Job prep: generate subset geometries, render Q-Chem/ORCA inputs, and emit PBS/Slurm scripts (supports chunked submission with run-control).
- Parsing: read ORCA/Q-Chem outputs, auto-detect program, infer method/basis/grid metadata, emit JSONL.
- Analysis: inclusion–exclusion MBE(k), summaries, CSV/Excel export, and quick plots.

Status: **0.4.0 release** — backend syntax (e.g., ghost atoms) can be customized per site. License: **MIT**.

Architecture note: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) describes the current CLI/core split. `mbe_tools.cli:app` owns the Typer command surface, `cli_commands.py` owns the named command target registry, and reusable command behavior lives in core modules such as `generation.py`, `input_builder.py`, `qchem_commands.py`, `parse.py`, and `jsonl_views.py`; `cli_*.py` modules are compatibility re-export shims. The machine-readable command surface is mirrored in [docs/cli_command_contract.json](docs/cli_command_contract.json). Release metadata and package artifact expectations are mirrored in [docs/release_contract.json](docs/release_contract.json).
For stable Python imports and owner-module guidance, see [docs/API_REFERENCE.md](docs/API_REFERENCE.md).

## Install (editable for development)

```bash
cd mbe-tools
python3 -m pip install -e ".[analysis,cli]"
```

The installed command is `mbe`. During development, the same Typer entrypoint
can also be checked with `python3 -m mbe_tools.cli --help`.

For local development, also install the `dev` extra:
`python3 -m pip install -e ".[analysis,cli,dev]"`.

Machine-readable contracts mirror the project's invariants:

- Dependency tiers and the CI/development install command: [docs/dependency_contract.json](docs/dependency_contract.json)
- CI and PyPI publishing workflow invariants: [docs/workflow_contract.json](docs/workflow_contract.json)
- Changelog/release-note structure: [docs/changelog_contract.json](docs/changelog_contract.json)
- Bug and scientific-validation intake prompts: [docs/issue_template_contract.json](docs/issue_template_contract.json)
- Test-suite taxonomy: documented in [docs/TESTING.md](docs/TESTING.md) and mirrored in [docs/test_taxonomy_contract.json](docs/test_taxonomy_contract.json)
- Internal documentation link coverage: [docs/docs_link_contract.json](docs/docs_link_contract.json)

## Quickstart (no external chemistry software)

The fastest way to verify the installed CLI is to use the committed synthetic
fixtures. They are small hand-written Q-Chem-like and ORCA-like outputs, so this
path does not require Q-Chem, ORCA, or an HPC scheduler:

```bash
python3 examples/synthetic/regenerate.py --check
python3 examples/synthetic/check_cli_workflow.py
```

The first command checks the committed expected JSONL/CSV artifacts. The second
command runs the public CLI workflow in a temporary directory: parse Q-Chem-like
and ORCA-like outputs, validate JSONL provenance/schema fields, inspect and
calculate strict two-body MBE output, and export analysis CSV files. A successful
run ends with `synthetic CLI workflow ok`.

## Settings precedence (P0)

Configure default commands/modules/scratch once and reuse across CLI calls. Precedence (low → high): env vars → `~/.config/mbe-tools/config.toml` → `./mbe.toml` → explicit `load_settings(path=...)`.

Keys: `qchem_command`, `orca_command`, `qchem_module`, `orca_module`, `scratch_dir`, `scheduler_queue`, `scheduler_partition`, `scheduler_account`.

Env map: `MBE_QCHEM_CMD`, `MBE_ORCA_CMD`, `MBE_QCHEM_MODULE`, `MBE_ORCA_MODULE`, `MBE_SCRATCH`, `MBE_SCHED_QUEUE`, `MBE_SCHED_PARTITION`, `MBE_SCHED_ACCOUNT`.
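
The precedence rule is a layered dictionary merge in which each later source overrides the earlier ones. The following minimal sketch assumes the layers are already parsed into dicts. `merge_settings` is a hypothetical helper for illustration only. It is not the `load_settings` implementation.

```python
import os
from collections.abc import Mapping

# Settings key -> environment variable, copied from the "Env map" above.
ENV_MAP = {
    "qchem_command": "MBE_QCHEM_CMD",
    "orca_command": "MBE_ORCA_CMD",
    "qchem_module": "MBE_QCHEM_MODULE",
    "orca_module": "MBE_ORCA_MODULE",
    "scratch_dir": "MBE_SCRATCH",
    "scheduler_queue": "MBE_SCHED_QUEUE",
    "scheduler_partition": "MBE_SCHED_PARTITION",
    "scheduler_account": "MBE_SCHED_ACCOUNT",
}


def merge_settings(env: Mapping[str, str], *layers: Mapping[str, object]) -> dict[str, object]:
    """Hypothetical helper: merge settings from low to high precedence.

    `env` is the lowest layer. `layers` would be the user config, ./mbe.toml,
    and an explicit path, in that order. Keys outside ENV_MAP are ignored.
    """
    merged: dict[str, object] = {key: env[var] for key, var in ENV_MAP.items() if var in env}
    for layer in layers:
        # Later layers overwrite earlier values key by key.
        merged.update({k: v for k, v in layer.items() if k in ENV_MAP})
    return merged


if __name__ == "__main__":
    user_cfg = {"scratch_dir": "/scratch/me"}                # ~/.config/mbe-tools/config.toml
    project_cfg = {"qchem_command": "/opt/qchem/bin/qchem"}  # ./mbe.toml
    print(merge_settings(os.environ, user_cfg, project_cfg))
```

Passing `os.environ` as the first argument makes the environment variables the lowest layer, which matches the stated order.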

Minimal `mbe.toml` example:

```toml
qchem_command = "/opt/qchem/bin/qchem"
orca_command  = "/opt/orca/bin/orca"
qchem_module  = "qchem/6.2.2"
orca_module   = "orca/5.0.3"
scratch_dir   = "/scratch/${USER}"
scheduler_queue = "normal"
scheduler_partition = "work"
scheduler_account = "proj123"
```
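
You can check the file against the key list above to catch typos before a run. The sketch below uses the standard-library `tomllib` module, which is available on Python 3.11 and later. On Python 3.10, the `tomli` package provides the same `loads` function. `check_mbe_toml` is a hypothetical helper and is not part of mbe-tools.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli (same loads() API)
    import tomli as tomllib

# Keys listed in the "Keys" line above.
KNOWN_KEYS = {
    "qchem_command", "orca_command", "qchem_module", "orca_module",
    "scratch_dir", "scheduler_queue", "scheduler_partition", "scheduler_account",
}


def check_mbe_toml(text: str) -> dict:
    """Parse mbe.toml text; raise ValueError on keys not in the README's key list."""
    data = tomllib.loads(text)
    unknown = sorted(set(data) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown mbe.toml keys: {unknown}")
    return data


cfg = check_mbe_toml('qchem_command = "/opt/qchem/bin/qchem"\nscratch_dir = "/scratch/${USER}"\n')
print(cfg["scratch_dir"])  # /scratch/${USER}
```

TOML does not expand `${USER}`, so `scratch_dir` is read as the literal string `/scratch/${USER}`. This README does not say whether mbe-tools expands it later.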

## Quickstart (Python API)

Common workflow helpers are exported from `mbe_tools` for stable, concise
imports; domain modules such as `mbe_tools.cluster` and `mbe_tools.analysis`
remain available for advanced use.
The dedicated API reference lives in [docs/API_REFERENCE.md](docs/API_REFERENCE.md).

1) Fragment an XYZ

```python
from mbe_tools import read_xyz, fragment_by_water_heuristic, fragment_by_connectivity

xyz = read_xyz("Water20.xyz")
frags = fragment_by_water_heuristic(xyz, oh_cutoff=1.25)
frags_conn = fragment_by_connectivity(xyz, scale=1.2)
```
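
`fragment_by_connectivity` takes a `scale` factor. A common way to build connectivity fragments works in two steps. First, treat two atoms as bonded when their distance is below `scale` times the sum of their covalent radii. Then take the connected components as fragments. The sketch below shows that technique under this assumption. The radii table and `connectivity_fragments` are illustrative, and this README does not state mbe-tools' exact bonding rule.

```python
import math

# Covalent radii in Angstrom (Cordero et al. 2008) for a few elements; extend as needed.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def connectivity_fragments(symbols, coords, scale=1.2):
    """Hypothetical sketch: group atoms into bonded fragments.

    Atoms i and j are bonded when dist(i, j) < scale * (r_i + r_j).
    Fragments are the connected components, found with union-find.
    """
    parent = list(range(len(symbols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            cutoff = scale * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(coords[i], coords[j]) < cutoff:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(len(symbols)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two water molecules 3 Angstrom apart along z.
symbols = ["O", "H", "H", "O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0),
          (0.0, 0.0, 3.0), (0.96, 0.0, 3.0), (-0.24, 0.93, 3.0)]
print(connectivity_fragments(symbols, coords))  # [[0, 1, 2], [3, 4, 5]]
```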

2) Sample and write XYZ

```python
from mbe_tools import sample_fragments, write_xyz

picked = sample_fragments(frags, n=10, seed=42)
write_xyz("Water10_sample.xyz", picked)
```

3) Generate subset geometries

```python
from mbe_tools import MBEParams, generate_subsets_xyz

params = MBEParams(max_order=3, cp_correction=True, backend="qchem")
subset_jobs = list(generate_subsets_xyz(frags, params))  # (job_id, subset_indices, geom_text)
```
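
If each subset needs one job, the number of subset geometries up to `max_order` is the sum of the binomial coefficients C(n, k) for k = 1..max_order. The sketch below uses `itertools.combinations`. `enumerate_subsets` and `n_subsets` are illustrative helpers and are not mbe-tools functions.

```python
from itertools import combinations
from math import comb


def enumerate_subsets(n_fragments: int, max_order: int):
    """Yield 0-based fragment index tuples of size 1..max_order, lowest order first."""
    for order in range(1, max_order + 1):
        yield from combinations(range(n_fragments), order)


def n_subsets(n_fragments: int, max_order: int) -> int:
    """Closed form: sum of C(n, k) for k = 1..max_order."""
    return sum(comb(n_fragments, k) for k in range(1, max_order + 1))


print(len(list(enumerate_subsets(20, 3))), n_subsets(20, 3))  # 1350 1350
```

For a 20-water cluster at `max_order=3`, this gives 20 + 190 + 1140 = 1350 subsets, so job counts grow quickly with order. CP-correction settings may change which calculations are needed, and this sketch does not model that.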

4) Build inputs

```bash
mbe build-input water.geom --backend qchem --method wb97m-v --basis def2-ma-qzvpp --out water_qchem.inp
mbe build-input water.geom --backend orca  --method wb97m-v --basis def2-ma-qzvpp --out water_orca.inp
```

5) Emit PBS/Slurm templates (run-control included; PBS templates can also run locally with `--local-run`)

```bash
mbe template --scheduler pbs   --backend qchem --job-name mbe-qchem --chunk-size 20 --local-run --builtin-control --out qchem.run
mbe template --scheduler slurm --backend orca  --job-name mbe-orca  --partition work --chunk-size 10 --out orca.sbatch
```
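
The README describes chunked submission but does not define `--chunk-size` exactly. The sketch below assumes that the option caps how many inputs one scheduler job runs. `chunk` and the `job_XXXX.inp` names are hypothetical.

```python
from math import ceil


def chunk(items: list[str], chunk_size: int) -> list[list[str]]:
    """Split inputs into consecutive batches of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


inputs = [f"job_{i:04d}.inp" for i in range(1350)]  # hypothetical input names
batches = chunk(inputs, 20)
print(len(batches), ceil(len(inputs) / 20), len(batches[-1]))  # 68 68 10
```

Under this assumption, the 1350 subsets from a third-order run on 20 waters with `--chunk-size 20` would need 68 submissions, and the last one would hold 10 inputs.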

6) Parse outputs to JSONL

```bash
mbe parse ./Output --program auto --glob "*.out" --jobs 4 --out parsed.jsonl
```

7) Analyze JSONL

```bash
mbe analyze parsed.jsonl --to-csv results.csv --to-xlsx results.xlsx --plot mbe.png
```
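
The README describes the `strict` scheme as inclusion–exclusion. Each subset's increment ΔE_S is its energy minus the increments of all its proper sub-subsets, and MBE(k) sums the increments up to order k. The minimal sketch below assumes energies in hartree, keyed by sorted 0-based index tuples. This is a hypothetical in-memory layout and not the JSONL schema. The sketch also computes the E(subset) − ΣE(monomers) interaction energy described for `mbe calc --interaction`.

```python
from itertools import combinations

Subset = tuple[int, ...]

# CODATA 2018 conversion factors; mbe-tools may use slightly different constants.
HARTREE_TO_KCAL_MOL = 627.5094740631
HARTREE_TO_KJ_MOL = 2625.4996394799


def mbe_increments(energies: dict[Subset, float]) -> dict[Subset, float]:
    """Delta E_S = E_S minus Delta E_T for every proper, non-empty subset T of S.

    Keys must be sorted index tuples; a missing lower-order subset raises KeyError.
    """
    deltas: dict[Subset, float] = {}
    for subset in sorted(energies, key=len):  # lower orders first
        delta = energies[subset]
        for r in range(1, len(subset)):
            for sub in combinations(subset, r):  # combinations() keeps sorted order
                delta -= deltas[sub]
        deltas[subset] = delta
    return deltas


def mbe_total(energies: dict[Subset, float], k: int) -> float:
    """MBE(k): sum of all increments of order <= k."""
    return sum(d for s, d in mbe_increments(energies).items() if len(s) <= k)


def interaction_energy(energies: dict[Subset, float], subset: Subset) -> float:
    """E(subset) minus the sum of its members' monomer energies."""
    return energies[subset] - sum(energies[(i,)] for i in subset)


# Toy energies in hartree for a three-fragment cluster.
E = {(0,): -1.0, (1,): -2.0, (2,): -3.0,
     (0, 1): -3.1, (0, 2): -4.2, (1, 2): -5.3, (0, 1, 2): -6.7}
print(mbe_total(E, 2))  # ≈ -6.6
print(interaction_energy(E, (0, 1)) * HARTREE_TO_KCAL_MOL)  # ≈ -62.75 (kcal/mol)
```

Including every order recovers the full-cluster energy exactly (here MBE(3) ≈ -6.7), which is a quick sanity check for complete coverage.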

## CLI cheat sheet

- `mbe fragment <xyz>`: water-heuristic fragmentation + sampling → XYZ. Options: `--out-xyz [sample.xyz]`, `--n [10]`, `--seed`, `--require-ion`, `--mode [random|spatial]`, spatial extras `--prefer-special`, `--k-neighbors`, `--start-index`, `--oh-cutoff`.
- `mbe gen <xyz>`: generate subset geometries. Options: `--out-dir [mbe_geoms]`, `--max-order [2]`, `--order/--orders`, `--cp/--no-cp`, `--scheme`, `--backend [qchem|orca]`, `--cluster-name` (filename prefix, fallback to backend), `--oh-cutoff`; `--monomers-dir DIR` + `--monomer-glob "*.geom"` can also reuse monomer `.geom` files instead of fragmenting.
- `mbe gen-from-monomer <dir>`: generate subsets directly from existing monomer `.geom` files; options mirror `mbe gen` monomer mode: `--order/--orders`/`--max-order`, `--cp/--no-cp`, `--scheme`, `--backend`, `--monomer-glob`, `--out-dir`, `--cluster-name`. The old `gen_from_monomer` spelling remains accepted but is hidden from root help.
- `mbe build-input <geom>`: render Q-Chem/ORCA input. Options for backend, method, basis (required), charge/multiplicity, `--no-cp` (ignore ghost atoms in `.geom` when writing `.inp` geometry); Q-Chem adds `--thresh`, `--tole`, `--scf-convergence`, `--xc-grid`, `--rem-extra`, `--sym-ignore/--no-sym-ignore`, embeddings `--giee elem=charge` (repeatable per element) or `--gdee file` for `$external_charges`; ORCA adds `--grid`, `--scf-convergence`, `--keyword-line-extra`, and with `--giee/--gdee` writes a same-stem `.pc` file plus `%pointcharges "<name>.pc"` on line 2 of `.inp`; batch mode: point `geom` to a directory and add `--glob "*.geom" --out-dir outputs/` to render many at once.
- `mbe template`: PBS/Slurm scripts with run-control wrapper. Shared: `--scheduler [pbs|slurm]`, `--backend [qchem|orca]`, `--job-name`, `--walltime`, `--mem-gb`, `--chunk-size`, `--module`, `--command`, `--out`; PBS adds `--ncpus`, `--queue`, `--project`, `--local-run` (emit local bash runner), `--control-file` (external TOML), `--builtin-control` (write default control TOML); Slurm adds `--ncpus` (cpus-per-task), `--ntasks`, `--partition`, `--project` (account), `--qos`; `--wrapper` emits a bash submitter (run with `bash job.sh`) that writes hidden `._*.pbs/.sbatch` files and submits them via qsub/sbatch.
- `mbe parse <root>`: outputs → JSONL. Options: `--program auto|qchem|q-chem|orca` (default qchem), `--glob-pattern`, `--jobs/-j` (parallel output parsing with deterministic output order), `--out`, `--summary-out`, `--resume`, `--infer-metadata`, geometry search controls (`--cluster-xyz`, `--geom-mode first|last`, `--geom-source singleton|any`, `--geom-max-lines`, `--geom-drop-ghost`, `--nosearch`). If no singleton metadata is available, it falls back to the first parsable geometry as monomer 0 for embedding.
  Q-Chem electrostatic-embedding outputs with `$external_charges` / external point-charge markers and `Charge-charge energy` are treated as EE-MBE: parsed SCF/total energies are corrected as `reported_energy - charge_charge_energy`, with the raw and correction terms preserved in `extra`.
- `mbe qchem-mbe [ORDER]`: Q-Chem batch post-processing (bashrc `MBE` equivalent). Options: `--specify/-s DIR[:n]` (repeatable, supports `ROOT`), `--exclude/-x DIR` (repeatable), `--force/-f`, `--root`, `--out-dir`. Outputs: `Result.csv`, `Energy.csv`, `deltaE.csv`, `WallTime.csv`, `CPUTime.csv`.
- `mbe qchem-mbe-cbs [ORDER]`: Q-Chem CBS-style batch post-processing (bashrc `MBE_CBS` equivalent). Same options as `qchem-mbe`; adds `Energy_SCF.csv`, `Energy_corr.csv`, and, when matching `cc-pVXZ`/`VXZ` or `aug-cc-pVXZ`/`aVXZ` basis pairs are detected, `CBS.csv` with SCF/corr/total CBS extrapolated values. For EE-MBE Q-Chem outputs, total-style and SCF energies are corrected by subtracting `Charge-charge energy`.
- `mbe energy-to-mbe <Energy.csv>`: rebuild `deltaE.csv` + `Result.csv` from an existing `Energy.csv`. Options: `--delta-out`, `--result-out`, `--max-order`, `--force`, `--strict-labels/--no-strict-labels`.
- `mbe script-library [SCRIPT]`: list, print, or write small helper scripts for common tasks. Current scripts: `parse-outs` (key-info CSV/JSONL from `.out`) and `mbe-energy` (parse `.out` and print strict MBE energies). Example: `mbe script-library mbe-energy --out mbe_energy.py`.
- `mbe cbs-exploration ...`: pair same-name `.out` files from two explicitly labelled basis directories and write `CBS.csv`. Supported explicit options: `--aVDZ/--aVTZ/--aVQZ` for augmented basis pairs and `--VDZ/--VTZ/--VQZ` for non-augmented pairs, e.g. `mbe cbs-exploration --aVDZ DIR1 --aVTZ DIR2 --save OUTDIR`. The basis labels come from CLI options, not directory-name guessing. EE-MBE `Charge-charge energy` correction is applied before SCF/CBS extrapolation. The old root-level `mbe --cbs-exploration ...` form remains accepted for existing scripts, but is hidden from root help.
- `mbe version` / `mbe --version`: print package, Python, and JSONL schema versions.
- `mbe where`: print default data/config/cache/state paths and the runs archive root.
- `mbe list-runs`: list saved-run archives from the configured library; options: `--dest`, `--cluster`, `--limit`, `--to-csv`, `--to-xlsx`, `--output-format [pipe|tsv]`, `--json`.
- `mbe analyze <parsed.jsonl>`: summaries/exports. Options: `--to-csv`, `--to-xlsx`, `--plot`, `--scheme [simple|strict]`, `--max-order`.
- `mbe show <jsonl>`: the `JSONL_PATH` argument is optional (the default JSONL selection is used if omitted); `--monomer N` (0-based) prints that monomer's geometry and includes it in participation/CPU summaries. Output includes cluster info, CPU totals, per-order energy stats, and strict inclusion–exclusion MBE(k) totals with per-order ΔE.
- `mbe validate <jsonl>`: validate JSONL schema consistency, subset index/order consistency, duplicate subset hints, coverage by order, provenance completeness, and mixed calculation labels; add `--json` for machine-readable output, `--require-energy` to fail on missing energies, `--require-provenance` to fail on missing provenance, and `--require-schema-version` to fail on unversioned records.
- `mbe calc <jsonl>`: the `JSONL_PATH` argument is optional; `--scheme [simple|strict]` (default simple); `--to K` (upper order); `--from K0` (lower bound for ΔE K0→K); `--monomer N` (report monomer energy); `--unit [hartree|kcal|kj]` (default hartree); `--interaction i,j[,k]` (0-based, repeatable) reports the subset interaction energy E(subset) − ΣE(monomers). The strict scheme uses inclusion–exclusion; the simple scheme uses ΔE vs mean monomer.

Use `mbe <command> --help` for full flags.
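
`qchem-mbe-cbs` and `cbs-exploration` write extrapolated values, but this README does not give the formula they use. One widely used two-point scheme for correlation energies assumes E(X) = E_CBS + A·X^-3, where X is the cardinal number (2 for DZ, 3 for TZ, 4 for QZ). The sketch below solves that pair of equations for E_CBS. `cbs_two_point` is illustrative only, and mbe-tools may use different exponents, especially for SCF energies.

```python
def cbs_two_point(e_low: float, e_high: float, x_low: int, x_high: int,
                  power: float = 3.0) -> float:
    """Two-point inverse-power extrapolation, E(X) = E_CBS + A * X**(-power).

    Eliminating A from the two equations gives
    E_CBS = (X_h^p * E_h - X_l^p * E_l) / (X_h^p - X_l^p).
    """
    if x_high <= x_low:
        raise ValueError("x_high must exceed x_low")
    w_low, w_high = x_low ** power, x_high ** power
    return (w_high * e_high - w_low * e_low) / (w_high - w_low)


# Illustrative correlation energies (hartree) for an aVDZ/aVTZ pair (X = 2, 3).
print(cbs_two_point(-0.20, -0.25, 2, 3))
```

For EE-MBE outputs, the README states that the `Charge-charge energy` is subtracted before extrapolation, so `e_low` and `e_high` would already be the corrected values.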

## Definitions (CLI & API)

| Area | Item                         | What it does                                                                                                                                   | Key options/args                                                                                                                                                                                                                                                                                                                                                 | Notes                                                                                                                           | Implementation                                                                                                            |
| ---- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| CLI  | `mbe version`                | Print package, Python, and JSONL schema versions                                                                                               | _none_                                                                                                                                                                                                                                                                                                                                                           | Same report shape as `mbe --version`                                                                                            | [src/mbe_tools/version_info.py](src/mbe_tools/version_info.py)                                                            |
| CLI  | `mbe cbs-exploration`        | Pair same-name explicit-basis Q-Chem outputs and write EE-aware `CBS.csv`                                                                      | `--aVDZ/--aVTZ/--aVQZ` or `--VDZ/--VTZ/--VQZ`; generic `--aV-low BASIS:DIR`, `--aV-high BASIS:DIR`; required `--save`                                                                                                                                                                                                                                            | Basis labels come from CLI options; EE-MBE charge-charge correction is applied before CBS extrapolation                         | [src/mbe_tools/qchem_commands.py](src/mbe_tools/qchem_commands.py)                                                        |
| CLI  | `mbe fragment <xyz>`         | Water-heuristic fragmentation and sampling → XYZ                                                                                               | `--n`, `--seed`, `--mode random/spatial`, `--require-ion`, `--prefer-special`, `--k-neighbors`, `--start-index`, `--oh-cutoff`, `--out-xyz`                                                                                                                                                                                                                      | Spatial mode can force special fragment; writes sampled XYZ                                                                     | [src/mbe_tools/generation.py](src/mbe_tools/generation.py)                                                                |
| CLI  | `mbe gen <xyz>`              | Generate subset geometries up to chosen orders                                                                                                 | `--max-order` or repeatable `--order/--orders`, `--cp/--no-cp`, `--scheme`, `--backend qchem/orca`, `--oh-cutoff`, `--out-dir`                                                                                                                                                                                                                                   | Orders can be explicit list; CP toggles ghost atoms                                                                             | [src/mbe_tools/generation.py](src/mbe_tools/generation.py)                                                                |
| CLI  | `mbe gen-from-monomer <dir>` | Generate subset geometries directly from existing monomer `.geom` files                                                                       | `--max-order` or repeatable `--order/--orders`, `--cp/--no-cp`, `--scheme`, `--backend qchem/orca`, `--monomer-glob`, `--out-dir`, `--cluster-name`                                                                                                                                                                                                               | Public hyphenated spelling; legacy `gen_from_monomer` remains accepted but hidden                                               | [src/mbe_tools/generation.py](src/mbe_tools/generation.py)                                                                |
| CLI  | `mbe build-input <geom>`     | Render Q-Chem/ORCA input from .geom                                                                                                            | Required `--method`, `--basis`; shared `--charge`, `--multiplicity`, `--no-cp`; Q-Chem: `--thresh`, `--tole`, `--scf-convergence`, `--xc-grid`, `--rem-extra`, `--sym-ignore/--no-sym-ignore`, embedding via `--giee elem=charge` (repeatable) or `--gdee file`; ORCA: `--grid`, `--scf-convergence`, `--keyword-line-extra`; with `--giee/--gdee` writes `<stem>.pc` and adds `%pointcharges "<stem>.pc"`; `--out`; batch: `--glob`, `--out-dir` | With `--glob`, `geom` must be a directory; outputs named after stems                                                            | [src/mbe_tools/input_builder.py](src/mbe_tools/input_builder.py)                                                          |
| CLI  | `mbe template`               | Emit PBS/Slurm scripts (with run-control wrapper)                                                                                              | Shared: `--scheduler pbs/slurm`, `--backend qchem/orca`, `--job-name`, `--walltime`, `--mem-gb`, `--chunk-size`, `--module`, `--command`, `--out`; PBS extras: `--ncpus`, `--queue`, `--project`, `--local-run`, `--control-file`, `--builtin-control`; Slurm extras: `--ncpus`(per task), `--ntasks`, `--partition`, `--project`(account), `--qos`; `--wrapper` | `--wrapper` writes a bash submitter that generates hidden `._*.pbs/.sbatch` then submits; run-control autodetects control files | [src/mbe_tools/hpc_templates.py](src/mbe_tools/hpc_templates.py)                                                          |
| CLI  | `mbe qchem-mbe [ORDER]`      | Post-process Q-Chem MBE outputs into energy, delta, result, wall-time, and CPU CSVs                                                           | `--specify/-s DIR[:n]`, `--exclude/-x DIR`, `--force/-f`, `--root`, `--out-dir`                                                                                                                                                                                                                                                                                  | Bashrc `MBE` compatible; `--force` skips bad systems after integrity checks                                                     | [src/mbe_tools/qchem_commands.py](src/mbe_tools/qchem_commands.py)                                                        |
| CLI  | `mbe qchem-mbe-cbs [ORDER]`  | Post-process Q-Chem MBE-CBS outputs with SCF/correlation tables and optional `CBS.csv`                                                        | Same target options as `mbe qchem-mbe`                                                                                                                                                                                                                                                                                                                           | Adds `Energy_SCF.csv`, `Energy_corr.csv`; applies EE-MBE charge-charge corrections before total/SCF/CBS tables                  | [src/mbe_tools/qchem_commands.py](src/mbe_tools/qchem_commands.py)                                                        |
| CLI  | `mbe energy-to-mbe <csv>`    | Recompute `deltaE.csv` and `Result.csv` from an `Energy.csv` term table                                                                        | `--delta-out`, `--result-out`, `--max-order`, `--force`, `--strict-labels/--no-strict-labels`                                                                                                                                                                                                                                                                     | `--force` skips incomplete columns; strict labels validate term kind versus index count                                          | [src/mbe_tools/qchem_commands.py](src/mbe_tools/qchem_commands.py)                                                        |
| CLI  | `mbe script-library`         | List, print, or write packaged helper scripts for parsing and MBE energy                                                                       | Optional script name; `--out`, `--force`                                                                                                                                                                                                                                                                                                                         | Current helpers include `parse-outs` and `mbe-energy`; written scripts are executable                                            | [src/mbe_tools/script_library.py](src/mbe_tools/script_library.py)                                                        |
| CLI  | `mbe parse <root>`           | Parse Q-Chem/ORCA outputs to JSONL                                                                                                             | `--program auto/qchem/q-chem/orca` (default qchem), `--glob-pattern`, `--jobs/-j`, `--out`, `--summary-out`, `--resume`, `--infer-metadata`, `--cluster-xyz`, `--nosearch`, `--geom-mode first/last`, `--geom-source singleton/any`, `--geom-drop-ghost`, `--geom-max-lines`                                                                                    | Infers method/basis/grid from names/inputs; can embed cluster geometry; parallel parsing preserves output order; optional summary JSON records matched files, fingerprints, and status totals; `--resume` reuses unchanged existing rows by path | [src/mbe_tools/parse.py](src/mbe_tools/parse.py)                                                                          |
| CLI  | `mbe enrich <jsonl>`         | Enrich calc-only JSONL with a cluster geometry record from referenced outputs                                                                 | `--root`, `--program auto/qchem/q-chem/orca`, geometry search flags, `--out`                                                                                                                                                                                                                                                                                     | Writes `<input>.enriched.jsonl` by default; exits cleanly when a cluster record already exists                                   | [src/mbe_tools/parse.py](src/mbe_tools/parse.py)                                                                          |
| CLI  | `mbe analyze <parsed.jsonl>` | Summaries/exports/plots                                                                                                                        | `--to-csv`, `--to-xlsx`, `--plot`, `--scheme simple/strict`, `--max-order`                                                                                                                                                                                                                                                                                       | `strict` uses inclusion–exclusion; `simple` computes ΔE vs mean monomer                                                         | [src/mbe_tools/analysis.py](src/mbe_tools/analysis.py)                                                                    |
| CLI  | `mbe show <jsonl>`           | Quick cluster/CPU/energy view plus strict MBE(k) totals with per-order ΔE                                                                      | `--monomer N` (0-based) prints geometry and participation/CPU; default JSONL selection if path omitted                                                                                                                                                                                                                                                           | Uses default JSONL selection; prints inclusion–exclusion MBE rows                                                               | [src/mbe_tools/jsonl_views.py](src/mbe_tools/jsonl_views.py)                                                              |
| CLI  | `mbe info <jsonl>`           | Coverage + CPU summary                                                                                                                         | Filters: `--program`, `--method`, `--basis`, `--grid`, `--cp`, `--status`; `--scheme`; `--max-order`; `--json`                                                                                                                                                                                                                                                   | Status counts by subset_size                                                                                                    | [src/mbe_tools/jsonl_views.py](src/mbe_tools/jsonl_views.py)                                                              |
| CLI  | `mbe calc <jsonl>`           | CPU totals + MBE energies (simple/strict) and subset interaction ΔE vs monomer sums                                                            | `--scheme simple/strict`, `--to`, `--from`, `--monomer`, `--unit hartree/kcal/kj`, `--interaction i,j[,k]` (0-based, repeatable)                                                                                                                                                                                                                                 | Warns on mixed program/method/basis/grid/cp combos                                                                              | [src/mbe_tools/calc.py](src/mbe_tools/calc.py)                                                                            |
| CLI  | `mbe validate <jsonl>`       | JSONL schema, coverage, and provenance validation                                                                                               | `--json`, `--require-energy`, `--require-provenance`, `--require-schema-version`                                                                                                                                                                                                                                                                                  | Reports subset index/order mismatches, invalid numbers, duplicates, coverage hints, provenance completeness, and schema versions | [src/mbe_tools/validation.py](src/mbe_tools/validation.py)                                                                |
| CLI  | `mbe save <jsonl>`           | Archive JSONL into a run folder with `run.jsonl` and `run.meta.json`                                                                            | `--dest DIR`, `--order`, `--no-include-energy`                                                                                                                                                                                                                                                                                                                   | Uses cluster_id/stamp subfolders                                                                                                | [src/mbe_tools/save.py](src/mbe_tools/save.py)                                                                            |
| CLI  | `mbe compare <dir or glob>`  | Compare JSONL runs by CPU, records, coverage, and MBE energy                                                                                   | `--cluster ID`, `--scheme simple/strict`, `--order K`, `--ref latest/first/PATH`                                                                                                                                                                                                                                                                                 | Accepts JSONL files, directories/globs, and saved-run archive directories; CSV/XLSX exports include source path/archive and saved metadata columns | [src/mbe_tools/compare.py](src/mbe_tools/compare.py)                                                                      |
| CLI  | `mbe doctor`                 | Report installation, optional dependency, config-file, backend-command, and artifact diagnostics                                                | `--config`, `--jsonl`, `--parse-summary`, `--saved-run`, `--report-out`, `--json`, `--strict`                                                                                                                                                                                                                                                                    | Strict mode exits nonzero for missing/malformed explicit config, unusable configured backend commands, invalid JSONL, stale parse summaries, or saved-run archive drift | [src/mbe_tools/diagnostics.py](src/mbe_tools/diagnostics.py)                                                              |
| CLI  | `mbe where`                  | Print default data, config, cache, state, and runs archive paths                                                                               | _none_                                                                                                                                                                                                                                                                                                                                                           | Shows the default archive root used by `mbe save`                                                                               | [src/mbe_tools/paths.py](src/mbe_tools/paths.py)                                                                          |
| CLI  | `mbe set-library <dir>`      | Persist the default archive directory used by `mbe save`                                                                                       | Directory path                                                                                                                                                                                                                                                                                                                                                    | `--dest` and `MBE_SAVE_DEST` can still override the saved default                                                               | [src/mbe_tools/paths.py](src/mbe_tools/paths.py)                                                                          |
| CLI  | `mbe list-runs`              | List saved-run archives from the configured library                                                                                            | `--dest`, `--cluster`, `--limit`, `--to-csv`, `--to-xlsx`, `--output-format pipe/tsv`, `--json`                                                                                                                                                                                                                                                                   | Reads `run.meta.json` beside `run.jsonl`; surfaces metadata health; supports text, CSV/XLSX, or machine-readable inventory output | [src/mbe_tools/paths.py](src/mbe_tools/paths.py)                                                                          |
| API  | Cluster                      | `read_xyz`, `write_xyz`, `fragment_by_water_heuristic`, `fragment_by_connectivity`, `sample_fragments`, `spatial_sample_fragments`             | See function args for cutoffs, scaling, seeds                                                                                                                                                                                                                                                                                                                    | Supports ion retention and special-fragment preference                                                                          | [src/mbe_tools/cluster.py](src/mbe_tools/cluster.py)                                                                      |
| API  | MBE generation               | `MBEParams`, `generate_subsets_xyz`                                                                                                            | Args: `max_order`, `orders`, `cp_correction`, `backend`, `scheme`                                                                                                                                                                                                                                                                                                | Yields `(job_id, subset_indices, geom_text)` for each subset                                                                    | [src/mbe_tools/mbe.py](src/mbe_tools/mbe.py)                                                                              |
| API  | Input builders               | `render_qchem_input`, `render_orca_input`, `build_input_from_geom`, `build_input_artifacts_from_geom`                                           | Method/basis required; optional thresh/tole/scf/grid/extra lines                                                                                                                                                                                                                                                                                                 | Used by CLI `build-input`; accepts .geom path and can return ORCA `.pc` sidecar text                                             | [src/mbe_tools/input_builder.py](src/mbe_tools/input_builder.py)                                                          |
| API  | Templates                    | `render_pbs_qchem`, `render_slurm_orca`                                                                                                        | Scheduler resources + chunking + run-control wrapper                                                                                                                                                                                                                                                                                                             | `wrapper` flag mirrors CLI behavior                                                                                             | [src/mbe_tools/hpc_templates.py](src/mbe_tools/hpc_templates.py)                                                          |
| API  | Parsing                      | `detect_program`, `parse_file`, `parse_files`, `infer_metadata_from_path`, `glob_paths`                                                        | Program auto-detect; metadata inference from names/inputs; optional deterministic parallel parsing via `parse_files(..., jobs=N)`                                                                                                                                                                                                                                 | Companion inputs help fill method/basis/grid                                                                                    | [src/mbe_tools/parsers/io.py](src/mbe_tools/parsers/io.py)                                                                |
| API  | Analysis                     | `read_jsonl`, `to_dataframe`, `summarize_by_order`, `compute_delta_energy`, `strict_mbe_orders`, `assemble_mbe_energy`, `order_totals_as_rows` | Convenience helpers for MBE tables and plots                                                                                                                                                                                                                                                                                                                     | `strict_mbe_orders` builds inclusion–exclusion rows                                                                             | [src/mbe_tools/analysis.py](src/mbe_tools/analysis.py)                                                                    |

### CLI details with examples

| Command                     | Option(s)                                                                                                                                       | Meaning                                                                                                     | Example                                                                           |
| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| `mbe fragment <xyz>`        | `--mode random/spatial`, `--n`, `--require-ion`                                                                                                 | Fragment and sample XYZ                                                                                     | `mbe fragment water3.xyz --mode spatial --n 2`                                    |
| `mbe gen <xyz>`             | `--max-order`, `--order`, `--cp/--no-cp`, `--out-dir`                                                                                           | Generate subset geometries                                                                                  | `mbe gen big.xyz --max-order 3 --out-dir geoms`                                   |
| `mbe build-input <geom>`    | `--backend qchem/orca`, `--method`, `--basis`, `--charge`, `--multiplicity`, `--no-cp`; Q-Chem extras `--sym-ignore/--no-sym-ignore`, `--giee elem=charge` (repeatable) or `--gdee file`; ORCA `--giee/--gdee` writes `<stem>.pc` + `%pointcharges` | Render Q-Chem/ORCA input from geom                                                                          | `mbe build-input frag.geom --backend orca --method wB97M-V --basis def2-TZVP --no-cp --giee O=0.2 --out frag.inp` |
| `mbe template`              | `--scheduler pbs/slurm`, `--backend`, `--wrapper`                                                                                               | Emit PBS/Slurm script (optional wrapper submitter)                                                          | `mbe template --scheduler pbs --backend qchem --wrapper`                          |
| `mbe parse <root>`          | `--program auto/qchem/q-chem/orca`, `--glob-pattern`, `--jobs/-j`, geometry search flags                                                        | Parse outputs to JSONL (can embed cluster geometry)                                                         | `mbe parse ./Output --glob "*.out" --jobs 4 --geom-source any`                    |
| `mbe qchem-mbe [ORDER]`     | `--specify/-s DIR[:n]` (repeatable), `--exclude/-x DIR` (repeatable), `--force/-f`, `--root`, `--out-dir`                                      | Post-process Q-Chem MBE outputs into `Energy.csv`, `deltaE.csv`, `Result.csv`, `WallTime.csv`, and `CPUTime.csv` | `mbe qchem-mbe 3 --specify Water10:3 --exclude Water15`                           |
| `mbe qchem-mbe-cbs [ORDER]` | same options as `mbe qchem-mbe`                                                                                                                  | CBS-style post-process with `Energy_SCF.csv`, `Energy_corr.csv`, and optional `CBS.csv` extrapolated with Neese/Valeev coefficients | `mbe qchem-mbe-cbs 3 --force`                                                     |
| `mbe energy-to-mbe <csv>`   | `--delta-out`, `--result-out`, `--max-order`, `--force`, `--strict-labels/--no-strict-labels`                                                  | Recompute `deltaE.csv` and `Result.csv` from `Energy.csv`                                                    | `mbe energy-to-mbe Energy.csv --delta-out deltaE.csv --result-out Result.csv`     |
| `mbe analyze <jsonl>`       | `--scheme simple/strict`, `--to-csv`, `--plot`                                                                                                  | Summaries, exports, plots                                                                                   | `mbe analyze parsed.jsonl --scheme strict`                                        |
| `mbe show <jsonl>`          | `--monomer N` (0-based); the JSONL path is optional (default selection applies when omitted)                                                   | Quick cluster/CPU/energy view plus strict MBE(k) totals with per-order ΔE                                   | `mbe show parsed.jsonl --monomer 0`                                               |
| `mbe info <jsonl>`          | Filters: `--program`, `--method`, `--basis`, `--grid`, `--cp`, `--status`; `--scheme`; `--max-order`; `--json`                                  | Coverage + CPU + optional MBE summary                                                                       | `mbe info --program qchem --json`                                                 |
| `mbe validate <jsonl>`      | `--json`, `--require-energy`, `--require-provenance`, `--require-schema-version`                                                                | Validate JSONL consistency, coverage hints, provenance completeness, and schema-version strictness           | `mbe validate parsed.jsonl --json`                                                |
| `mbe calc <jsonl>`          | `--scheme simple/strict`; `--to`; `--from`; `--monomer`; `--unit hartree/kcal/kj`; `--interaction i,j[,k]`                                      | CPU totals + MBE energies; interaction ΔE for specified subset; monomer energy reporting                    | `mbe calc parsed.jsonl --scheme strict --unit kcal --interaction 0,1 --monomer 0` |
| `mbe script-library`        | optional script name; `--out`, `--force`                                                                                                        | List/write helper scripts for simple parsing and MBE-energy workflows                                       | `mbe script-library parse-outs --out parse_outs.py`                               |
| `mbe doctor`                | `--config`, `--jsonl`, `--parse-summary`, `--saved-run`, `--report-out`, `--json`, `--strict`                                                   | Report installation, optional dependency, config-file, backend-command, and artifact diagnostics; `--strict` fails when an explicit config is missing/malformed, configured backend commands are unusable, JSONL validation has errors, parse-summary counts are stale, or saved-run archives drift | `mbe doctor --jsonl parsed.jsonl --parse-summary parse.summary.json --saved-run runs/water2/latest --report-out doctor.json --json` |
| `mbe version`               | _none_                                                                                                                                          | Print package, Python, and JSONL schema versions                                                           | `mbe version`                                                                     |
| `mbe save <jsonl>`          | `--dest DIR`, `--order`, `--no-include-energy`                                                                                                  | Archive JSONL to `<dest>/<cluster>/<stamp>__<method>__<basis>__<grid>__<cp>/run.jsonl` with versioned `run.meta.json` | `mbe save parsed.jsonl --dest runs/`                                              |
| `mbe set-library <dir>`     | _none_                                                                                                                                          | Persist an existing default archive directory used by `mbe save`                                           | `mkdir -p ~/mbe_runs && mbe set-library ~/mbe_runs`                               |
| `mbe list-runs`             | `--dest`, `--cluster`, `--limit`, `--to-csv`, `--to-xlsx`, `--output-format`, `--json`                                                          | Inventory saved-run archives with metadata from `run.meta.json`                                            | `mbe list-runs --dest runs --cluster water20 --to-csv inventory.csv`              |
| `mbe compare <dir or glob>` | `--cluster`, `--scheme simple/strict`, `--order K`, `--ref latest/first/PATH`                                                                   | Compare runs; accepts saved-run archive dirs; shows cpu_ok, counts, combo labels, and ΔCPU/ΔE vs reference  | `mbe compare runs/water20/* --cluster water20 --ref latest`                       |

`mbe doctor --json` reports both the effective settings and their `settings_sources`
(`env`, `user`, `project`, or `explicit`), so precedence issues can be debugged
without guessing which file or environment variable won.

Add `--jsonl` and `--parse-summary` to inspect a parsed JSONL artifact together
with the matching `mbe parse --summary-out` file. In strict mode, `doctor` also
checks that the summary's `record_count` still matches the JSONL record count and
that every matched output file still exists with its recorded size/mtime
fingerprint. The parse summary schema is mirrored in
[docs/parse_summary_contract.json](docs/parse_summary_contract.json).

Add `--saved-run <archive-dir>` to verify an `mbe save` archive: `doctor`
checks `run.jsonl`, `run.meta.json`, the metadata schema/type, `record_count`,
and the archived JSONL size/SHA-256 fingerprint recorded in the metadata. For
saved-run checks, the doctor payload also includes the shared saved-run metadata
health fields (`metadata_ok`, `metadata_issue_codes`, and `metadata_issues`) for
scriptable archive audits.

Use `--report-out doctor.json` to save the complete machine-readable payload
while stdout stays in text or JSON mode. The persisted payload includes
`schema_version: 1` and `report_type: "doctor"`. Its current contract, mirrored
in [docs/doctor_report_contract.json](docs/doctor_report_contract.json), lists
the fields emitted for config-file rows, optional-dependency rows, backend rows,
backend command status, base settings keys, `artifacts.jsonl`,
`artifacts.parse_summary`, and `artifacts.saved_run`. Custom settings loaded
from TOML are preserved in the payload after the base settings keys.
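
For scripts that want a similar staleness check outside `doctor`, here is a minimal Python sketch. Only `record_count` comes from this README; the `files` list with `path`/`size`/`mtime` entries is a hypothetical layout (check docs/parse_summary_contract.json for the real field names), and counting every non-blank JSONL line as one record is an assumption.

```python
from pathlib import Path


def count_jsonl_records(jsonl_path):
    """Count non-blank lines, treating each one as a single JSONL record (assumption)."""
    with open(jsonl_path, encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def stale_parse_summary(summary, jsonl_path):
    """Return a list of problems; an empty list means the summary still looks fresh.

    `record_count` is documented; the `files` entries with `path`, `size`,
    and `mtime` keys are a HYPOTHETICAL layout used for illustration.
    """
    problems = []
    n_records = count_jsonl_records(jsonl_path)
    if summary.get("record_count") != n_records:
        problems.append(
            f"record_count {summary.get('record_count')} != {n_records} JSONL records"
        )
    for entry in summary.get("files", []):
        path = Path(entry["path"])
        if not path.exists():
            problems.append(f"missing: {path}")
            continue
        st = path.stat()
        # Size or modification-time drift means the output changed after parsing.
        if st.st_size != entry["size"] or st.st_mtime != entry["mtime"]:
            problems.append(f"changed: {path}")
    return problems
```

`mbe doctor --strict` remains the supported check; the sketch only shows the shape of the comparison.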

`mbe save` writes `run.meta.json` with `schema_version: 1` and
`metadata_type: "saved_run"` so archived results can be inspected by scripts
without guessing the metadata shape. The metadata also records `record_count`
and a `source_fingerprint` with source JSONL size, mtime, and SHA-256 so the
archived `run.jsonl` can be audited against its source. The current saved-run metadata contract is
mirrored in [docs/save_metadata_contract.json](docs/save_metadata_contract.json).
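
A minimal sketch of auditing an archived `run.jsonl` against its recorded fingerprint with the standard library. The `size` and `sha256` key names inside `source_fingerprint` are assumptions (see docs/save_metadata_contract.json for the real ones), and mtime is deliberately skipped because a copied file need not keep its source mtime.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_matches(run_jsonl, fingerprint):
    """Compare run.jsonl against a recorded fingerprint.

    HYPOTHETICAL keys: "size" and "sha256" inside `source_fingerprint`.
    """
    path = Path(run_jsonl)
    return (
        path.stat().st_size == fingerprint["size"]
        and sha256_file(path) == fingerprint["sha256"]
    )
```

After loading `run.meta.json` with `json.load`, a call would look like `fingerprint_matches(archive / "run.jsonl", meta["source_fingerprint"])`; `mbe doctor --saved-run` is the supported audit.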

`mbe list-runs --json` emits a versioned saved-run inventory payload with
`schema_version: 1` and `report_type: "saved_run_inventory"`, so downstream
scripts can depend on stable top-level and row fields. Inventory rows include
`metadata_ok`, `metadata_schema_version`, `metadata_type`, and
`metadata_issues` so incomplete or unreadable `run.meta.json` files are visible
in text output, JSON, and exported audit tables. The current inventory contract is mirrored in
[docs/saved_run_inventory_contract.json](docs/saved_run_inventory_contract.json).

### CLI option notes

- `mbe fragment <xyz>`: `--mode random|spatial` (sampling strategy); `--n` (samples); `--require-ion` (retain ions); spatial extras `--prefer-special`, `--k-neighbors`, `--start-index`; `--oh-cutoff` (bond cutoff); `--out-xyz` (write sampled XYZ). 
- `mbe gen <xyz>`: `--max-order` or repeatable `--order/--orders` (subset orders); `--cp/--no-cp` (counterpoise ghosts); `--scheme` (naming scheme); `--backend [qchem|orca]` (job_id style); `--oh-cutoff` (connectivity for water heuristic); `--out-dir` (geom output dir). 
- `mbe build-input <geom>`: required `--backend`, `--method`, `--basis`; shared `--charge`, `--multiplicity`, `--no-cp`; Q-Chem: `--thresh`, `--tole`, `--scf-convergence`, `--xc-grid`, `--rem-extra`, `--sym-ignore/--no-sym-ignore`, embedding `--giee elem=charge` (repeatable; bare value applies to O/H) or `--gdee file` for `$external_charges`; ORCA: `--grid`, `--scf-convergence`, `--keyword-line-extra`; with `--giee/--gdee` writes `<stem>.pc` and inserts `%pointcharges "<stem>.pc"` after the first line in `.inp`; batch with `--glob` + `--out-dir`.
- `mbe template`: `--scheduler [pbs|slurm]`, `--backend [qchem|orca]`, `--job-name`, `--walltime`, `--mem-gb`, `--chunk-size`, `--module`, `--command`, `--out`; PBS extras `--ncpus`, `--queue`, `--project`, `--local-run`, `--control-file`, `--builtin-control`; Slurm extras `--ncpus` (per task), `--ntasks`, `--partition`, `--project` (account), `--qos`; `--wrapper` emits a submitter script.
- `mbe parse <root>`: `--program auto|qchem|q-chem|orca` (default qchem); `--glob-pattern`; `--jobs/-j` (parallel output parsing; JSONL order follows the sorted path list); `--out`; `--summary-out`; `--resume`; `--infer-metadata`; geometry search `--cluster-xyz`, `--geom-mode first|last`, `--geom-source singleton|any`, `--geom-drop-ghost`, `--geom-max-lines`, `--nosearch`. If `--glob "*.out"` has no direct matches under `root`, the parser also checks nested run directories for compatibility with layouts such as `W25/RI-avqz-TIP3P/*.out`. When records fail, the parse report summarizes ok/failed totals and grouped `error_reason` values while each failed record remains in JSONL. `--summary-out run.json` writes a compact JSON run summary with matched files, file fingerprints, worker count, output path, geometry status, and status totals. `--resume` reuses existing rows from `--out` by exact matched `path`; when a previous summary with fingerprints is present, changed files are parsed again.
- `mbe qchem-mbe [ORDER]`: batch Q-Chem post-processing; `--specify/-s DIR[:n]` and `--exclude/-x DIR` are repeatable, `ROOT` is accepted in `--specify`; `--force` continues after Step0 failures; writes `Result.csv`, `Energy.csv`, `deltaE.csv`, `WallTime.csv`, `CPUTime.csv`.
- `mbe qchem-mbe-cbs [ORDER]`: same as `qchem-mbe` but uses final/CBS-style energy parsing and additionally writes `Energy_SCF.csv`, `Energy_corr.csv`, and optional `CBS.csv`. CBS extrapolation detects `aug-cc-pVDZ/TZ/QZ` (or `aVDZ/TZ/QZ`) and `cc-pVDZ/TZ/QZ` (or `VDZ/TZ/QZ`) and uses Neese/Valeev (alpha, beta) pairs, alpha for the SCF and beta for the correlation extrapolation, keyed by cardinal pair: augmented `(2,3)=(4.30,2.51)` and `(3,4)=(5.79,3.05)`; non-augmented `(2,3)=(4.42,2.46)` and `(3,4)=(5.46,3.05)`. EE-MBE charge-charge corrections are applied before writing `Energy.csv`, `Energy_SCF.csv`, `Energy_corr.csv`, or `CBS.csv`.
- `mbe script-library [SCRIPT]`: omit `SCRIPT` to list helpers; `parse-outs` writes a key-info CSV (and optionally JSONL) from `.out` files; `mbe-energy` parses `.out` files and prints strict MBE energies. Use `--out FILE.py` to write an executable script and `--force` to overwrite an existing file.
- `mbe energy-to-mbe <Energy.csv>`: read an existing `Energy.csv` term table and regenerate `deltaE.csv` + `Result.csv`; `--max-order` trims order, `--force` skips incomplete columns, `--strict-labels` validates term-kind vs index count.
- `mbe where`: show default data/config/cache/state/runs paths.
- `mbe analyze <jsonl>`: `--scheme simple|strict`; `--to-csv`, `--to-xlsx`, `--plot`; `--max-order` (trim orders).
- `mbe show <jsonl>`: optional path (defaults apply); `--monomer N` (0-based) prints geometry, CPU share, participation; output also shows CPU totals, per-order energy stats, strict MBE(k) totals with per-order ΔE.
- `mbe info <jsonl>`: filters `--program`, `--method`, `--basis`, `--grid`, `--cp`, `--status`; `--scheme`; `--max-order`; `--json` for JSON-only machine output; reports coverage by `subset_size` plus CPU time.
- `mbe validate <jsonl>`: validates calc rows and optional cluster metadata; reports malformed/missing `subset_indices`, `subset_size` mismatches, duplicate OK subsets, non-numeric energy/CPU values, provenance completeness, and coverage hints when `n_monomers` is known.
- `mbe calc <jsonl>`: `--scheme simple|strict` (simple: ΔE vs mean monomer; strict: inclusion–exclusion); `--to K` (upper order); `--from K0` (lower bound for ΔE K0→K); `--monomer N` (report monomer energy); `--unit hartree|kcal|kj`; `--interaction i,j[,k]` (0-based, repeatable) gives subset interaction E − ΣE(monomers).
- `mbe save <jsonl>`: `--dest DIR` (override default library/env); `--order` (filter subsets); `--no-include-energy` (skip energies); `run.meta.json` includes `schema_version: 1`, `metadata_type: "saved_run"`, `record_count`, and source size/mtime/SHA-256 fingerprint fields; verify archives later with `mbe doctor --saved-run <archive-dir>`.
- `mbe set-library <dir>`: persist an existing default archive root for save/compare.
- `mbe list-runs`: list saved-run archives from the configured library; `--dest` overrides the library root, `--cluster` filters by `cluster_id`, `--limit` keeps the newest N, `--to-csv`/`--to-xlsx` export an inventory table, `--output-format pipe|tsv` controls text output, and `--json` emits metadata rows for scripts. Inventory rows include `metadata_ok` and `metadata_issues` so malformed or incomplete archive metadata is not hidden as an empty summary.
- `mbe compare <dir|glob>`: `--cluster ID` filter; `--scheme simple|strict`; `--order K`; `--ref latest|first|PATH` sets reference; accepts saved-run archive directories and `--ref` may point at an archive directory; outputs ΔCPU/ΔE vs ref. CSV/XLSX exports include `path`, `saved_run_archive`, `saved_at_utc`, and `saved_source_sha256` columns for auditability.
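
The `qchem-mbe-cbs` notes list the coefficients but not the extrapolation formulas. The sketch below assumes the standard Neese/Valeev two-point forms commonly used with these coefficients, `E_SCF(n) = E_SCF(inf) + A*exp(-alpha*sqrt(n))` and `E_corr(n) = E_corr(inf) + B*n**(-beta)` for cardinal numbers `x < y`. The function names are hypothetical, and EE-MBE charge-charge corrections are not modeled.

```python
import math

# (alpha, beta) pairs listed above, keyed by (augmented?, (x, y) cardinal pair).
NEESE_VALEEV = {
    (True, (2, 3)): (4.30, 2.51),
    (True, (3, 4)): (5.79, 3.05),
    (False, (2, 3)): (4.42, 2.46),
    (False, (3, 4)): (5.46, 3.05),
}


def extrapolate_scf(e_x, e_y, x, y, alpha):
    """Solve E(n) = E_inf + A*exp(-alpha*sqrt(n)) for E_inf from two cardinals."""
    w_x = math.exp(-alpha * math.sqrt(x))
    w_y = math.exp(-alpha * math.sqrt(y))
    return (e_x * w_y - e_y * w_x) / (w_y - w_x)


def extrapolate_corr(e_x, e_y, x, y, beta):
    """Solve E(n) = E_inf + B*n**(-beta) for E_inf from two cardinals."""
    return (x**beta * e_x - y**beta * e_y) / (x**beta - y**beta)


def cbs_two_point(scf_x, scf_y, corr_x, corr_y, x, y, augmented):
    """Extrapolated total energy: E_SCF(inf) + E_corr(inf)."""
    alpha, beta = NEESE_VALEEV[(augmented, (x, y))]
    return extrapolate_scf(scf_x, scf_y, x, y, alpha) + extrapolate_corr(
        corr_x, corr_y, x, y, beta
    )
```

For example, an aug-cc-pVTZ/aug-cc-pVQZ pair would use `cbs_two_point(..., 3, 4, augmented=True)`.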

## Run-control (templates)

- Control file discovery: prefer `<input>.mbe.control.toml`, else `mbe.control.toml`, else run-control disabled.
- Attempt logging: each attempt writes `job._try.out`; on failure it is renamed to `job.attemptN.out`, and on success to `job.out`. `confirm.log_path` can override the temporary log location.
- Confirmation: `confirm.regex_any` (must match) and `confirm.regex_none` (must not match) on the temp log; success also requires exit code 0.
- Retry: `retry.enabled`, `max_attempts`, `sleep_seconds`, `cleanup_globs`, `write_failed_last` (copy last attempt to `failed_last_path`).
- Delete safeguards: outputs are deleted only when both `delete.enabled` and `allow_delete_outputs = true` are set; inputs are removed only if they match `delete_inputs_globs`.
- State: `.mbe_state.json` records status, attempts, matched regex, and log paths; `skip_if_done` skips jobs already marked done.
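
A minimal Python sketch of the discovery order and success rule described above. Assumptions not stated in this README: `<input>` means the full input filename, both control files live next to the input, and an empty `regex_any` list imposes no requirement. The helper names are hypothetical; the real logic lives in the wrapper embedded in the rendered PBS/Slurm scripts.

```python
import re
from pathlib import Path


def find_control_file(input_path):
    """Return <input>.mbe.control.toml, else mbe.control.toml, else None (disabled).

    ASSUMPTION: both candidates are looked up in the input's directory.
    """
    input_path = Path(input_path)
    candidates = [
        input_path.with_name(input_path.name + ".mbe.control.toml"),
        input_path.with_name("mbe.control.toml"),
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def attempt_succeeded(log_text, exit_code, regex_any=(), regex_none=()):
    """Exit code 0, some regex_any pattern matches, and no regex_none pattern matches.

    ASSUMPTION: an empty regex_any sequence is treated as "no requirement".
    """
    if exit_code != 0:
        return False
    if regex_any and not any(re.search(p, log_text) for p in regex_any):
        return False
    return not any(re.search(p, log_text) for p in regex_none)
```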

## Subset naming

- Default (`mbe gen`): `{backend}_k{order}_{i1}.{i2}...` with **1-based** fragment indices (no hash suffix), e.g., `qchem_k2_1.3.geom`.
- Legacy (still parsed): hashed suffixes like `{backend}_k{order}_{i1}.{i2}..._{hash}` or `{backend}_k{order}_f{i1}-{i2}-{i3}_{cp|nocp}_{hash}` remain supported.
- Compatibility-only (accepted for parsing/analysis, not recommended for new files): names like `h2o.2.3.7.11.xyz.modified.out`; indices are interpreted as **1-based** in the filename and converted to 0-based `subset_indices` in JSONL.

In JSONL output, `subset_indices` is always 0-based.
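
The default scheme round-trips between 1-based filename indices and 0-based `subset_indices`. This minimal sketch covers only the default `mbe gen` form (legacy hashed and compatibility names are out of scope); the helper names are hypothetical, and sorting indices before formatting is an assumption.

```python
import re

# Default `mbe gen` scheme: {backend}_k{order}_{i1}.{i2}... with 1-based indices.
_DEFAULT_JOB_ID = re.compile(
    r"^(?P<backend>[A-Za-z0-9-]+)_k(?P<order>\d+)_(?P<idx>\d+(?:\.\d+)*)$"
)


def format_job_id(backend, subset_indices):
    """Build a default-style job id from 0-based subset indices."""
    one_based = ".".join(str(i + 1) for i in sorted(subset_indices))
    return f"{backend}_k{len(subset_indices)}_{one_based}"


def parse_job_id(job_id):
    """Return (backend, order, 0-based subset_indices) for a default-style job id."""
    m = _DEFAULT_JOB_ID.match(job_id)
    if m is None:
        raise ValueError(f"not a default-style job id: {job_id!r}")
    indices = [int(tok) - 1 for tok in m["idx"].split(".")]
    order = int(m["order"])
    if order != len(indices):
        raise ValueError(f"k{order} does not match {len(indices)} indices in {job_id!r}")
    return m["backend"], order, indices
```

For a file such as `qchem_k2_1.3.geom`, pass `Path(name).stem`; `parse_job_id("qchem_k2_1.3")` gives `("qchem", 2, [0, 2])`, matching the JSONL example below.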

## JSONL schema (parse output)

The authoritative data-contract notes live in [docs/JSONL_SCHEMA.md](docs/JSONL_SCHEMA.md). Current contract: `calc-v1+cluster-v2`.

```json
{
  "record_type": "calc",
  "schema_version": 1,
  "job_id": "qchem_k2_1.3",
  "program": "qchem",
  "program_detected": "qchem",
  "status": "ok",
  "error_reason": null,
  "path": ".../job.out",
  "energy_hartree": -458.7018184,
  "cpu_seconds": 1234.5,
  "wall_seconds": 1234.5,
  "method": "wB97M-V",
  "basis": "def2-ma-QZVPP",
  "grid": "SG-2",
  "subset_size": 2,
  "subset_indices": [0, 2],
  "cp_correction": true,
  "extra": {}
}
```
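
A minimal per-record sketch of a few consistency checks in the spirit of `mbe validate` (malformed indices, `subset_size` mismatch, non-numeric energy/CPU). It is not the tool's implementation: requiring numeric `energy_hartree`/`cpu_seconds` only for `status == "ok"` rows is an assumption, and duplicate-subset detection across rows is left out.

```python
import math


def record_problems(record, n_monomers=None):
    """Return consistency problems for one `calc` record (empty list = clean)."""
    indices = record.get("subset_indices")
    if not isinstance(indices, list) or not all(isinstance(i, int) for i in indices):
        return ["malformed subset_indices"]
    problems = []
    if record.get("subset_size") != len(indices):
        problems.append("subset_size does not match len(subset_indices)")
    if len(set(indices)) != len(indices):
        problems.append("repeated monomer index")
    # subset_indices are 0-based, so valid values are 0 .. n_monomers - 1.
    if any(i < 0 for i in indices) or (
        n_monomers is not None and any(i >= n_monomers for i in indices)
    ):
        problems.append("index outside 0-based monomer range")
    if record.get("status") == "ok":  # ASSUMPTION: only OK rows need numbers
        for key in ("energy_hartree", "cpu_seconds"):
            value = record.get(key)
            if (
                not isinstance(value, (int, float))
                or isinstance(value, bool)
                or not math.isfinite(value)
            ):
                problems.append(f"non-numeric {key}")
    return problems
```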

Shared schema helpers in [src/mbe_tools/schema.py](src/mbe_tools/schema.py) normalize JSONL rows and provide common subset-index and summary utilities such as `parse_subset_indices_token`, `subset_records`, `records_containing_monomer`, `monomer_participation_summary`, `monomer_energy_map`, `subset_interaction_energy`, `combo_counts`, `combo_labels`, `combo_archive_slug_parts`, `mixed_combo_labels`, `energy_by_order`, `energy_stats_by_order`, `reference_energy_mean`, `records_with_reference_energy_delta`, `summarize_records_by_order`, `calc_record_summary`, `filter_records`, `cpu_totals`, `cpu_seconds_total`, `status_counts`, `coverage_by_order`, and `provenance_summary`, which are reused by CLI show/info/calc/summary/archive/validation commands.
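
The strict scheme is inclusion-exclusion over subsets. A minimal sketch, assuming a plain dict from sorted 0-based index tuples to total energies in hartree and ignoring counterpoise bookkeeping; the function names are hypothetical stand-ins, not the helpers listed above. The interaction energy follows the `mbe calc --interaction` definition: E(subset) minus the sum of its monomer energies.

```python
from itertools import combinations


def strict_increments(energies):
    """k-body increments: dE[S] = E[S] - sum(dE[T] for non-empty proper subsets T of S).

    Keys must be sorted tuples; a missing lower-order subset raises KeyError,
    since the strict scheme needs full coverage below each subset.
    """
    increments = {}
    for subset in sorted(energies, key=len):
        lower = sum(
            increments[t]
            for k in range(1, len(subset))
            for t in combinations(subset, k)
        )
        increments[subset] = energies[subset] - lower
    return increments


def mbe_energy(energies, max_order):
    """MBE(k): sum of increments over all subsets with size <= max_order."""
    kept = {s: e for s, e in energies.items() if len(s) <= max_order}
    return sum(strict_increments(kept).values())


def interaction_energy(energies, subset):
    """E(subset) - sum of its monomer energies."""
    subset = tuple(sorted(subset))
    return energies[subset] - sum(energies[(i,)] for i in subset)
```

With every subset of a cluster present, MBE at the full order reproduces the total cluster energy, which makes a quick sanity check for a parsed run.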

## API highlights

- Cluster ([src/mbe_tools/cluster.py](src/mbe_tools/cluster.py)): `read_xyz`, `write_xyz`, `fragment_by_water_heuristic`, `fragment_by_connectivity`, `sample_fragments`, `spatial_sample_fragments`. XYZ reads/writes share the common file IO layer; `write_xyz` creates parent directories.
- MBE generation ([src/mbe_tools/mbe.py](src/mbe_tools/mbe.py)): `MBEParams`, `generate_subsets_xyz`, `qchem_molecule_block`, `orca_xyz_block`.
- Generation command workflows ([src/mbe_tools/generation.py](src/mbe_tools/generation.py)): implementation helpers for `mbe fragment`, `mbe gen`, public `mbe gen-from-monomer`, and hidden legacy `mbe gen_from_monomer`, including shared monomer `.geom` loading and subset-geometry file writing.
- Input builders ([src/mbe_tools/input_builder.py](src/mbe_tools/input_builder.py)): `render_qchem_input`, `render_orca_input`, `build_input_from_geom`, `build_input_artifacts_from_geom`, `parse_giee_charge_specs`, `normalize_giee_charges`, `parse_external_charges_text`; the artifact builder centralizes ORCA `.pc` sidecar text with input rendering.
- Build-input command workflow ([src/mbe_tools/input_builder.py](src/mbe_tools/input_builder.py)): command option normalization for `mbe build-input`, including GIEE/GDEE parsing, no-CP geometry filtering, ORCA `.pc` sidecars, and single/batch artifact writing.
- Geometry blocks ([src/mbe_tools/geom_blocks.py](src/mbe_tools/geom_blocks.py)): shared atom-line parsing, ghost detection, ghost stripping, ORCA ghost normalization, and monomer `.geom` parser rules.
- Output geometry ([src/mbe_tools/geometry.py](src/mbe_tools/geometry.py)): `extract_geometry_from_out_head`, `geometries_from_calc_records`, `normalize_geom_source`, `cluster_id_from_root`, `cluster_record_from_monomers`, `cluster_record_from_geometries`.
- HPC templates ([src/mbe_tools/hpc_templates.py](src/mbe_tools/hpc_templates.py)): `render_pbs_qchem`, `render_slurm_orca` (both embed run-control wrapper).
- Template command workflow ([src/mbe_tools/hpc_templates.py](src/mbe_tools/hpc_templates.py)): command implementation helper for `mbe template`, including scheduler/backend option normalization and script file writing.
- Parsing ([src/mbe_tools/parsers/io.py](src/mbe_tools/parsers/io.py)): `detect_program`, `parse_file`, `parse_files`, `infer_metadata_from_path`, `glob_paths` (file discovery delegates to shared path/glob resolution).
- Parse reporting ([src/mbe_tools/parse_reporting.py](src/mbe_tools/parse_reporting.py)): `parse_failure_summary`, `parse_failure_report_lines`, and key-info row helpers for parse output summaries and helper scripts.
- Parse command workflows ([src/mbe_tools/parse.py](src/mbe_tools/parse.py)): command implementation helpers for `mbe parse` and `mbe enrich`, including cluster-record emission, optional parse run summary JSON writing, and enriched JSONL writing.
- Analysis ([src/mbe_tools/analysis.py](src/mbe_tools/analysis.py)): `read_jsonl`, `summarize_by_order`, `compute_delta_energy`, `strict_mbe_orders`.
- MBE math ([src/mbe_tools/mbe_math.py](src/mbe_tools/mbe_math.py)): `build_energy_map`, `compute_contributions`, `compute_delta`, `compute_mbe`, `assemble_mbe_energy`.
- MBE terms ([src/mbe_tools/mbe_terms.py](src/mbe_tools/mbe_terms.py)): shared labels, term parsing, and CSV sort keys for MBE Energy/delta tables.
- MBE tables ([src/mbe_tools/mbe_tables.py](src/mbe_tools/mbe_tables.py)): shared energy term labels, delta table labels, per-order delta sums, and cumulative MBE result values.
- MBE reporting ([src/mbe_tools/mbe_reporting.py](src/mbe_tools/mbe_reporting.py)): shared strict-MBE row assembly, selected-order summaries, missing-subset formatting, and text report rendering for calc/show/compare/analyze/save paths and the `mbe-energy` helper script.
- CSV tables ([src/mbe_tools/csv_tables.py](src/mbe_tools/csv_tables.py)): shared dict-row/pivot/result/CBS CSV numeric formatting, blank-cell handling, and table writers backed by common file IO setup.
- Tabular exports ([src/mbe_tools/tabular_exports.py](src/mbe_tools/tabular_exports.py)): shared pandas-backed DataFrame creation and CSV/XLSX export helpers for analyze/compare command paths.
- Q-Chem energy ([src/mbe_tools/qchem_energy.py](src/mbe_tools/qchem_energy.py)): shared total/SCF/CBS/pre-convergence energy extraction and charge-charge correction for parsers and Q-Chem batch commands.
- Q-Chem command workflows ([src/mbe_tools/qchem_commands.py](src/mbe_tools/qchem_commands.py)): command implementation helpers for `qchem-mbe`, `qchem-mbe-cbs`, `cbs-exploration`, hidden legacy global `--cbs-exploration`, and `energy-to-mbe`.
- Script-library command workflow ([src/mbe_tools/script_library.py](src/mbe_tools/script_library.py)): command implementation helper for listing, printing, and writing packaged helper scripts.
- Chemistry metadata ([src/mbe_tools/chem_metadata.py](src/mbe_tools/chem_metadata.py)): shared method/basis inference, Q-Chem/ORCA input metadata parsing, CBS basis detection, and helpers for CBS cardinal labels and cardinal numbers.
- Backends ([src/mbe_tools/backends/base.py](src/mbe_tools/backends/base.py)): capability registry for geometry blocks, ghost atoms, point-charge modes, and input sections.
- JSONL IO ([src/mbe_tools/jsonl_io.py](src/mbe_tools/jsonl_io.py)): streaming JSONL readers with line-numbered parse errors plus shared JSONL serialization/writing helpers backed by common reader/writer setup.
- JSONL selection ([src/mbe_tools/jsonl_selector.py](src/mbe_tools/jsonl_selector.py)): shared default JSONL selection and normalized cluster/calc loading for commands with optional JSONL paths.
- JSONL reporting ([src/mbe_tools/jsonl_reporting.py](src/mbe_tools/jsonl_reporting.py)): shared cluster/CPU/combination/coverage/energy-stat report lines, JSON-safe combo/coverage/energy-stat payload shapes, compact compare summaries, and compare table rendering for JSONL commands.
- JSONL view command workflows ([src/mbe_tools/jsonl_views.py](src/mbe_tools/jsonl_views.py)): command implementation helpers for show/info contexts and text reports.
- Diagnostics command workflow ([src/mbe_tools/diagnostics.py](src/mbe_tools/diagnostics.py)): implementation helper for `mbe doctor`, including dependency, config-file, backend-command, backend-capability, JSONL artifact, parse-summary, and saved-run archive reporting.
- Saved-run archive helpers ([src/mbe_tools/saved_runs.py](src/mbe_tools/saved_runs.py)): shared archive filenames, metadata constants/loading, metadata-health issue fields, archive detection, `run.jsonl` resolution, and source fingerprinting for save/compare/doctor/list-runs paths.
- CLI options ([src/mbe_tools/cli_options.py](src/mbe_tools/cli_options.py)): shared normalization for energy aggregation schemes and energy units.
- Calc/analysis/compare/save/validate command workflows: [calc.py](src/mbe_tools/calc.py), [analysis.py](src/mbe_tools/analysis.py), [compare.py](src/mbe_tools/compare.py), [save.py](src/mbe_tools/save.py), and [validation.py](src/mbe_tools/validation.py) own the reusable behavior behind those CLI commands.
- Job naming ([src/mbe_tools/job_naming.py](src/mbe_tools/job_naming.py)): shared parsing of `kN` labels, subset indices, and compatibility filenames, with explicit conversion between raw and 0-based index bases.
- Paths ([src/mbe_tools/paths.py](src/mbe_tools/paths.py)): default data/config/cache/state/runs paths, user path expansion/resolution, path/glob resolution for compare/parse discovery, and saved-run library resolution.
- Path command workflows ([src/mbe_tools/paths.py](src/mbe_tools/paths.py)): command implementation helpers for `mbe where`, `mbe set-library`, and `mbe list-runs`.
- File IO ([src/mbe_tools/fileio.py](src/mbe_tools/fileio.py)): required and tolerant full/head text-file reads plus UTF-8 reader/writer handles shared by parsers, input rendering, cluster XYZ IO, CSV/JSONL inputs and outputs, geometry extraction, batch workflows, and command output paths.
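
The JSONL IO entry above mentions streaming readers with line-numbered parse errors. The pattern can be illustrated with a minimal standard-library sketch; `iter_jsonl` and `JSONLParseError` are hypothetical names, not `mbe_tools` functions, and tolerating blank lines while rejecting non-object records are assumptions made for illustration.

```python
import io
import json
from collections.abc import Iterator
from typing import TextIO


class JSONLParseError(ValueError):
    """Hypothetical error type that records where a bad JSONL line was found."""

    def __init__(self, source: str, lineno: int, message: str) -> None:
        super().__init__(f"{source}:{lineno}: {message}")
        self.source = source
        self.lineno = lineno


def iter_jsonl(handle: TextIO, source: str = "<stream>") -> Iterator[dict]:
    """Lazily yield one dict per non-blank line of `handle` (illustrative sketch)."""
    for lineno, line in enumerate(handle, start=1):  # 1-based line numbers
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are skipped, not errors
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            # Re-raise with the file-level line number instead of the
            # in-line position reported by the json module.
            raise JSONLParseError(source, lineno, exc.msg) from exc
        if not isinstance(record, dict):
            raise JSONLParseError(source, lineno, "expected a JSON object")
        yield record


# Usage with an in-memory stream; a file opened with encoding="utf-8" behaves the same.
records = list(iter_jsonl(io.StringIO('{"energy": -1.0}\n\n{"energy": -2.0}\n')))
```

Because the reader is a generator, records are parsed one line at a time, so a truncated record on line 3 of a file passed as `source="run.jsonl"` surfaces as an error prefixed `run.jsonl:3:` rather than a bare JSON decoding message.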

## Notebook

See `notebooks/sample_walkthrough.ipynb` for an end-to-end demo: build inputs, generate templates, and assemble MBE(k) energies from synthetic data.
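
The inclusion–exclusion step behind assembling MBE(k) energies can be sketched in plain Python. Each subset S gets an increment ΔE_S = E_S minus the sum of ΔE_T over all non-empty proper subsets T of S, and MBE(k) is the sum of ΔE_S over all subsets with |S| ≤ k. This is a minimal illustration, not the `mbe_tools` API: `mbe_deltas` and `mbe_cumulative` are hypothetical names, subsets are assumed to be keyed by `frozenset`s of fragment indices, and every lower-order subset energy is assumed to be present (the strict-MBE case).

```python
from itertools import combinations

# Hypothetical helpers illustrating inclusion-exclusion MBE; not the mbe_tools API.
# Keys are frozensets of fragment indices; energies share one consistent unit.


def mbe_deltas(energies: dict[frozenset[int], float]) -> dict[frozenset[int], float]:
    """Return the many-body increment Delta E_S for every subset S in `energies`."""
    deltas: dict[frozenset[int], float] = {}
    # Visit smaller subsets first so every proper subset's delta is already known.
    for subset in sorted(energies, key=len):
        delta = energies[subset]
        members = sorted(subset)
        for r in range(1, len(members)):
            for combo in combinations(members, r):
                key = frozenset(combo)
                if key not in deltas:
                    raise ValueError(f"missing subset {sorted(key)} needed by {members}")
                delta -= deltas[key]
        deltas[subset] = delta
    return deltas


def mbe_cumulative(
    energies: dict[frozenset[int], float], max_order: int
) -> tuple[dict[int, float], dict[int, float]]:
    """Return (per-order delta sums, cumulative MBE(k)) for k = 1..max_order."""
    per_order: dict[int, float] = {}
    for subset, delta in mbe_deltas(energies).items():
        per_order[len(subset)] = per_order.get(len(subset), 0.0) + delta
    cumulative: dict[int, float] = {}
    running = 0.0
    for k in range(1, max_order + 1):
        running += per_order.get(k, 0.0)
        cumulative[k] = running
    return per_order, cumulative


# Synthetic three-fragment example (arbitrary numbers, not real data).
energies = {
    frozenset({0}): -1.0,
    frozenset({1}): -2.0,
    frozenset({2}): -3.0,
    frozenset({0, 1}): -3.1,
    frozenset({0, 2}): -4.05,
    frozenset({1, 2}): -5.2,
    frozenset({0, 1, 2}): -6.4,
}
per_order, cumulative = mbe_cumulative(energies, max_order=3)
```

Here the per-order sums are about -6.0, -0.35, and -0.05, and `cumulative[3]` reproduces the full trimer energy up to floating-point rounding, since an expansion carried to the full system size is exact. Raising on a missing lower-order subset, rather than silently treating it as zero, keeps a truncated job set from producing a plausible-looking but wrong MBE(k).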

## Reproducible Examples

The `examples/synthetic/` dataset uses hand-written Q-Chem-like and ORCA-like outputs, so it can be checked without proprietary quantum-chemistry software:

```bash
python3 examples/synthetic/regenerate.py --check
python3 examples/synthetic/check_cli_workflow.py
```

Its contract is recorded in [examples/synthetic/manifest.json](examples/synthetic/manifest.json). The `--check` flag validates the manifest, including its regression-coverage claims, before comparing regenerated outputs. The CLI workflow smoke test confirms that the same fixtures work through the documented `parse`, `validate`, `info`, `calc`, and `analyze` commands.

For the full local health check used by CI, run:

```bash
python3 scripts/check_project.py
```

The project gate contract, recorded in [docs/project_gate_contract.json](docs/project_gate_contract.json), covers the standard sub-check order, skip options, release-tag handling, and the final `[summary]` output.

## Project Docs

- [Project overview](PROJECT_OVERVIEW.md)
- [Professionalization roadmap](docs/ROADMAP.md)
- [Project contract guide](docs/CONTRACTS.md)
- [Testing guide](docs/TESTING.md)
- [Machine-readable contract index](docs/contract_index.json)
- [Machine-readable docs link contract](docs/docs_link_contract.json)
- [Machine-readable CLI command contract](docs/cli_command_contract.json)
- [JSONL data contract](docs/JSONL_SCHEMA.md)
- [Machine-readable JSONL contract](docs/schema_contract.json)
- [Parser correctness matrix](docs/PARSER_MATRIX.md)
- [Machine-readable parser capabilities](docs/parser_capabilities.json)
- [Machine-readable parse summary contract](docs/parse_summary_contract.json)
- [Backend contract](docs/BACKENDS.md)
- [Machine-readable backend capabilities](docs/backend_capabilities.json)
- [Release checklist](docs/RELEASE.md)
- [Machine-readable release contract](docs/release_contract.json)
- [Machine-readable project gate contract](docs/project_gate_contract.json)
- [Machine-readable doctor report contract](docs/doctor_report_contract.json)
- [Machine-readable saved-run metadata contract](docs/save_metadata_contract.json)
- [Machine-readable saved-run inventory contract](docs/saved_run_inventory_contract.json)
- [Machine-readable test taxonomy contract](docs/test_taxonomy_contract.json)
- [Changelog](CHANGELOG.md)
- [Contributing guide](CONTRIBUTING.md)

## Contributing and Contact

Contributions are welcome: feel free to open issues or send pull requests. For questions or collaboration, contact Jiarui Wang at Jiarui.Wang4@unsw.edu.au.

## License

MIT
