# === explanation/apps-and-manifests.md ===

# Apps and manifests

A PlayMolecule "app" is a container plus a JSON manifest. The container holds the science (the actual `proteinprepare` Python code, for example). The manifest declares everything PlayMolecule needs to expose that container as a typed Python function: parameters, defaults, expected outputs, resource requirements, and built-in tests.

This page explains what's in a manifest and how PlayMolecule turns it into the Python surface you import.

## Anatomy of a manifest

A manifest is a JSON document with the top-level keys shown below. The listing uses `...` to mark omitted parts, so it is not valid JSON as written:

```text
{
  "container_config": { "name": "ProteinPrepare", "version": "1", "condaenvs": [...] },
  "meta_keywords": ["preparation", "protein", ...],
  "citations": [...],
  "files": { "tests/3ptb.pdb": "/app/files/tests/3ptb.pdb", ... },
  "functions": [
    {
      "function": "proteinprepare.apps.proteinprepare.app.main",
      "env": "base",
      "resources": { "ncpu": 1, "ngpu": 0 },
      "outputs": { "output.pdb": "...", "details.csv": "...", "pka_plot.png": "..." },
      "params": [
        { "name": "outdir", "type": "Path", "mandatory": true, ... },
        { "name": "pdbid", "type": "str", "mandatory": false, ... },
        { "name": "pH", "type": "float", "value": 7.2, ... },
        ...
      ],
      "tests": {
        "simple": {
          "description": "Prepare 3PTB structure from RCSB",
          "arguments": { "pdbid": "3ptb" },
          "expected_outputs": ["output.pdb", "details.csv", "pka_plot.png"]
        },
        ...
      },
      "examples": ["proteinprepare(outdir='./test', pdbid='3ptb').run()"],
      "description": "ProteinPrepare prepares proteins (and nucleic acids)"
    }
  ]
}
```

Each entry in `functions` becomes one callable on the app module.

## How a manifest becomes a Python function

When a manifest is loaded, PlayMolecule assembles a Python callable for each entry in `functions`, plus a set of module-level attributes on the app's version submodule:

**On the dynamic function itself:**

1. **Signature** — `params` is converted into an `inspect.Signature`. Each param's `type`, `nargs`, `mandatory`, and default `value` map onto `inspect.Parameter` attributes. The function then *binds* incoming kwargs against that signature on every call, which is how PlayMolecule raises a typed `TypeError` when you mistype `pdbi` for `pdbid`.
2. **Docstring** — `description`, `params`, `outputs`, and `examples` are formatted into a NumPy-style docstring so `help(app)` and `app?` work without any extra wiring.
3. **Tests** — each entry under `tests` becomes a callable test attribute accessible as `app.tests.<name>.run()`.
4. **Manifest metadata** — the per-function manifest entry is attached as `app.__manifest__`.
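
The signature step can be sketched with the standard library alone. The helper `build_signature` and the `TYPE_MAP` table below are illustrative assumptions, not PlayMolecule's actual loader; the sketch only shows how `mandatory` and `value` could map onto `inspect.Parameter`, and why a mistyped keyword surfaces as a `TypeError`.

```python
import inspect
from pathlib import Path

# Hypothetical mapping from manifest type names to Python types.
TYPE_MAP = {"Path": Path, "str": str, "float": float, "int": int, "bool": bool}


def build_signature(params):
    """Turn a manifest-style `params` list into an inspect.Signature."""
    parameters = []
    for p in params:
        # Mandatory params get no default; optional ones fall back to `value`.
        default = inspect.Parameter.empty if p.get("mandatory") else p.get("value")
        parameters.append(
            inspect.Parameter(
                p["name"],
                kind=inspect.Parameter.KEYWORD_ONLY,
                default=default,
                annotation=TYPE_MAP.get(p["type"], inspect.Parameter.empty),
            )
        )
    return inspect.Signature(parameters)


params = [
    {"name": "outdir", "type": "Path", "mandatory": True},
    {"name": "pdbid", "type": "str", "mandatory": False, "value": None},
    {"name": "pH", "type": "float", "mandatory": False, "value": 7.2},
]
sig = build_signature(params)

# A correct call binds cleanly and picks up the manifest default for pH.
bound = sig.bind(outdir="out", pdbid="3ptb")
bound.apply_defaults()

# A mistyped keyword fails at bind time, before anything is staged.
try:
    sig.bind(outdir="out", pdbi="3ptb")
    typo_error = None
except TypeError as exc:
    typo_error = str(exc)
```

Binding on every call keeps validation in one place: the same `Signature` object drives both the error message and what `help()` displays.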

**On the version submodule (shared by every function in that version):**

5. **Artifacts / files** — entries under `artifacts` (or the older synonym `datasets`) and the `files` block become attributes on the submodule itself: `someapp.v1.artifacts.<NAME>`, `someapp.v1.files`. The full app manifest is also attached at the submodule level as `someapp.v1.__manifest__`.

The end result: at the version submodule `playmolecule.apps.proteinprepare.v1` you get a callable `proteinprepare` (the function), plus `artifacts`, `datasets`, `files`, and `__manifest__` — everything derived from one JSON file.

## Versions

A given app can ship multiple manifests, one per version. They appear as parallel submodules:

```text
playmolecule.apps.proteinprepare           # alias for latest
playmolecule.apps.proteinprepare.v1        # explicit
playmolecule.apps.proteinprepare.v2        # explicit
```

The unqualified symbol is set at import time by natural-sort over the version strings (`v10` sorts after `v9`).
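
To see why natural sort matters here, the sketch below compares it with plain string ordering. `natural_key` is a hypothetical helper written for this page, not a PlayMolecule function, and it assumes every version string follows the `v<number>` pattern.

```python
import re


def natural_key(version):
    """Split 'v10' into ['v', 10] so digit runs compare as integers."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", version) if tok]


versions = ["v1", "v10", "v2", "v9"]

ordered = sorted(versions, key=natural_key)
latest = max(versions, key=natural_key)

# Plain string comparison, for contrast: "v9" > "v10" character by character.
plain_latest = max(versions)
```

With plain string ordering the alias would stay on `v9` after `v10` is published, which is the failure natural sort avoids.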

## Multi-function apps

`functions` is a list. A single manifest can expose several entry points — for example, an app that does both "prepare" and "validate". Each one shows up as its own attribute:

```python
from playmolecule.apps import someapp

someapp.prepare(outdir="out", ...)
someapp.validate(outdir="out", ...)
```

A function literally named `main` is exposed under the app name instead, so you can write `someapp(...)` rather than `someapp.main(...)`. That's why `proteinprepare(...)` works — the manifest's actual function is named `main`.
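
The naming rule can be written down in a few lines. This is a sketch under one assumption: that the attribute name is the last component of the manifest's dotted `function` path. The helper `exposed_names` and the `someapp.app.*` paths are hypothetical.

```python
def exposed_names(manifest, appname):
    """Map each manifest function entry to the name it would be called by."""
    names = {}
    for entry in manifest["functions"]:
        # Assumption: the short name is the last component of the dotted path.
        short = entry["function"].rsplit(".", 1)[-1]
        # A function named `main` is exposed under the app name itself.
        names[entry["function"]] = appname if short == "main" else f"{appname}.{short}"
    return names


multi = {
    "functions": [
        {"function": "someapp.app.prepare"},   # hypothetical entry point
        {"function": "someapp.app.validate"},  # hypothetical entry point
    ]
}
single = {"functions": [{"function": "proteinprepare.apps.proteinprepare.app.main"}]}

multi_names = exposed_names(multi, "someapp")
single_names = exposed_names(single, "proteinprepare")
```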

## Where to put a manifest

- **Docker registry** — embedded as an image label.
- **HTTP backend** — served by the backend's `/apps/manifests` endpoint.
- **Local registry** *(developer use)* — `<root>/apps/<appname>/<version>/<manifest>.json` (where `<root>` is the path after `local:` in `PM_REGISTRIES`), alongside a `run.sh` and any payload files. Useful when you're authoring your own app and want to iterate on the manifest without publishing.

The three discovery paths produce the same `app_versions` data shape (see [Architecture](architecture.md)), so the resulting Python surface is identical regardless of source.

## See also

- [Architecture](architecture.md)
- [List and inspect apps](../howto/list-and-inspect-apps.md)
- [Run built-in app tests](../howto/run-built-in-app-tests.md)
- {py:func}`~playmolecule.describe_apps`


# === explanation/architecture.md ===

# Architecture

PlayMolecule's design separates **where apps come from** (manifest discovery) from **where jobs run** (execution). Both are pluggable. Understanding the split is the key to picking the right environment variables and predicting what a given call will do.

## Two orthogonal backend axes

```{mermaid}
graph LR
    subgraph Registries [Manifest backend]
        D[docker://&lt;registry&gt;<br/>Docker registry]
        H[http://&lt;url&gt;<br/>PlayMolecule HTTP backend]
        L[local:&lt;path&gt;<br/>filesystem]
    end
    subgraph Executors [Execution backend]
        EL[Local<br/>docker run / apptainer run<br/>--<br/>direct or via SLURM sbatch]
        EH[HTTP<br/>POST to backend]
    end
    D --> EL
    H --> EH
    L --> EL
```

*Left column: where apps are discovered (`PM_REGISTRIES`). Right column: where jobs run (`PM_EXECUTOR`).*

- **Manifest backend** is chosen by `PM_REGISTRIES`. It answers "what apps exist and what are their parameters?". One of: `docker://`, `http://`, or `local:`.
- **Execution backend** is chosen by `PM_EXECUTOR`. It answers "when I call `ed.run()`, where does the container actually start?". One of `local` (default) or an `http://` URL.

The two are independent. A common production setup mixes them — e.g., discover apps from a `docker://` registry but execute remotely through an `http://` backend; or browse an `http://` backend's catalogue locally without ever submitting through it.
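
As a rough illustration of the two axes, the sketch below classifies a registry entry and an executor value separately. The function names and the example registry addresses are made up for this page; only the prefixes (`docker://`, `http://`, `local:`) and the `local` default come from the text above.

```python
def classify_registry(value):
    """Return (backend, remainder) for a single PM_REGISTRIES-style entry."""
    for prefix, backend in (("docker://", "docker"), ("http://", "http"), ("local:", "local")):
        if value.startswith(prefix):
            return backend, value[len(prefix):]
    raise ValueError(f"unrecognised registry prefix: {value!r}")


def classify_executor(value="local"):
    """Return the execution backend selected by a PM_EXECUTOR-style value."""
    if value == "local":
        return "local"
    if value.startswith("http://"):
        return "http"
    raise ValueError(f"unrecognised executor: {value!r}")


# Hypothetical values: the two settings are parsed independently of each other.
registry = classify_registry("docker://registry.example.com/apps")
dev_registry = classify_registry("local:/home/me/pm-dev")
executor = classify_executor("http://backend.example.com")
default_executor = classify_executor()
```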

## What each backend does

### Manifest backends

| `PM_REGISTRIES` prefix | When you'd use it                                                                                                | How it discovers apps                                                                       |
|------------------------|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| `docker://<registry>`  | Default. Pulls Acellera's released apps from a container registry.                                               | Lists images in the configured Docker registry. Reads each image's manifest label.          |
| `http://<url>`         | When jobs are dispatched to a remote PlayMolecule backend.                                                       | Hits the backend's catalogue endpoint and decodes the JSON.                                 |
| `local:<path>`         | **Developer use only** — when you're writing your own app and want to iterate on its manifest without publishing. | Scans `<path>/apps/<name>/<version>/` for `*.json` manifests on disk.                       |

The output of all three is the same shape — a `{appname: {version: {manifest, files, run.sh}}}` dict — so the rest of the system doesn't care which one ran.
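
For the `local:` case, discovery amounts to walking a directory tree. The following is a minimal sketch, assuming the `<path>/apps/<name>/<version>/*.json` layout from the table; `scan_local_registry` is a hypothetical name, and the returned dict is a simplified version of the shared shape (it omits the `files` entry).

```python
import json
import tempfile
from pathlib import Path


def scan_local_registry(root):
    """Collect manifests laid out as <root>/apps/<name>/<version>/*.json."""
    apps = {}
    for manifest_path in sorted(Path(root).glob("apps/*/*/*.json")):
        version_dir = manifest_path.parent
        name, version = version_dir.parent.name, version_dir.name
        apps.setdefault(name, {})[version] = {
            "manifest": json.loads(manifest_path.read_text()),
            "run.sh": version_dir / "run.sh",
        }
    return apps


# Build a throwaway registry with one app and one version, then scan it.
with tempfile.TemporaryDirectory() as root:
    version_dir = Path(root, "apps", "myapp", "v1")
    version_dir.mkdir(parents=True)
    (version_dir / "myapp.json").write_text(json.dumps({"functions": []}))
    (version_dir / "run.sh").write_text("#!/bin/bash\n")
    found = scan_local_registry(root)
```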

### Execution backends

| `PM_EXECUTOR` value          | How it runs jobs                                                              |
|------------------------------|-------------------------------------------------------------------------------|
| `local` (default)            | `docker run` or `apptainer run`, mounting the run directory into the container. |
| `http://<url>`               | POSTs the prepared input JSON to the backend, polls for status, downloads outputs. |
| (SLURM is a *mode* of local) | `ed.run(queue="slurm", ...)` wraps the local execution in an `sbatch` script. |

SLURM isn't a separate execution backend — it's an `sbatch` wrapper around the local execution path. SLURM workers still use Docker or Apptainer to run the container; PlayMolecule just generates the submission script.

## The flow of a single call

```{mermaid}
sequenceDiagram
    actor User
    participant App as proteinprepare(...)
    participant ED as ExecutableDirectory
    participant Exec as execution backend
    participant Container

    User->>App: proteinprepare(outdir, pdbid='3ptb')
    App->>App: validate args against manifest signature
    App->>ED: set up run_<id>/ on disk, write input JSON
    App-->>User: return ed (nothing has executed yet)

    User->>ED: ed.run()
    ED->>Exec: dispatch
    Exec->>Container: docker run / apptainer run / HTTP POST
    Container-->>Exec: outputs to outdir, exit code
    Exec-->>ED: status
    ED-->>User: control returns
```

The two-phase split — *setup* then *run* — is deliberate. It lets you:

- Inspect or tweak inputs in `outdir/run_<id>/` before launching.
- Save an `ed` reference, submit it to SLURM, and check status hours later from a fresh process.
- Batch many `ed`s into a single SLURM submission with {py:func}`~playmolecule.slurm_mps`.

## Where configuration lives

| Source                            | Reads / writes                                              |
|-----------------------------------|-------------------------------------------------------------|
| `PM_*` environment variables      | Single source of truth at import time. Listed in [Environment variables](../reference/environment-variables.md). |
| App manifest                      | Per-app parameters, default resources, expected outputs.    |
| `outdir/run_<id>/inputs.json`     | The exact inputs sent to a specific run.                    |
| `~/.cache/playmolecule/cookies/`  | HTTP backend session.                                       |
| `~/.cache/playmolecule/apptainer/`| SIF cache (Docker images converted on first use).           |

Everything user-tunable is in env vars; everything app-tunable is in the manifest; everything specific to a run is in `run_<id>/`. There is no global mutable state inside PlayMolecule itself.

## See also

- [Apps and manifests](apps-and-manifests.md)
- [Executable directory](executable-directory.md)
- [Job lifecycle](job-lifecycle.md)
- [Environment variables](../reference/environment-variables.md)


# === explanation/artifacts-and-files.md ===

# Artifacts and files

PlayMolecule has two related but distinct concepts for "things that live inside an app's container and can be used by it": **files** and **artifacts**. They share the same underlying class hierarchy but serve different purposes.

## Files: the raw inventory

Every app declares a `files` block in its manifest mapping logical paths to in-container paths:

```json
"files": {
  "tests/3ptb.pdb": "/app/files/tests/3ptb.pdb",
  "tests/web_content.pickle": "/app/files/tests/web_content.pickle"
}
```

These are exposed as `app.files` — a dict of file handles. You don't usually touch this directly. It's used internally to:

- Resolve test-config paths to actual file handles (`tests/3ptb.pdb` → the bundled PDB).
- Resolve `artifacts` entries (next section).

## Artifacts: the curated, callable surface

The `artifacts` block (also accepted as the older synonym `datasets`) declares which files are *meant to be used as inputs*:

```json
"artifacts": [
  { "name": "default", "path": "datasets/model_98acc.ckpt", "description": "DeepSite final model" }
]
```

These appear as attributes on `app.artifacts`:

```python
from playmolecule.apps import deepsite

deepsite.artifacts.default            # a callable file handle
deepsite.artifacts.default.path       # path inside the container
deepsite.artifacts.default.download("./local-copy")
```

You pass `deepsite.artifacts.default` directly as a function argument; PlayMolecule resolves it to the right path depending on the active execution backend (local mount, Docker bind, HTTP fetch).

## The summary

| Aspect         | `app.files`                                         | `app.artifacts`                                                  |
|----------------|-----------------------------------------------------|-------------------------------------------------------------------|
| Source         | `files` block in manifest                           | `artifacts` (or `datasets`) block in manifest                     |
| Access         | dict keyed by logical path                          | attribute access by curated name                                  |
| Purpose        | wiring (tests, internal resolution)                 | user-facing — pass into app calls                                 |
| `.name` value  | Logical path                                        | Curated short name (no dots; must start with a letter)            |
| Description?   | Usually no                                          | Yes (from manifest)                                               |
| Downloadable?  | Yes (`.download()`)                                 | Yes (`.download()`)                                               |

In short: `artifacts` is `files` filtered to the entries someone took the trouble to curate and name. Reach for `artifacts` unless you know you need the raw `files` dict.
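
The relationship can be sketched as a small resolver. It assumes that an artifact's `path` is a key into the `files` mapping, and it encodes only the two naming rules from the table; `build_artifacts` and the in-container path are illustrative, not taken from the PlayMolecule source.

```python
import re
from types import SimpleNamespace

# The two stated rules: starts with a letter, contains no dots.
NAME_RE = re.compile(r"[A-Za-z][^.]*")


def build_artifacts(files, artifacts):
    """Resolve curated artifact entries against the raw `files` mapping."""
    namespace = SimpleNamespace()
    for entry in artifacts:
        name = entry["name"]
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid artifact name: {name!r}")
        handle = SimpleNamespace(
            name=name,
            path=files[entry["path"]],  # assumed: logical path -> in-container path
            description=entry.get("description", ""),
        )
        setattr(namespace, name, handle)
    return namespace


files = {"datasets/model_98acc.ckpt": "/app/files/datasets/model_98acc.ckpt"}
artifacts = build_artifacts(
    files,
    [{"name": "default", "path": "datasets/model_98acc.ckpt", "description": "DeepSite final model"}],
)
```

Attribute access is what forces the naming rules: a name containing a dot could not be reached as `app.artifacts.<name>`.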

## Backend-aware file handles

A file handle knows how to fetch its content for the active backend:

- **local registry** — plain filesystem path; `.download()` is a copy.
- **Docker registry** (Docker runtime) — `.download()` shells out to `docker cp`.
- **Docker registry** (Apptainer runtime) — `.download()` runs `apptainer exec` against the cached SIF and copies out.
- **HTTP backend** — `.download()` issues an authenticated GET to the backend.

You don't pick which one you got — the handle does the right thing for the current registry/runtime, which is why example code can use `app.artifacts.Foo` uniformly across installations.

## When you'd actually use `.download()`

The `download()` path is for **outside-the-app** consumers — say, you want to do an analysis in your own notebook with the same reference data the app uses. For arguments *to* the app, just pass the handle directly; don't download first.

## Gotchas

- Artifact names must be unique within an app. The loader does not enforce this: if two entries share a name, one silently overwrites the other.
- `download()` to a path that already exists will overwrite a file or wipe-and-recreate a directory. There is no "keep existing" mode.
- For Docker / Apptainer files, `download()` shells out — it's slow for many small files. Prefer a single `download()` of a directory over a loop of per-file downloads.

## See also

- [Use app artifacts](../howto/use-app-artifacts.md)
- [Apps and manifests](apps-and-manifests.md)


# === explanation/executable-directory.md ===

# Executable directory

An {py:class}`~playmolecule.ExecutableDirectory` (ED) is the on-disk artefact you get back from every app call. It is the unit PlayMolecule moves around, runs, polls, and re-uses. This page explains what's inside one, why the abstraction exists, and how it composes with SLURM and HTTP backends.

## The two-phase model

A PlayMolecule app call has two distinct phases:

1. **Setup** — `proteinprepare(outdir="out", pdbid="3ptb")` validates arguments against the manifest signature, stages input files into `out/run_<timestamp>_<uuid>/`, writes the input JSON, generates a run script, and returns an {py:class}`~playmolecule.ExecutableDirectory`. **No container has started yet.**
2. **Run** — `ed.run()` (optionally `ed.run(queue="slurm", ...)`) hands the prepared directory to an execution backend. Outputs land back in `outdir`.

The split exists because the two phases benefit from different environments:

- Setup wants to be **cheap** and **local** — you might do it in a notebook on your laptop.
- Run wants to be **wherever the resources are** — your laptop, a SLURM worker, a GPU node, the HTTP backend.

Decoupling them means you can set up hundreds of EDs in a script and then submit them in a batch, replay a single ED on a different cluster, or inspect prepared inputs before paying for compute.

## Layout on disk

```text
outdir/
├── output.pdb                    # produced by the run (later)
├── details.csv                   # produced by the run (later)
├── run_03_07_2026_14_22_a1b2c3d4.sh   # the rendered run script
└── run_03_07_2026_14_22_a1b2c3d4/    # the inputs dir for this run
    ├── inputs.json                  # input JSON consumed by the container
    ├── input-files-staged-here/     # copies/symlinks of file params
    ├── .pm.alive                    # heartbeat — see Job lifecycle
    └── .pm.err                      # error sentinel (only if it failed)
```

Key properties:

- The **outdir** is the user-chosen location.
- The **run directory** has a fresh timestamp + UUID per call, so you can re-run the same ED and get parallel `run_*/` siblings.
- The **run script** lives next to the run directory; `runsh = inputs_dir.basename + ".sh"`.
- The directory is **self-contained**. If you `tar` it up, copy it to another machine, and reconstruct the ED there, `ed.run()` will work as long as the same registry/images are available.
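
The layout can be reproduced with a few lines of standard-library code. This is only a sketch: `stage_run` is a hypothetical helper, the id format is read off the example listing (its field order is an assumption), and the run script is a placeholder rather than what PlayMolecule actually renders.

```python
import json
import tempfile
import uuid
from datetime import datetime
from pathlib import Path


def stage_run(outdir, inputs, now=None):
    """Create outdir/run_<id>/inputs.json and a sibling run_<id>.sh."""
    now = now or datetime.now()
    # Assumed id format: day_month_year_hour_minute plus 8 hex characters.
    run_id = f"run_{now:%d_%m_%Y_%H_%M}_{uuid.uuid4().hex[:8]}"
    inputs_dir = Path(outdir) / run_id
    inputs_dir.mkdir(parents=True)
    (inputs_dir / "inputs.json").write_text(json.dumps(inputs, indent=2))
    # The run script sits next to the run directory, same basename plus ".sh".
    runsh = inputs_dir.with_name(inputs_dir.name + ".sh")
    runsh.write_text("#!/bin/bash\n# placeholder for the rendered run script\n")
    return inputs_dir, runsh


with tempfile.TemporaryDirectory() as outdir:
    first_dir, first_sh = stage_run(outdir, {"pdbid": "3ptb"})
    second_dir, _ = stage_run(outdir, {"pdbid": "3ptb"})
    staged = json.loads((first_dir / "inputs.json").read_text())
    siblings = sorted(p.name for p in Path(outdir).iterdir() if p.is_dir())
```

Two calls against the same `outdir` leave two sibling `run_*/` directories, which is the behaviour the second bullet describes.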

## Reconstructing an ED from disk

```python
from playmolecule import ExecutableDirectory

ed = ExecutableDirectory(dirname="/shared/scratch/me/run")
print(ed.status)
ed.run()                # resume / re-run
```

The constructor finds the most recent `run_<id>/` inside `dirname` and uses it as the inputs directory. This is what makes "submit on Monday, check status on Tuesday" work — there's no in-memory state required.
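
Conceptually the lookup is a "newest matching directory" search. The sketch below assumes "most recent" means latest modification time, which may differ from how PlayMolecule orders runs; `latest_run_dir` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def latest_run_dir(dirname):
    """Return the newest run_*/ directory inside dirname, or None."""
    candidates = [p for p in Path(dirname).glob("run_*") if p.is_dir()]
    # Assumption: "most recent" is decided by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


with tempfile.TemporaryDirectory() as dirname:
    for name, mtime in (("run_a", 1_000_000), ("run_b", 2_000_000)):
        run_dir = Path(dirname, name)
        run_dir.mkdir()
        os.utime(run_dir, (mtime, mtime))  # force a known ordering
    Path(dirname, "run_b.sh").write_text("")  # a file, so it is skipped
    latest_name = latest_run_dir(dirname).name
    nothing = latest_run_dir(Path(dirname, "run_a"))  # no run_*/ inside
```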

## Execution backend dispatch

`ed.run()` dispatches to whichever execution backend was active **when the ED was built**. That means:

- Setting up under `PM_EXECUTOR=local` and later changing to `PM_EXECUTOR=http://...` does not move the job. The backend was captured at setup time.
- To switch, set up a new ED in a new process.

`ed.status` follows the same dispatch — local EDs are queried by reading the heartbeat file and the SLURM queue; HTTP EDs are queried by HTTP.

## The `slurm` shortcut

`ed.run(queue="slurm", ...)` wraps the prepared run directory in a `jobqueues` SLURM submission. Resources default to the values captured from the app manifest at setup time. (`ed.slurm(...)` is a thin alias retained for backwards compatibility.)

The execution backend isn't switched to "SLURM" — SLURM is a mode of the local execution backend (it ultimately invokes the same `docker run` or `apptainer run`, just on a worker node).

## Batched MPS submission

{py:func}`~playmolecule.slurm_mps` takes a list of EDs and submits them as a single SLURM job that holds one GPU under NVIDIA MPS. The EDs are still independent on disk — each one writes to its own `outdir` — but the SLURM accounting collapses them. Resource defaults are taken from the **first** ED's `execution_resources`, not the union.

## Why not just return a dict?

You could imagine PlayMolecule returning `{"runsh": "...", "inputs_dir": "...", ...}` and dropping the class. The reasons it doesn't:

- `.status` needs to dispatch by execution backend. A dict can't do that without a wrapper.
- HTTP-backend jobs need to track their server-side job id between calls. A dict can't carry that.
- Polling code reads more naturally as `ed.status` than `ed["status"]`.

The ED is intentionally thin: almost everything it knows is in fields, and its public members (the `run` and `slurm` methods and the `status` attribute) are dispatch shims to the active backend.

## See also

- [Architecture](architecture.md)
- [Job lifecycle](job-lifecycle.md)
- [Check job status](../howto/check-job-status.md)
- {py:class}`~playmolecule.ExecutableDirectory`


# === explanation/index.md ===

# Explanation

Concept-oriented pages. Read these when you want the mental model, not a procedure.

```{toctree}
:maxdepth: 1

architecture
apps-and-manifests
executable-directory
job-lifecycle
artifacts-and-files
```


# === explanation/job-lifecycle.md ===

# Job lifecycle

A PlayMolecule job moves through four states from submission to completion. This page describes the states, how they're detected, and the heartbeat mechanism the local backend uses to spot dead workers.

## The four states

{py:class}`~playmolecule.JobStatus` is an `IntEnum`:

```text
WAITING_INFO = 0   # Submitted, not yet running. No heartbeat seen.
RUNNING      = 1   # Container is alive and the heartbeat is fresh.
COMPLETED    = 2   # Exit code 0 (local) or backend reported success (HTTP).
ERROR        = 3   # Non-zero exit, missing outputs, or stale heartbeat.
```

`str(status)` returns a human label via {py:meth}`~playmolecule.JobStatus.describe`. Because the class is an `IntEnum`, members also compare by value, so `status == JobStatus.RUNNING` works as expected.

## State diagram

```{mermaid}
stateDiagram-v2
    [*] --> WAITING_INFO: ed.run()
    WAITING_INFO --> RUNNING: container starts, .pm.alive appears
    RUNNING --> COMPLETED: .pm.done written
    RUNNING --> ERROR: .pm.err written / heartbeat stale > 60s
    WAITING_INFO --> ERROR: backend reports failure before run starts
    COMPLETED --> [*]
    ERROR --> [*]
```

## Local backend: how each state is detected

The local execution backend uses three sentinel files inside `outdir/run_<id>/`:

| File         | Set by        | What it means                                                                |
|--------------|---------------|------------------------------------------------------------------------------|
| `.pm.alive`  | The container | Refreshed periodically with an ISO-format timestamp while the job is running. |
| `.pm.done`   | The container | Written on clean exit. Primary `COMPLETED` signal.                            |
| `.pm.err`    | The container | Written on a controlled failure (non-zero exit, exception).                   |

`ed.status` checks these in order:

1. `.pm.done` exists → **`COMPLETED`**.
2. `.pm.err` exists → **`ERROR`**.
3. `.pm.alive` exists:
   - timestamp within the last 60 seconds → **`RUNNING`**.
   - timestamp older than 60 seconds → **`ERROR`** (worker died without writing `.pm.err`).
4. *(SLURM-submitted job)* the SLURM queue state is consulted — see below.
5. Fallback: read the app's `expected_outputs.json`. If it lists files and they're all present on disk → **`COMPLETED`**; if some are missing → **`RUNNING`**.
6. None of the above → **`WAITING_INFO`**.

The 60-second timeout is hard-coded. A SLURM worker that crashes silently will leave `.pm.alive` stale; after 60 seconds the status flips to `ERROR` even without an explicit failure signal.

The fallback path (5) was tightened recently: an empty `expected_outputs.json` list now falls through to `WAITING_INFO` rather than reporting a false `COMPLETED` for a job that hadn't started.
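
The check order above can be sketched as a single function. This is not PlayMolecule's code: `JobStatus` here is a local stand-in with the documented values, `detect_status` is a hypothetical name, the SLURM lookup (step 4) is left out, and timestamps are assumed to be naive ISO strings.

```python
import tempfile
from datetime import datetime, timedelta
from enum import IntEnum
from pathlib import Path


class JobStatus(IntEnum):  # stand-in mirroring the documented values
    WAITING_INFO = 0
    RUNNING = 1
    COMPLETED = 2
    ERROR = 3


HEARTBEAT_TIMEOUT = timedelta(seconds=60)


def detect_status(run_dir, expected_outputs=(), outdir=None, now=None):
    """Apply the sentinel checks in the documented order (SLURM step omitted)."""
    run_dir = Path(run_dir)
    now = now or datetime.now()
    if (run_dir / ".pm.done").exists():
        return JobStatus.COMPLETED
    if (run_dir / ".pm.err").exists():
        return JobStatus.ERROR
    alive = run_dir / ".pm.alive"
    if alive.exists():
        stamp = datetime.fromisoformat(alive.read_text().strip())
        fresh = now - stamp <= HEARTBEAT_TIMEOUT
        return JobStatus.RUNNING if fresh else JobStatus.ERROR
    if expected_outputs:  # an empty list falls through to WAITING_INFO
        outdir = Path(outdir or run_dir.parent)
        done = all((outdir / name).exists() for name in expected_outputs)
        return JobStatus.COMPLETED if done else JobStatus.RUNNING
    return JobStatus.WAITING_INFO


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp, "run_demo")
    run_dir.mkdir()
    t0 = datetime(2026, 1, 1, 12, 0, 0)
    before = detect_status(run_dir, now=t0)
    (run_dir / ".pm.alive").write_text(t0.isoformat())
    fresh = detect_status(run_dir, now=t0 + timedelta(seconds=30))
    stale = detect_status(run_dir, now=t0 + timedelta(seconds=90))
    (run_dir / ".pm.done").write_text("")
    done = detect_status(run_dir, now=t0 + timedelta(seconds=90))
```

Checking `.pm.done` first means a job that finished cleanly is never reported as `ERROR` just because its last heartbeat has aged past the timeout.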

## SLURM: the same, plus the queue

When a job is submitted via `ed.run(queue="slurm", ...)`, two things drive the status:

- The same `.pm.alive` / `.pm.done` / `.pm.err` files (the worker still uses them).
- The SLURM queue state from `jobInfo()` — consulted when no sentinel and no fresh heartbeat are present.

The SLURM-state-to-{py:class}`~playmolecule.JobStatus` mapping:

| SLURM state                          | PlayMolecule state |
|--------------------------------------|--------------------|
| `RUNNING`                            | `RUNNING`          |
| `COMPLETED`                          | `COMPLETED`        |
| `PENDING`, `None`                    | `WAITING_INFO`     |
| `FAILED`, `CANCELLED`, `OUT_OF_MEMORY`, `TIMEOUT` | `ERROR` |

The heartbeat catches the case where SLURM thinks the job is running but the container died silently on the worker (common with GPU driver issues).
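
A compact way to read the table together with the heartbeat rule is shown below. The function `resolve_status` and its arguments are hypothetical; states are plain strings to keep the sketch self-contained, and SLURM states that are not in the table simply raise `KeyError` here.

```python
# The documented SLURM-state-to-JobStatus table, as plain strings.
SLURM_TO_PM = {
    "RUNNING": "RUNNING",
    "COMPLETED": "COMPLETED",
    "PENDING": "WAITING_INFO",
    None: "WAITING_INFO",
    "FAILED": "ERROR",
    "CANCELLED": "ERROR",
    "OUT_OF_MEMORY": "ERROR",
    "TIMEOUT": "ERROR",
}

HEARTBEAT_TIMEOUT_S = 60


def resolve_status(slurm_state, heartbeat_age_s=None):
    """Prefer the heartbeat when one exists; otherwise fall back to the queue."""
    if heartbeat_age_s is not None:
        return "RUNNING" if heartbeat_age_s <= HEARTBEAT_TIMEOUT_S else "ERROR"
    return SLURM_TO_PM[slurm_state]


# SLURM still says RUNNING, but the last heartbeat is two minutes old.
silent_death = resolve_status("RUNNING", heartbeat_age_s=120)
queued = resolve_status("PENDING")
```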

## HTTP backend: server-side truth

For HTTP-backend jobs there's no shared filesystem. `ed.status` does a single HTTP GET against the backend's status endpoint, keyed by a job ID derived from the `outdir` path at submission time. The four states are reported directly by the server.

If you move or rename `outdir` between submission and status queries, the derived job ID won't match what the server stored and you'll get a 404. The fix is to never rename `outdir`.

## Polling guidance

- **Local interactive runs** — `ed.run()` is blocking. You don't poll; you wait.
- **SLURM jobs** — poll once every 30–60 seconds. Anything faster is wasted; the controller node sees no benefit.
- **HTTP-backend jobs** — match the polling cadence to job length; once a minute is reasonable for jobs that take 10+ minutes.

Or set `PM_BLOCKING=1` and the app call itself waits until terminal state — useful in scripts where you'd otherwise write the polling loop anyway.

## Failure modes worth knowing

- **Stale heartbeat** — worker died with no `.pm.err`. Cause: usually OOM-kill or hardware fault. Check SLURM accounting (`sacct`).
- **Missing expected outputs** — manifest's `expected_outputs` list and the actual run disagree. Either the app code regressed or the inputs were unusable.
- **`WAITING_INFO` forever** — the container never started. Check the container runtime: `docker pull` of the image, Apptainer's SIF cache, gcloud auth.
- **HTTP 404 on status** — `outdir` was moved/renamed after submission. There's no recovery; resubmit.

## See also

- {py:class}`~playmolecule.JobStatus`
- {py:class}`~playmolecule.ExecutableDirectory`
- [Check job status](../howto/check-job-status.md)
- [Architecture](architecture.md)


# === howto/check-job-status.md ===

# Check job status

## Goal

Find out whether a PlayMolecule job is queued, running, finished, or failed — and wait for it to finish if needed.

## Minimal example

```python
from playmolecule import JobStatus
from playmolecule.apps import proteinprepare

ed = proteinprepare(outdir="/shared/scratch/me/run", pdbid="3ptb")
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)

print(ed.status)         # JobStatus.WAITING_INFO immediately after submission
```

## The four states

{py:class}`~playmolecule.JobStatus` is an `IntEnum` with four members:

| State          | Value | Meaning                                                     |
|----------------|-------|-------------------------------------------------------------|
| `WAITING_INFO` | 0     | Submitted but not yet running. Default for fresh jobs.      |
| `RUNNING`      | 1     | Container has started and is making progress.               |
| `COMPLETED`    | 2     | Exited cleanly. Outputs are ready.                          |
| `ERROR`        | 3     | Exited with a non-zero status or stalled past the heartbeat timeout. |

For the lifecycle and the heartbeat mechanism, see [Job lifecycle](../explanation/job-lifecycle.md).

## Poll until done

```python
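import time

from playmolecule import JobStatus

# `ed` is the ExecutableDirectory returned by the app call above.
while ed.status not in (JobStatus.COMPLETED, JobStatus.ERROR):
    time.sleep(30)  # one status query per iteration

if ed.status == JobStatus.ERROR:
    raise RuntimeError("Job failed; check outdir/run_*/")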
```

Use `time.sleep(30)` or longer for SLURM jobs; one status check every 30 to 60 seconds is plenty and keeps the controller node sane.
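
If you also want an upper bound on the wait, a small helper along these lines may be useful. It is a generic sketch, not part of PlayMolecule: `wait_until_terminal` is a hypothetical name, and it takes any zero-argument callable so that it can be exercised without a real job.

```python
import time


def wait_until_terminal(get_status, terminal, interval=30.0, timeout=None,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or time runs out."""
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Demo with a scripted status sequence and a sleep that only records its calls.
scripted = iter(["WAITING_INFO", "RUNNING", "RUNNING", "COMPLETED"])
naps = []
final = wait_until_terminal(lambda: next(scripted), {"COMPLETED", "ERROR"}, sleep=naps.append)
```

With a real job the call would look like `wait_until_terminal(lambda: ed.status, {JobStatus.COMPLETED, JobStatus.ERROR}, timeout=3600)`.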

## Query status from a *different* Python process

A common pattern is to submit a job from a notebook and later check on it from a fresh shell. Rebuild the {py:class}`~playmolecule.ExecutableDirectory` by pointing it at the existing directory:

```python
from playmolecule import ExecutableDirectory

ed = ExecutableDirectory(dirname="/shared/scratch/me/run")
print(ed.status)
```

The constructor finds the latest `run_<id>/` subdirectory inside `dirname` and reattaches. This works for local, SLURM, and HTTP-backend jobs — the status query is dispatched by the active execution backend.

## Compare with state directly

{py:class}`~playmolecule.JobStatus` is an `IntEnum`, so:

```python
if ed.status == JobStatus.RUNNING:
    ...
```

works as expected. `str(ed.status)` returns a human label (`"Running"`, etc.) via {py:meth}`~playmolecule.JobStatus.describe`.
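
For readers unfamiliar with `IntEnum`, the stand-in below shows how an enum can compare as an integer while still printing a label. It is a sketch, not the library's class: only the `"Running"` label is taken from this page, and the other three labels are placeholders.

```python
from enum import IntEnum


class JobStatus(IntEnum):  # local stand-in, not playmolecule.JobStatus
    WAITING_INFO = 0
    RUNNING = 1
    COMPLETED = 2
    ERROR = 3

    def describe(self):
        """Return a human-readable label for this state."""
        return _LABELS[self]

    def __str__(self):
        return self.describe()


# Only "Running" is documented; the remaining labels are assumed.
_LABELS = {
    JobStatus.WAITING_INFO: "Waiting info",
    JobStatus.RUNNING: "Running",
    JobStatus.COMPLETED: "Completed",
    JobStatus.ERROR: "Error",
}

status = JobStatus.RUNNING
label = str(status)
is_terminal = status in (JobStatus.COMPLETED, JobStatus.ERROR)
```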

## Gotchas

- After SLURM marks a job `COMPLETED`, the worker may still be flushing buffered output. If your script reads result files the moment the status flips, wait a second or two first.
- The `WAITING_INFO` state is also returned when the controller hasn't received any heartbeat yet — there's no way to distinguish "queued" from "I haven't heard from the worker yet" from this enum. SLURM's `squeue` is the source of truth for queue state.
- The HTTP backend derives a job ID from the `outdir` path. If you move or rename the directory after submission, status queries will fail to find the job.

## See also

- {py:class}`~playmolecule.JobStatus`
- [Job lifecycle](../explanation/job-lifecycle.md)
- [Run an app on SLURM](run-an-app-on-slurm.md)


# === howto/index.md ===

# How-to guides

Task-oriented recipes. Each page solves one concrete problem and is self-contained.

## Discovering apps

```{toctree}
:maxdepth: 1

list-and-inspect-apps
select-an-app-version
use-app-artifacts
```

## Running jobs

```{toctree}
:maxdepth: 1

pass-input-files-to-an-app
run-an-app-locally
run-an-app-on-slurm
run-many-jobs-on-one-gpu
check-job-status
run-built-in-app-tests
```

## Remote backend

```{toctree}
:maxdepth: 1

log-in-to-the-http-backend
```

## Configuration

```{toctree}
:maxdepth: 1

switch-between-docker-and-apptainer
update-installed-apps
install-apps-for-a-cluster
```


# === howto/install-apps-for-a-cluster.md ===

# Install apps for a cluster

## Goal

Stand up PlayMolecule so multiple users on a shared SLURM or HPC site can run apps against a single, centrally managed installation — without each user having to download images, log into Google Cloud, or manage their own license file.

This is the admin recipe. For the end-user setup, see [Installation](../installation.md).

## What you'll have at the end

- Apptainer installed on every compute node.
- A shared license file (or floating-license server) reachable from every node.
- A **shared SIF cache** on shared storage holding every app's container image — pulled once, used by everyone.
- An environment script users source to pick up the right `PM_*` variables.

## Prerequisites

- Acellera-issued **license file** (or coordinates for a floating-license server).
- Acellera-issued **service-account JSON** (or another way to authenticate `gcloud`) for the GCloud Artifact Registry that holds the app images.
- Shared filesystem mounted at the same path on every compute node (e.g., `/shared`).
- Root (or `sudo`) on every compute node where users will run jobs — Apptainer has to be installed on each worker.
- `gcloud` CLI installed for the admin user.

## Step 1 — Install Apptainer on every compute node

PlayMolecule pulls Docker images and converts them to plain SIF files on first use, so the only requirement is a working Apptainer:

```bash
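# Ubuntu (packages come from the Apptainer PPA; on Debian, install the
# .deb from the official install docs linked below)
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt-get update
sudo apt-get install -y apptainer

# RHEL/Fedora (on RHEL and derivatives, enable EPEL first:
#   sudo dnf install -y epel-release)
sudo dnf install -y apptainer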
```

Verify:

```bash
apptainer --version
```

Modern kernels let Apptainer run fully unprivileged via user namespaces. If your kernel doesn't have user namespaces enabled, install the `apptainer-suid` variant too — it ships a setuid helper as a fallback. Official install docs: <https://apptainer.org/docs/admin/main/installation.html>.

## Step 2 — Set up the license

Two options:

**Single machine**: drop the license file somewhere readable by every user:

```bash
sudo install -m 0644 acellera.lic /etc/acellera/acellera.lic
```

**Multi-node cluster**: run an Acellera floating-license server. Instructions:
<https://software.acellera.com/acemd/licence.html#licence-server>

## Step 3 — Authenticate to Acellera's Docker registry

```bash
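# Authenticate gcloud: interactively as the admin user...
gcloud auth login
# ...or non-interactively with the Acellera-issued service account
gcloud auth activate-service-account --key-file=/etc/playmolecule/sa.json
# Register gcloud as Docker's credential helper for the registry host
gcloud auth configure-docker europe-southwest1-docker.pkg.dev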
```

This stores Docker-format credentials that `apptainer pull docker://...` will pick up automatically.

## Step 4 — Create a shared SIF cache

This is the heart of the cluster install. Every app's container image gets pulled into this directory once; every user on every node reads it from there.

```bash
sudo mkdir -p /shared/playmolecule/apptainer
sudo chmod a+rwx /shared/playmolecule/apptainer
```

The path can be anywhere on shared storage; pick whatever your site convention is.
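
Before pulling anything, it can help to confirm that the directory behaves the way the recipe expects from an ordinary user account. The sketch below is a standalone check, not part of PlayMolecule; the helper name `check_cache_dir` is hypothetical.

```python
import os
from pathlib import Path


def check_cache_dir(path):
    """Report whether a cache directory exists and is readable/writable.

    Hypothetical helper for illustration; PlayMolecule does not ship it.
    """
    p = Path(path)
    return {
        "exists": p.is_dir(),
        "readable": os.access(p, os.R_OK | os.X_OK),
        "writable": os.access(p, os.W_OK),
    }


if __name__ == "__main__":
    # Path taken from the commands above; adjust to your site convention.
    print(check_cache_dir("/shared/playmolecule/apptainer"))
```

Running it on a compute node as a non-admin user is the more useful check, since that is where jobs will read the cache.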

## Step 5 — Pull the app images

Tell PlayMolecule to use Apptainer and point it at the shared cache, then pull everything:

```bash
export PM_RUNTIME=apptainer
export PM_SIF_CACHE_DIR=/shared/playmolecule/apptainer
```

```python
from playmolecule import update_apps

update_apps(pull_new=True, interactive=True)
```

The default `PM_REGISTRIES` (Acellera's GCloud Docker registry) is exactly what you want here — no need to set it.

`interactive=True` prints a numbered list of every available app. Type `all` to pull everything, or pick specific apps. Each one converts to a SIF in `/shared/playmolecule/apptainer/`.

## Step 6 — Write an environment script

Drop something like this into `/etc/profile.d/playmolecule.sh` (or your site's equivalent module file):

```bash
# PlayMolecule shared install
export PM_RUNTIME=apptainer
export PM_SIF_CACHE_DIR=/shared/playmolecule/apptainer
export ACELLERA_LICENCE_SERVER=27000@license.example.com    # floating-license server
```

Users who source this need only `pip install playmolecule` and they're ready to go: they pick up Acellera's default Docker registry, the shared SIF cache, and the license automatically.

## Step 7 — Smoke-test

As any non-admin user:

```python
from playmolecule import describe_apps
from playmolecule.apps import proteinprepare

describe_apps()                          # should list every installed app
proteinprepare.tests.simple.run()        # should complete with 🎉
```

If {py:func}`~playmolecule.describe_apps` is empty, the user can't reach the Docker registry — check `gcloud auth configure-docker` and the user's Docker credentials store. If `tests.simple.run()` fails before producing outputs, the most likely cause is the license file path or Apptainer being misconfigured (no `apptainer-suid` on a kernel without user namespaces).

## Refresh later

Re-run from the admin account whenever Acellera publishes updates:

```python
from playmolecule import update_apps
update_apps(pull_new=True)
```

Cron it nightly if you want automatic updates:

```text
# A crontab entry must fit on one line; cron does not support backslash continuation.
0 3 * * * PM_RUNTIME=apptainer PM_SIF_CACHE_DIR=/shared/playmolecule/apptainer /opt/playmolecule/.venv/bin/python -c "from playmolecule import update_apps; update_apps(pull_new=True)"
```

## Gotchas

- A floating-license server is required if more than one machine will run apps at the same time. Otherwise you'll hit license conflicts.
- If `apptainer pull docker://...` fails on a hardened node, the kernel may not have user namespaces enabled. The fix is either to enable them or to install the `apptainer-suid` package as a fallback.
- The `$PM_SIF_CACHE_DIR` directory grows. Several GB per app is normal. Reclaim by removing old `*.sif` files; they'll be re-fetched on next use.
- The registry credentials set up by `gcloud auth configure-docker` must be available to *every user* who'll run jobs: either each user runs the command themselves, or (cleaner) you share a `~/.docker/config.json` site-wide. Without them, users will see "unauthorized" when their first job tries to pull a SIF.
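
To decide which images to reclaim, it helps to see what the cache holds. The following is a minimal sketch, assuming the `*.sif` files sit directly in the cache directory (the exact layout is an assumption); the helper names are hypothetical and not part of PlayMolecule.

```python
from pathlib import Path


def list_sif_files(cache_dir):
    """Return (path, size_bytes, mtime) for each *.sif file, oldest first."""
    entries = []
    for sif in Path(cache_dir).glob("*.sif"):
        st = sif.stat()
        entries.append((sif, st.st_size, st.st_mtime))
    return sorted(entries, key=lambda e: e[2])


def total_size_gib(entries):
    """Sum the sizes returned by list_sif_files() and convert to GiB."""
    return sum(size for _, size, _ in entries) / 1024**3
```

The oldest entries come first, so the head of the list is where candidates for removal are most likely to be.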

## See also

- [Update installed apps](update-installed-apps.md)
- [Switch between Docker and Apptainer](switch-between-docker-and-apptainer.md)
- [Environment variables](../reference/environment-variables.md)


# === howto/list-and-inspect-apps.md ===

# List and inspect apps

## Goal

Find out which PlayMolecule apps are installed, what each one does, and what parameters it accepts.

## Minimal example

```python
from playmolecule import describe_apps

describe_apps()
```

Prints each app's qualified import path and one-line description.

## Get the data as a dict

```python
apps = describe_apps(as_dict=True)
for path, info in apps.items():
    print(path, "—", info["description"])
```

The keys are fully qualified import paths (`playmolecule.apps.<name>.<version>.<function>`); the values currently expose only `description`. Use this when you need to drive a UI or a config file rather than print to a terminal.
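
As an illustration of driving a UI from that dict, the sketch below groups the qualified paths by app and version. It relies only on the key shape described above; the sample data is made up and `group_by_app` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_app(apps):
    """Group qualified paths into {app_name: {version: [function, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for path in apps:
        parts = path.split(".")
        # Expected shape: playmolecule.apps.<name>.<version>.<function>
        if len(parts) != 5 or parts[:2] != ["playmolecule", "apps"]:
            continue  # skip anything that does not match the documented shape
        _, _, name, version, function = parts
        grouped[name][version].append(function)
    return {name: dict(versions) for name, versions in grouped.items()}


# Sample data shaped like describe_apps(as_dict=True); the entries are made up.
sample = {
    "playmolecule.apps.proteinprepare.v1.proteinprepare": {"description": "ProteinPrepare"},
    "playmolecule.apps.proteinprepare.v2.proteinprepare": {"description": "ProteinPrepare"},
}
print(group_by_app(sample))
```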

## Inspect one app

Once imported, each app is a normal Python function with a real signature and docstring:

```python
from playmolecule.apps import proteinprepare

help(proteinprepare)              # parameters + outputs + examples
```

In IPython, `proteinprepare?` is equivalent. The docstring lists every parameter, its type, default, and description, plus the outputs the app writes and any example calls — everything you need to call the app correctly.

## See available versions

The versioned submodules live under the app's namespace:

```python
import playmolecule.apps.proteinprepare as pp
[name for name in dir(pp) if name.startswith("v") and name[1:].isdigit()]   # ['v1', 'v2', ...]
```

The bare `proteinprepare` symbol aliases the latest version; everything else is at `pp.v1`, `pp.v2`, etc. See [Select an app version](select-an-app-version.md).

## Gotchas

- `describe_apps` only sees apps that loaded successfully at import time. If `PM_REGISTRIES` is unreachable (network down, wrong credentials) the function returns nothing — check the `playmolecule` logger output, or set `PM_LOG_LEVEL=DEBUG` to see why.
- A custom app whose manifest fails to parse is silently skipped at import time and logged as an error. `describe_apps` won't list it; check the import-time log to spot the problem.
- The descriptions you see are the first line of each app function's docstring. If you maintain an app, keep the first line tight.

## See also

- {py:func}`~playmolecule.describe_apps`
- [Select an app version](select-an-app-version.md)
- [Apps and manifests](../explanation/apps-and-manifests.md)


# === howto/log-in-to-the-http-backend.md ===

# Log in to the HTTP backend

## Goal

Authenticate against a PlayMolecule HTTP backend so app calls submit jobs through the backend instead of running locally.

## Minimal example

```bash
playmolecule login --email you@example.com
```

You'll be prompted for a password, unless it is set in `PM_PASSWORD` / `PLAYMOLECULE_PASSWORD`, in which case it is read from there. On success, a cookie is written to `~/.cache/playmolecule/cookies/` and reused by every subsequent call.

## From Python

```python
from playmolecule import login, logout

login("you@example.com", "•••••••")
# ... use playmolecule normally ...
logout()
```

`login()` raises if `PM_EXECUTOR` (or `PM_REGISTRIES` for an `http://` registry) doesn't point at a valid HTTP URL.

## Configure where to log in

The HTTP URL comes from your environment:

```bash
export PM_REGISTRIES=http://playmolecule.example.com
export PM_EXECUTOR=http://playmolecule.example.com    # often the same URL
```

`PM_REGISTRIES` tells PlayMolecule where to fetch manifests; `PM_EXECUTOR` tells it where to submit jobs. They're independent — you can browse an HTTP registry but execute locally, or vice versa.

## Non-interactive login (CI)

```bash
export PLAYMOLECULE_EMAIL=ci@example.com
export PLAYMOLECULE_PASSWORD='••••••••••'
playmolecule login
```

Or the `PM_*` aliases:

```bash
export PM_EMAIL=ci@example.com
export PM_PASSWORD='••••••••••'
playmolecule login
```
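
A CI script may want to fail fast when the credentials are missing, before it ever reaches `playmolecule login`. The sketch below only reads the variables named above. The lookup order (`PLAYMOLECULE_*` before `PM_*`) is an assumption, since the real precedence is not stated here, and `resolve_credentials` is a hypothetical helper.

```python
import os


def resolve_credentials(env=None):
    """Return (email, password) from the environment variables named above.

    Assumption: PLAYMOLECULE_* is checked before PM_*; the real CLI's
    precedence may differ.
    """
    env = os.environ if env is None else env
    email = env.get("PLAYMOLECULE_EMAIL") or env.get("PM_EMAIL")
    password = env.get("PLAYMOLECULE_PASSWORD") or env.get("PM_PASSWORD")
    missing = [n for n, v in (("email", email), ("password", password)) if not v]
    if missing:
        raise RuntimeError(f"missing credentials: {', '.join(missing)}")
    return email, password
```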

## Log out

```bash
playmolecule logout
```

Clears the session cookie locally. Equivalent in Python:

```python
from playmolecule import logout
logout()
```

## Custom cookie cache location

By default cookies sit under `~/.cache/playmolecule/cookies/`. Move them by setting:

```bash
export PM_COOKIE_CACHE_DIR=/secure/path/playmolecule-cookies
```

Useful in containerised CI runners where `$HOME` is ephemeral.

## Gotchas

- The backend issues a CSRF token before accepting credentials; if the first request fails with an HTTP error, the URL is almost certainly wrong or the backend isn't reachable.
- A logged-in session is persisted per host. Different machines need their own `login`.
- `login` doesn't validate the credentials beyond what the server reports — if you mistyped the password you'll see the failure surface as an HTTP 401 from the next API call, not the login itself.

## See also

- {py:func}`~playmolecule.login`
- {py:func}`~playmolecule.logout`
- [Environment variables](../reference/environment-variables.md)
- [CLI](../reference/cli.md)


# === howto/pass-input-files-to-an-app.md ===

# Pass input files to an app

## Goal

Give an app one of your own files (a PDB, an SDF, a directory of trajectories, …) as a parameter.

## Minimal example

```python
from playmolecule.apps import proteinprepare

ed = proteinprepare(
    outdir="out",
    pdbfile="./inputs/3ptb.pdb",
)
ed.run()
```

## How file parameters are handled

When you pass a string or `Path` for a parameter typed `Path` in the manifest, PlayMolecule:

1. Resolves it to an absolute path on your host.
2. Copies (or symlinks — see below) it into `outdir/run_<id>/` under the same basename.
3. Rewrites the input JSON so the in-container path points to the staged copy.

Result: the container sees the file at a predictable path; you keep the original. Trying to pass a path that doesn't exist raises immediately, before the container starts.

## Copy vs symlink

By default PlayMolecule copies inputs into the run directory. The copy is what makes a run directory **reproducible**: once `outdir/run_<id>/` is built, it contains every byte the container will ever read, so you can `tar` it up, archive it alongside published results, or replay it months later — even if the originals on your host have moved, changed, or been deleted.

The trade-off is speed: copying multi-GB trajectories is slow. Set `PM_SYMLINK=1` to symlink instead:

```bash
export PM_SYMLINK=1
```

With symlinks the run directory is **not** self-contained: it depends on the originals staying where they were when you called the app. If you delete or move the source files, the run directory's inputs become dangling links. Use symlinks for fast iteration on large inputs; keep the default (copy) when reproducibility matters more than I/O.

Also don't use symlinks when `outdir` is on a different filesystem than the source — some container runtimes won't follow cross-mount symlinks.
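
The staging steps and the copy/symlink trade-off can be sketched with the standard library. This is an illustration of the behavior described above, not PlayMolecule's implementation; `stage_input` and its arguments are hypothetical.

```python
import os
import shutil
from pathlib import Path


def stage_input(src, run_dir, symlink=False):
    """Stage one input into run_dir and return the staged path."""
    src = Path(src).resolve()           # 1. absolute path on the host
    if not src.exists():
        raise FileNotFoundError(src)    # fail before any container would start
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    dest = run_dir / src.name           # 2. same basename inside the run dir
    if symlink:
        os.symlink(src, dest)           # fast, but depends on the original
    elif src.is_dir():
        shutil.copytree(src, dest)      # whole tree for directory inputs
    else:
        shutil.copy2(src, dest)         # self-contained copy
    return dest                         # 3. the path the job input would reference
```

Deleting the source after staging shows the difference: the copied file survives, while the symlink dangles.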

## Pass a directory

If the app parameter is typed `Path` and you give it a directory, the whole tree is staged the same way:

```python
ed = some_app(outdir="out", trajdir="./trajectories/run42")
```

Pair with `PM_SYMLINK=1` if the directory is large.

## Pass a file that's already an app artifact

When the file is bundled with the app (a trained model, a reference dataset), don't copy it manually — use the artifact handle directly:

```python
from playmolecule.apps import deepsite

deepsite(outdir="out", pdbid="3ptb", model=deepsite.artifacts.default).run()
```

See [Use app artifacts](use-app-artifacts.md).

## Gotchas

- The string `"."` and relative paths are resolved against the current working directory at the time of the *call*, not at the time of `run()`. If you change directories between the two, the inputs still refer to the directory you were in when you called the app.
- For SLURM, the staged path must be readable by the compute node. That means `outdir` must live on shared storage, and so must the original input when you stage with symlinks (`PM_SYMLINK=1`), because the staged link points back at it.
- Some apps take parameters typed `dict` whose values reference files — for example `proteinprepare`'s `residue_smiles`. Those are not staged; the *strings* go into the input JSON as-is.

## See also

- [Run an app locally](run-an-app-locally.md)
- [Run an app on SLURM](run-an-app-on-slurm.md)
- [Use app artifacts](use-app-artifacts.md)


# === howto/run-an-app-locally.md ===

# Run an app locally

## Goal

Execute a PlayMolecule app on the current machine, using Docker or Apptainer as the container runtime.

## Minimal example

```python
from playmolecule.apps import proteinprepare

ed = proteinprepare(outdir="out", pdbid="3ptb")
ed.run()
```

When `ed.run()` returns, the job is done. `ed.status` is `JobStatus.COMPLETED` (or `JobStatus.ERROR`).

## What `run()` does

1. Asks the active execution backend (local, by default) to run the app.
2. The local backend invokes `docker run` or `apptainer run` against the cached image, bind-mounting the run directory into the container.
3. Stdout/stderr stream to your terminal.
4. The function returns when the container exits.

For the conceptual picture, see [Architecture](../explanation/architecture.md).

## Quiet down logs

```python
ed.run(verbose=False)
```

`verbose=False` suppresses live output but still writes everything to the job's log file at `outdir/run_<id>.log` (a sibling of the run directory, not inside it).

## Set the container runtime

Docker is the default. Switch to Apptainer for HPC nodes that can't run Docker:

```bash
export PM_RUNTIME=apptainer
```

See [Switch between Docker and Apptainer](switch-between-docker-and-apptainer.md) for the full picture.

## Run the same `ed` somewhere else

`ed.run(...)` re-uses the same prepared inputs. If `run()` errored once, you can fix the issue (e.g., adjust a file in `outdir/`) and call `run()` again without rebuilding the job.

To submit the same `ed` to SLURM instead:

```python
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

See [Run an app on SLURM](run-an-app-on-slurm.md).

## One-shot syntax

If you don't need the `ed` handle:

```python
proteinprepare(outdir="out", pdbid="3ptb").run()
```

Drop the binding when running interactively. Keep it whenever you need to check status, batch jobs, or inspect inputs.

## Gotchas

- The first time you call an app, the container image must be pulled. That's a one-time download that can range from hundreds of MB to several GB, depending on the app. If the pull fails, check `docker login` / `gcloud auth configure-docker` for the registry (see [Install apps for a cluster](install-apps-for-a-cluster.md)).
- `outdir` must be writable by the user running the Python script. Docker's UID-mapping quirks are out of scope here, but if you see permission errors in the output files, run `docker info` and check your storage driver.
- Setting `PM_BLOCKING=1` makes the call **block until the job finishes** even when running through the HTTP backend — useful for scripts, never useful for interactive work.

## See also

- {py:meth}`~playmolecule.ExecutableDirectory.run`
- [Run an app on SLURM](run-an-app-on-slurm.md)
- [Check job status](check-job-status.md)
- [Architecture](../explanation/architecture.md)


# === howto/run-an-app-on-slurm.md ===

# Run an app on SLURM

## Goal

Submit a PlayMolecule job to a SLURM cluster and let it run asynchronously.

## Minimal example

```python
from playmolecule.apps import proteinprepare

ed = proteinprepare(
    outdir="/shared/scratch/me/proteinprepare-3ptb",
    pdbid="3ptb",
)
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

`outdir` must be on a filesystem visible to all SLURM nodes. The call returns immediately; the job runs on a worker.

## Parameters that matter

Pass `queue="slurm"` to {py:meth}`~playmolecule.ExecutableDirectory.run`; every other keyword is forwarded to the SLURM submission:

| Parameter      | Type                | What it does                                                                                                          |
|----------------|---------------------|-----------------------------------------------------------------------------------------------------------------------|
| `partition`    | `str` or `list[str]`| Queue to run on. Pass a list and the queue offering earliest start is used.                                           |
| `ncpu`         | `int`               | CPUs requested. Defaults to the app manifest's `resources.ncpu`.                                                      |
| `ngpu`         | `int`               | GPUs requested. Defaults to the app manifest's `resources.ngpu`.                                                      |
| `memory`       | `int`               | RAM in MiB.                                                                                                           |
| `gpumemory`    | `int`               | Minimum GPU memory in MiB (requires `gpu_mem` SLURM feature).                                                         |
| `walltime`     | `int`               | Timeout in seconds.                                                                                                   |
| `priority`     | `str`               | SLURM priority class.                                                                                                 |
| `jobname`      | `str`               | Job identifier shown in `squeue`.                                                                                     |
| `nodelist`     | `list[str]`         | Whitelist of nodes — **jobs will be duplicated** across them, not load-balanced.                                      |
| `exclude`      | `list[str]`         | Blacklist of nodes.                                                                                                   |
| `envvars`      | `str`               | Comma-separated env vars to propagate from the submit node to the worker.                                             |
| `prerun`       | `list[str]`         | Shell commands run on the worker before the container starts (e.g., `module load apptainer`).                         |
| `mailtype`     | `str`               | `BEGIN,END,FAIL,...` — what to email on.                                                                              |
| `mailuser`     | `str`               | Email address for `mailtype`.                                                                                         |
| `outputstream` | `str`               | SLURM stdout file path.                                                                                               |
| `errorstream`  | `str`               | SLURM stderr file path.                                                                                               |

When `ncpu` / `ngpu` aren't passed explicitly, PlayMolecule reads them from the app manifest's resource defaults. Override only when you want to deviate from them.

## Preset the queue from the environment

Set the queue config once and `ed.run()` with no arguments will route to SLURM automatically:

```bash
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
```

```python
ed.run()    # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
```

Other keys in the JSON pass through as kwargs (e.g., `memory`, `walltime`).
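
The routing rule can be pictured as follows. This is a sketch of the behavior described above under stated assumptions, not the library's code: `resolve_run_kwargs` is a hypothetical helper, and letting explicit arguments override the config is an assumption.

```python
import json


def resolve_run_kwargs(config_json, manifest_ngpu, **overrides):
    """Turn a PM_QUEUE_CONFIG-style JSON string into run() keyword arguments."""
    config = json.loads(config_json)
    cpu_partition = config.pop("cpu_partition", None)
    gpu_partition = config.pop("gpu_partition", None)
    # GPU partition when the manifest asks for GPUs, CPU partition otherwise.
    partition = gpu_partition if manifest_ngpu > 0 else cpu_partition
    kwargs = dict(config)            # remaining keys pass through unchanged
    if partition is not None:
        kwargs["partition"] = partition
    kwargs.update(overrides)         # assumption: explicit arguments win
    return kwargs
```

With the config shown above and a manifest that requests no GPUs, the result carries `queue="slurm"` and `partition="normalCPU"`.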

## Check on the job

```python
print(ed.status)        # JobStatus.WAITING_INFO / RUNNING / COMPLETED / ERROR
```

See [Check job status](check-job-status.md) for the polling pattern.
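
A generic version of such a loop, written against any zero-argument status getter, might look like this. It is a sketch: `wait_until_done` is hypothetical, and it assumes the status is either a plain string or an enum member whose `.name` matches the states listed above.

```python
import time


def wait_until_done(get_status, interval=10.0, timeout=3600.0,
                    terminal=("COMPLETED", "ERROR")):
    """Poll get_status() until it names a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Accept enum members (use their .name) as well as plain strings.
        name = getattr(status, "name", str(status))
        if name in terminal:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {name} after {timeout} seconds")
        time.sleep(interval)
```

In a real script the getter would be something like `lambda: ed.status`.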

## Gotchas

- `/tmp/` is *not* shared. If you set `outdir=/tmp/...` your job will start and immediately fail when the worker can't read the inputs. Use shared storage.
- Logs go to wherever SLURM was configured to write them (and to `outdir/run_<id>/`). Use `--output` / `outputstream` to override.
- The submitting Python process does not need to stay alive — the job is owned by SLURM. Status queries work from any process by reconstructing the {py:class}`~playmolecule.ExecutableDirectory` from `dirname`.

## Side note: `ed.slurm(...)`

`ed.slurm(partition=..., ncpu=..., ...)` is a thin alias for `ed.run(queue="slurm", ...)` retained for backwards compatibility. New code should prefer `run(queue="slurm")` so the same call style works for local, SLURM, and HTTP backends, and so `PM_QUEUE_CONFIG` can drop the kwargs entirely.

## See also

- {py:meth}`~playmolecule.ExecutableDirectory.run`
- {py:meth}`~playmolecule.ExecutableDirectory.slurm`
- [Run many jobs on one GPU](run-many-jobs-on-one-gpu.md)
- [Check job status](check-job-status.md)


# === howto/run-built-in-app-tests.md ===

# Run built-in app tests

## Goal

Execute the integration tests that ship inside each PlayMolecule app, to confirm the app and its environment are working before you build real jobs around it.

## Minimal example

```python
from playmolecule.apps import proteinprepare

print(proteinprepare.tests)               # list available tests
proteinprepare.tests.simple.run()         # run one
```

Each named test runs the app end-to-end in a temporary directory, waits for completion, and asserts that the manifest's `expected_outputs` files appear.

## Discover tests

```python
print(proteinprepare.tests)
```

Each entry has a description, arguments, and an `expected_outputs` list:

```text
[simple] 'Prepare 3PTB structure from RCSB'
- Arguments:
  pdbid = 3ptb
- Expected outputs:
  output.pdb
  pka_plot.png
  details.csv

[reprotonation] 'Prepare 3PTB but reprotonate specific residues'
...
```

Tests come from the app manifest's `tests` block.

## Run locally

```python
proteinprepare.tests.simple.run()
```

The test runs in a `tempfile.TemporaryDirectory` and prints a `🎉 Test '<name>' succeeded in N seconds! 🎉` line on success. Any missing expected output raises `RuntimeError`.
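
The output check amounts to something like the sketch below. It assumes the expected outputs are paths relative to the run directory; `assert_expected_outputs` is a hypothetical name, not the library's function.

```python
from pathlib import Path


def assert_expected_outputs(run_dir, expected_outputs):
    """Raise RuntimeError if any expected output is missing from run_dir."""
    missing = [name for name in expected_outputs
               if not (Path(run_dir) / name).exists()]
    if missing:
        raise RuntimeError(f"missing expected outputs: {missing}")
```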

## Run on SLURM

```python
proteinprepare.tests.simple.run(
    queue="slurm",
    dir="/shared/scratch/tests/",
    partition="normalCPU",
    ncpu=1,
    ngpu=0,
)
```

When `queue="slurm"`, you **must** pass `dir=` pointing to a path visible from the worker nodes — the default temp directory under `/tmp/` won't be there.

All extra kwargs are forwarded to {py:meth}`~playmolecule.ExecutableDirectory.run`, which forwards SLURM-specific ones to the queue.

## Pin a test to a specific version

To detect manifest drift across upgrades, call the test on a pinned version:

```python
proteinprepare.v1.proteinprepare.tests.simple.run()
```

## Gotchas

- Tests don't return anything — they raise on failure. Wrap in `try/except` if you're driving them from a larger script.
- Test names that don't start with a letter, or contain characters other than `[A-Za-z0-9_]`, are renamed at load time (`-` becomes `_`, leading digit becomes `test_<n>`). Use `dir(app.tests)` to see the actual attribute names.
- A failed test still deletes its temporary directory, because the directory is managed by a context manager. To debug, pass `dir="./debug-runs/"` and inspect whatever is left there, but note that for local runs (`queue=None`) the cleanup still runs.
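
When driving several tests from a larger script, the `try/except` wrapping can be factored into a small runner. The sketch below works on any zero-argument callables; `run_tests` is a hypothetical helper.

```python
def run_tests(named_tests):
    """Run zero-argument callables; return {name: None or error message}."""
    results = {}
    for name, test in named_tests.items():
        try:
            test()
            results[name] = None                      # passed
        except Exception as exc:                      # tests signal failure by raising
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

In practice the mapping would hold entries such as `{"simple": proteinprepare.tests.simple.run}`.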

## See also

- {py:class}`~playmolecule.JobStatus`
- [Using app versions and tests](../tutorials/02-using-app-versions-and-tests.md)
- [Apps and manifests](../explanation/apps-and-manifests.md)


# === howto/run-many-jobs-on-one-gpu.md ===

# Run many jobs on one GPU

## Goal

Pack several PlayMolecule jobs onto a single GPU on a SLURM cluster using NVIDIA MPS, so a small workload doesn't waste a whole device.

## Minimal example

```python
from playmolecule import slurm_mps
from playmolecule.apps import deepsite

eds = [
    deepsite(outdir=f"./run-{i}", pdbfile=f"protein_{i}.pdb")
    for i in range(8)
]

slurm_mps(eds, partition="normalGPU", ncpu=1, ngpu=1)
```

All the {py:class}`~playmolecule.ExecutableDirectory` objects in `eds` are submitted together as one SLURM job by {py:func}`~playmolecule.slurm_mps`. That job holds one GPU and shares it across the batched jobs through NVIDIA's Multi-Process Service.

## When to use it

- Small GPU jobs (a few seconds to a few minutes) where individual SLURM submissions would burn more time queueing than running.
- Workloads where one GPU has plenty of memory for several processes — e.g., parameter sweeps over the same model.

Don't use MPS for jobs that are individually GPU-saturating: they'll just serialise without benefit.

## Parameters

`slurm_mps(exec_dirs, **kwargs)` accepts the same SLURM kwargs as `ed.run(queue="slurm", ...)` — `partition`, `ncpu`, `ngpu`, `memory`, `walltime`, `nodelist`, `exclude`, `envvars`, `prerun`, the mail options, and the stream options. See [Run an app on SLURM](run-an-app-on-slurm.md) for the table.

The resource defaults come from the **first** {py:class}`~playmolecule.ExecutableDirectory` in the list (its `execution_resources`, which were copied from the app manifest at setup time). If you mix apps with different defaults, pass `ncpu` / `ngpu` explicitly to be safe.

## Gotchas

- All `ExecutableDirectory`s must live on a shared filesystem (same rule as plain SLURM).
- MPS needs to be enabled on the chosen partition's nodes. Talk to your cluster admin if `slurm_mps` jobs fail with "Failed to start MPS" in their logs.
- The single SLURM job completes only when *all* batched jobs have finished, so one slow job keeps the GPU allocated after the others are done. Group jobs by expected runtime.
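
One simple way to group by expected runtime is to sort the jobs by an estimate and cut the sorted list into fixed-size batches. The sketch below assumes you can supply a runtime estimate per job; `batch_by_runtime` is a hypothetical helper.

```python
def batch_by_runtime(jobs, batch_size):
    """Split (job, estimated_seconds) pairs into batches of similar runtime."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(jobs, key=lambda pair: pair[1])
    return [
        [job for job, _ in ordered[i:i + batch_size]]
        for i in range(0, len(ordered), batch_size)
    ]
```

Each resulting batch would then be passed to its own `slurm_mps` call, so short jobs are not held behind long ones.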

## See also

- {py:func}`~playmolecule.slurm_mps`
- [Run an app on SLURM](run-an-app-on-slurm.md)


# === howto/select-an-app-version.md ===

# Select an app version

## Goal

Call a specific version of a PlayMolecule app, instead of the latest-by-default.

## Minimal example

```python
from playmolecule.apps import proteinprepare

# Always whatever version is installed as latest
ed = proteinprepare(outdir="out", pdbid="3ptb")

# Pinned to v1
ed = proteinprepare.v1.proteinprepare(outdir="out", pdbid="3ptb")
```

## Why

The unqualified app symbol (`proteinprepare`) is an alias for the latest installed version. That's convenient for exploration, but it means an {py:func}`~playmolecule.update_apps` call can change your script's behavior without your code changing. For anything reproducible — published results, CI pipelines, batch sweeps — call the version explicitly.

## How versions are exposed

When PlayMolecule discovers an app, it builds a submodule per version:

```text
playmolecule.apps.<appname>             # alias for latest
playmolecule.apps.<appname>.v1          # explicit version 1
playmolecule.apps.<appname>.v2          # explicit version 2
...
```

Each version submodule exposes the same set of callable functions plus `artifacts`, `datasets`, `files`, and `__manifest__`. The `tests` namespace lives on each function (e.g. `proteinprepare.v1.proteinprepare.tests`), not on the submodule itself. Parameters and defaults can differ between versions — that's the whole point of versioning.

## List versions installed

```python
import playmolecule.apps.proteinprepare as pp
[name for name in dir(pp) if name.startswith("v") and name[1:].isdigit()]
```

Or look at the qualified paths in {py:func}`~playmolecule.describe_apps`:

```text
ProteinPrepare playmolecule.apps.proteinprepare.v1.proteinprepare
ProteinPrepare playmolecule.apps.proteinprepare.v2.proteinprepare
```

## Compare versions before upgrading

```python
help(proteinprepare.v1.proteinprepare)
help(proteinprepare.v2.proteinprepare)
```

Each version's function carries its own manifest-derived docstring, so `help` shows you exactly what differs between them — useful before an upgrade.

## Gotchas

- The "latest" alias is computed at import time by natural sort of version strings. `v10` correctly sorts after `v9`.
- Pinning to a version that isn't installed raises `AttributeError`. Catch it explicitly if your script must run against installs of unknown vintage.
- Pinning a version doesn't pin the container image's runtime dependencies, only the manifest contract. Image SHAs change when Acellera ships fixes; combine version pinning with [`update_apps()`](update-installed-apps.md) discipline for full reproducibility.
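
The natural-sort point is easy to check in isolation. The sketch below assumes version submodules are named `v` followed by an integer, as in the examples on this page; `latest_version` is a hypothetical helper, not the library's resolver.

```python
import re

_VERSION = re.compile(r"v(\d+)")


def latest_version(names):
    """Return the highest 'v<N>' name, comparing N numerically."""
    versions = [n for n in names if _VERSION.fullmatch(n)]
    if not versions:
        raise ValueError("no version submodules found")
    return max(versions, key=lambda n: int(n[1:]))
```

A plain string comparison would rank `v9` above `v10`; comparing the integer part avoids that.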

## See also

- [Apps and manifests](../explanation/apps-and-manifests.md)
- [Update installed apps](update-installed-apps.md)
- {py:func}`~playmolecule.describe_apps`


# === howto/switch-between-docker-and-apptainer.md ===

# Switch between Docker and Apptainer

## Goal

Choose whether PlayMolecule executes app containers with Docker or Apptainer.

## Minimal example

```bash
export PM_RUNTIME=docker      # default
# or
export PM_RUNTIME=apptainer
```

The setting takes effect on the next `import playmolecule`.

## When to use which

| Runtime    | Choose when                                                                                          |
|------------|------------------------------------------------------------------------------------------------------|
| `docker`   | You're on a workstation or VM. Your user can `docker run` without `sudo`.                            |
| `apptainer`| You're on a shared HPC node where the cluster policy forbids Docker, or you can't get root help to add yourself to the `docker` group. |

For SLURM workers, Apptainer is almost always the only viable choice.

## What changes under the hood

- **`docker`** — PlayMolecule shells out to the `docker` CLI, pulling images from the registry into the daemon's image store. Containers are launched with `docker run`.
- **`apptainer`** — PlayMolecule converts each Docker image into a plain SIF file once (via `apptainer pull docker://...`, cached under `PM_SIF_CACHE_DIR`, default `~/.cache/playmolecule/apptainer/`), then runs `apptainer run` against the SIF.

The image content is identical; the runtime and packaging differ.

## Apptainer prerequisites

You need a working `apptainer`. On modern kernels with user namespaces enabled, that's all — Apptainer runs unprivileged. On older or hardened kernels you may also need the `apptainer-suid` package, which provides a setuid helper as a fallback.

```bash
apptainer --version
```

Official install instructions: <https://apptainer.org/docs/admin/main/installation.html>.

## Custom SIF cache location

```bash
export PM_SIF_CACHE_DIR=/scratch/$USER/playmolecule-sif
```

Move the cache off `$HOME` if you're running on a node with a tiny home quota — SIF files for the larger ML apps can be several GB.

**On a multi-node cluster, point `PM_SIF_CACHE_DIR` at a path on shared storage** (e.g., `/shared/playmolecule/apptainer/`). The first worker to need an image converts it from Docker once; every subsequent worker — and every other user — reads the same SIF directly. Without a shared cache, every node pulls a fresh copy on first use, which can mean tens of GB of redundant network traffic per app rollout.

## Extracting an artifact in Apptainer mode

[Use app artifacts](use-app-artifacts.md) works in either runtime. With Apptainer, `artifact.download()` shells out to `apptainer exec` against the cached SIF; the SIF must exist locally first (it's created automatically the first time you call the app).

## Gotchas

- The two runtimes don't share an image cache. Switching mid-workflow re-downloads everything once.
- If `apptainer pull docker://...` fails on a hardened node, the kernel may not support user namespaces. Either enable them or install the `apptainer-suid` package as a fallback. PlayMolecule itself never calls `sudo`.
- `docker` mode requires the daemon to be running and the user to be in the `docker` group (or to have sudo without password).
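
A quick preflight check can catch a mismatch between `PM_RUNTIME` and what is installed on the node. This is a standalone sketch: the helper names are hypothetical, and rejecting values other than `docker` and `apptainer` is an assumption about how strict you want the check to be.

```python
import os
import shutil


def resolve_runtime(env=None):
    """Return the runtime named by PM_RUNTIME, defaulting to docker."""
    env = os.environ if env is None else env
    runtime = env.get("PM_RUNTIME", "docker")
    if runtime not in ("docker", "apptainer"):
        raise ValueError(f"unsupported PM_RUNTIME: {runtime!r}")
    return runtime


def runtime_on_path(runtime):
    """True if the runtime's CLI binary can be found on PATH."""
    return shutil.which(runtime) is not None
```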

## See also

- [Installation](../installation.md)
- [Install apps for a cluster](install-apps-for-a-cluster.md)
- [Environment variables](../reference/environment-variables.md)


# === howto/update-installed-apps.md ===

# Update installed apps

## Goal

Refresh the locally cached container images (and, for `local:` registries, the on-disk app payloads) so installed apps match what's published in the registry.

## Minimal example

```python
from playmolecule import update_apps

update_apps()
```

Pulls newer versions of every image that's already cached locally, and re-syncs any local registry's `acellera-protocols.zip` if needed.

## Pull every app, including ones you've never used

By default {py:func}`~playmolecule.update_apps` only updates images that are already in the local Docker / SIF cache. Pull everything published in the registry:

```python
update_apps(pull_new=True)
```

Useful right after installing PlayMolecule on a new machine if you want all apps available offline.

## Pick interactively

```python
update_apps(interactive=True)
```

Prints a numbered table of every available app with `✓ Installed` / `✗ Not installed` markers, then prompts:

```text
Options:
  - Enter numbers separated by spaces to select specific apps (e.g., '1 3 5')
  - Enter 'all' to select all apps
  - Enter 'installed' to select only installed apps
  - Enter 'new' to select only non-installed apps
  - Press Enter to cancel
```

`interactive=True` implies `pull_new=True`.
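
To make the prompt's semantics concrete, here is a minimal sketch of how an answer could be mapped to a list of apps. `parse_selection` is a hypothetical helper written for this page; how PlayMolecule parses the answer internally is not shown here, so treat this only as a restatement of the documented options.

```python
def parse_selection(text, apps):
    """Map a prompt answer to app names.

    `apps` is a list of (name, installed) tuples in table order.
    An empty answer (just Enter) means cancel and returns an empty list.
    """
    answer = text.strip().lower()
    if not answer:
        return []
    if answer == "all":
        return [name for name, _ in apps]
    if answer == "installed":
        return [name for name, installed in apps if installed]
    if answer == "new":
        return [name for name, installed in apps if not installed]
    chosen = []
    for token in answer.split():
        index = int(token)  # raises ValueError on non-numeric input
        if not 1 <= index <= len(apps):
            raise ValueError(f"selection {index} is out of range 1..{len(apps)}")
        chosen.append(apps[index - 1][0])  # the table is numbered from 1
    return chosen


apps = [("proteinprepare", True), ("deepsite", False), ("parameterize", False)]
print(parse_selection("1 3", apps))
print(parse_selection("new", apps))
```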

## Pass a service-account JSON (for `local:` registries)

If your `PM_REGISTRIES` includes a `local:` URI backed by Acellera-provided GCS payloads, point at the service-account JSON Acellera issued you:

```python
update_apps(service_acc_json="/etc/playmolecule/sa.json")
```

For pure `docker://` registries the JSON is not needed — Docker handles auth via `gcloud auth configure-docker`.

## Combine forms

```python
# Pull every app and authenticate to GCS with the service account in one call
update_apps(service_acc_json="/etc/playmolecule/sa.json", pull_new=True)
```

## When to run it

- After Acellera notifies you of a new release.
- After installing PlayMolecule on a new node and you want every app available without waiting for first-use pull.
- As a recurring cron job on a shared install (paired with [`PM_REGISTRIES=local:/opt/playmolecule`](install-apps-for-a-cluster.md) so every user benefits without each running update themselves).

## Gotchas

- Updating an image may change its parameters or output names. Pin versions in scripts (see [Select an app version](select-an-app-version.md)) before running `update_apps` against a production install.
- The function logs progress per image. For 30+ apps the output is long; redirect to a log file in cron use.
- `update_apps()` does not delete obsolete images. Reclaim space with `docker image prune` (Docker) or by removing old `*.sif` files from `PM_SIF_CACHE_DIR` (Apptainer).
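
For the Apptainer case, a dry-run listing of old SIF files is a reasonable first step before deleting anything. The sketch below is a hypothetical helper, not part of PlayMolecule. It assumes the SIF files sit directly inside the cache directory and uses modification time as a rough proxy for "obsolete"; neither assumption is confirmed by this page.

```python
import time
from pathlib import Path


def find_stale_sifs(cache_dir, max_age_days=90, now=None):
    """Return *.sif files in cache_dir not modified within the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*.sif") if p.stat().st_mtime < cutoff)


# Dry run: list candidates only, delete nothing.
for sif in find_stale_sifs("~/.cache/playmolecule/apptainer/"):
    print("stale:", sif)
```

Review the list by hand before removing files; an old modification time does not prove that no pinned script still needs the image.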

## See also

- {py:func}`~playmolecule.update_apps`
- [Install apps for a cluster](install-apps-for-a-cluster.md)
- [Select an app version](select-an-app-version.md)


# === howto/use-app-artifacts.md ===

# Use app artifacts

## Goal

Pass a model file, dataset, or other file that ships *with* an app — for example, DeepSite's trained model — as an argument to that app.

## Minimal example

```python
from playmolecule.apps import deepsite

print(deepsite.artifacts)         # show what's bundled with this app

deepsite(
    outdir="out",
    pdbid="3ptb",
    model=deepsite.artifacts.default,
).run()
```

`deepsite.artifacts.default` is a file handle bundled with the app. You pass it where the app expects a file path; PlayMolecule resolves it to the right location in the container.

## How artifacts work

App manifests can declare a list of `artifacts` (sometimes called `datasets`; the two terms are synonymous). PlayMolecule exposes them on the app module's `artifacts` container, one attribute per artifact name:

```python
deepsite.artifacts                       # container of bundled file handles
deepsite.artifacts.default               # one specific artifact
deepsite.artifacts.default.path          # path inside the container
deepsite.artifacts.default.description
```

Each artifact is also addressable for a specific version: `deepsite.v1.artifacts.<NAME>`.

## Download an artifact to disk

If you want to keep an app artifact outside its container:

```python
local_path = deepsite.artifacts.default.download("./models/deepsite-default.ckpt")
```

Whether this copies from a local install, extracts from a Docker / Apptainer image, or downloads from an HTTP backend depends on which registry the app came from — the call is the same in all cases.

## Gotchas

- Artifact names must start with a letter and contain no dots. The loader skips invalid names silently — if `deepsite.artifacts.X` doesn't exist, check the spelling in the app manifest.
- The handle is *not* a string path. Don't `str(...)` it before passing it in; PlayMolecule does the resolution itself based on the active execution backend.
- A pinned-version artifact (`deepsite.v1.artifacts.X`) may not exist in `deepsite.v2.artifacts` — artifacts are versioned together with the app.
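
The naming rule in the first gotcha can be checked ahead of time when you author a manifest. This is a minimal sketch that encodes only the rule as stated here (starts with a letter, contains no dots); `is_valid_artifact_name` is a hypothetical helper, and the real loader may apply stricter checks.

```python
import re

# Mirrors the documented rule only: starts with a letter, contains no dots.
_ARTIFACT_NAME = re.compile(r"[A-Za-z][^.]*")


def is_valid_artifact_name(name):
    """True if `name` starts with an ASCII letter and contains no dot."""
    return _ARTIFACT_NAME.fullmatch(name) is not None


for candidate in ("default", "model_v2", "2024-weights", "weights.ckpt"):
    print(candidate, is_valid_artifact_name(candidate))
```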

## See also

- [Apps and manifests](../explanation/apps-and-manifests.md)
- [Artifacts and files](../explanation/artifacts-and-files.md)
- [Pass input files to an app](pass-input-files-to-an-app.md)


# === index.md ===

# PlayMolecule

PlayMolecule is a Python API for running Acellera's containerized drug-discovery apps. You install one Python package, point it at a registry of apps (a Docker registry, an HTTP backend, or a local install), and call each app as a normal Python function. Each call produces a self-contained {py:class}`~playmolecule.ExecutableDirectory` you can {py:meth}`~playmolecule.ExecutableDirectory.run` locally, submit to SLURM, or execute on a remote PlayMolecule HTTP backend.

:::{important}
The `playmolecule` Python client (this package) is freely available, but the **apps** it runs — ProteinPrepare, DeepSite, Parameterize, and the rest — are commercial products that require an Acellera licence. [Contact Acellera](https://www.acellera.com/contact-us) to book a demo and obtain a quote tailored to your needs. See [Licensing](installation.md#licensing) for details.
:::

::::{grid} 2
:gutter: 3

:::{grid-item-card} 🎓 Tutorials
:link: tutorials/index
:link-type: doc

Step-by-step lessons. Start here if you're new.
:::

:::{grid-item-card} 🛠 How-to guides
:link: howto/index
:link-type: doc

Task-focused recipes. "How do I X?"
:::

:::{grid-item-card} 📖 Reference
:link: reference/index
:link-type: doc

API, CLI, and environment variables.
:::

:::{grid-item-card} 💡 Explanation
:link: explanation/index
:link-type: doc

Concepts and mental models.
:::

::::

## Installation

```bash
pip install playmolecule
```

See [Installation](installation.md) for the rest of the setup (registry credentials, container runtime). Cluster administrators rolling out PlayMolecule for a multi-user site should read [Install apps for a cluster](howto/install-apps-for-a-cluster.md).

## Quick start

```python
from playmolecule import describe_apps
from playmolecule.apps import proteinprepare

describe_apps()                                      # list what's available
ed = proteinprepare(outdir="out", pdbid="3ptb")      # build an ExecutableDirectory
ed.run()                                             # execute locally
print(ed.status)                                     # JobStatus.COMPLETED
```

The same `ed` can be submitted to SLURM instead:

```python
ed = proteinprepare(outdir="out", pdbid="3ptb")
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

## Citing

If you use PlayMolecule in published work, please cite Acellera. See <https://www.acellera.com/playmolecule>.

```{toctree}
:maxdepth: 1
:hidden:

installation
tutorials/index
howto/index
explanation/index
reference/index
```


# === installation.md ===

# Installation

PlayMolecule is one `pip install` away. The package itself is small — what you also need is a *registry* of apps to call (a Docker registry, an HTTP backend, or a local on-disk install) and a *container runtime* to execute them.

This page covers the end-user setup. If you're rolling PlayMolecule out for a multi-user cluster, see [Install apps for a cluster](howto/install-apps-for-a-cluster.md).

## 1. Install the Python package

```bash
pip install playmolecule
```

The package requires Python 3.10+. For an isolated environment, create one with `venv`, `conda`, or `uv` first:

```bash
python -m venv .venv
source .venv/bin/activate
pip install playmolecule
```

## 2. Pick a container runtime

PlayMolecule apps run as containers. Set `PM_RUNTIME` to one of:

| Runtime    | When to use                                                                       |
|------------|-----------------------------------------------------------------------------------|
| `docker`   | Default. You have Docker installed and your user can run `docker` without `sudo`. |
| `apptainer`| HPC / shared cluster nodes where Docker isn't available.                          |

```bash
export PM_RUNTIME=docker     # or apptainer
```

See [Switch between Docker and Apptainer](howto/switch-between-docker-and-apptainer.md) for the practical differences.
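
Before running an app, it can help to confirm that the selected runtime is actually on `PATH`. The following sketch uses only the standard library; `check_runtime` is a hypothetical helper, and it assumes the executables are named `docker` and `apptainer`.

```python
import os
import shutil


def check_runtime(env=None, which=shutil.which):
    """Return (runtime, executable_path_or_None) for the selected container runtime."""
    env = os.environ if env is None else env
    runtime = env.get("PM_RUNTIME", "docker")  # documented default
    if runtime not in ("docker", "apptainer"):
        raise ValueError(f"PM_RUNTIME must be 'docker' or 'apptainer', got {runtime!r}")
    return runtime, which(runtime)


runtime, exe = check_runtime({"PM_RUNTIME": "docker"})
print(f"{runtime}: {exe or 'not found on PATH'}")
```

Call `check_runtime()` with no arguments to inspect the real environment. Finding the executable does not prove the Docker daemon is running or that your user may talk to it.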

## 3. Choose a registry

The registry is *where* PlayMolecule finds apps. By default, PlayMolecule points at Acellera's GCloud-hosted Docker registry:

```text
docker://europe-southwest1-docker.pkg.dev/repositories-368911
```

You authenticate to that registry the same way you would for any private Docker registry — usually `gcloud auth configure-docker`. Acellera provides a service-account JSON to customers; see [Install apps for a cluster](howto/install-apps-for-a-cluster.md) for the full handshake.

To override the default registry, set `PM_REGISTRIES` to one of the following (the two lines are alternatives, not a sequence):

```bash
export PM_REGISTRIES=http://playmolecule.example.com  # remote HTTP backend
export PM_REGISTRIES=local:/path/to/my-apps           # developer use: iterate on your own app manifests on disk
```

See [Environment variables](reference/environment-variables.md) for the full list of `PM_*` settings.

## 4. (Optional) Log in to a remote backend

If your `PM_REGISTRIES` is an `http://` URL, you also need to authenticate before running jobs:

```bash
playmolecule login --email you@example.com
```

This stores a cookie under `~/.cache/playmolecule/cookies/` and is reused on every subsequent call. See [Log in to the HTTP backend](howto/log-in-to-the-http-backend.md).

## 5. Verify

```python
from playmolecule import describe_apps

describe_apps()
```

You should see a list of available apps and a one-line description for each. If the list is empty, the registry is either unreachable or empty — check that `PM_REGISTRIES` points where you expect and that you can `docker pull` an image from it manually.

## Licensing

The `playmolecule` Python package on PyPI is free to install. The **apps** it runs — ProteinPrepare, DeepSite, Parameterize, and the rest of Acellera's container catalogue — are **commercial software** and require an Acellera licence to execute. Without a valid licence file (or floating-licence server), the containers will refuse to run.

PlayMolecule® is a virtual environment for drug discovery where simulations, AI, and data are integrated to uncover new insights. [Contact Acellera](https://www.acellera.com/contact-us) to book a demo and obtain a quote tailored to your needs.

## Contributors: working from a checkout

If you're developing PlayMolecule itself, [`uv`](https://docs.astral.sh/uv/) is the recommended tool:

```bash
uv sync --group dev      # install dev + test deps
uv sync --group docs     # add doc-build deps
uv run pytest            # run the test suite
```

See the `[dependency-groups]` table in `pyproject.toml` for the full list of optional groups.


# === reference/cli.md ===

# CLI

The `playmolecule` command-line entry point exists for one job: authenticating against an HTTP backend in environments where launching a Python interpreter to call {py:func}`~playmolecule.login` is awkward (CI, container init scripts, shell-only operators).

```text
playmolecule {login,logout}
```

## `playmolecule login`

Authenticate against the HTTP backend configured in `PM_EXECUTOR` (or the HTTP entry in `PM_REGISTRIES`). On success, a session cookie is written under `PM_COOKIE_CACHE_DIR` (default `~/.cache/playmolecule/cookies/`) and used by every subsequent PlayMolecule call from this user.

```text
playmolecule login [--email EMAIL] [--password PASSWORD]
```

| Option       | Source if not provided                                                                                                       |
|--------------|-----------------------------------------------------------------------------------------------------------------------------|
| `--email`    | `PLAYMOLECULE_EMAIL`, then `PM_EMAIL`. Required.                                                                            |
| `--password` | `PLAYMOLECULE_PASSWORD`, then `PM_PASSWORD`, then an interactive prompt (via `getpass`).                                    |

Exits non-zero on failure, printing the underlying error.
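
The fallback order in the table can be written out as a short function. This is an illustrative sketch, not the CLI's actual code: `resolve_credential` is a hypothetical name, and the only behaviour taken from this page is the order (command-line flag, then each environment variable in turn, then an optional prompt).

```python
import getpass
import os


def resolve_credential(cli_value, env_names, env=None, prompt=None):
    """Return the first available value: CLI flag, then env vars in order, then prompt."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    for name in env_names:
        if env.get(name):
            return env[name]
    if prompt is not None:
        return prompt()
    raise ValueError(f"missing value: pass a flag or set one of {env_names}")


env = {"PM_EMAIL": "ci@example.com", "PM_PASSWORD": "secret"}
email = resolve_credential(None, ["PLAYMOLECULE_EMAIL", "PM_EMAIL"], env)
password = resolve_credential(
    None,
    ["PLAYMOLECULE_PASSWORD", "PM_PASSWORD"],
    env,
    prompt=lambda: getpass.getpass("Password: "),  # only reached if nothing else is set
)
print(email)
```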

### Examples

```bash
# Interactive — prompt for the password
playmolecule login --email you@example.com

# CI-friendly — credentials from environment
export PM_EMAIL=ci@example.com
export PM_PASSWORD='••••••••••'
playmolecule login
```

## `playmolecule logout`

Clears the cached cookie for the current user, both in memory and on disk.

```text
playmolecule logout
```

Always exits zero, even if there was no active session.

## See also

- [Log in to the HTTP backend](../howto/log-in-to-the-http-backend.md)
- [Environment variables](environment-variables.md)
- {py:func}`~playmolecule.login`
- {py:func}`~playmolecule.logout`


# === reference/environment-variables.md ===

# Environment variables

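Almost every user-tunable PlayMolecule setting is a `PM_*` environment variable (the login credentials also accept a `PLAYMOLECULE_*` spelling). They are read once at import time and never re-read, so set them before importing `playmolecule`, or restart your Python process after changing them.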

## Registry and executor

`PM_REGISTRIES`
:   *Default:* `docker://europe-southwest1-docker.pkg.dev/repositories-368911`

    Comma-separated registries, in priority order. Each entry must start with `docker://`, `http://`, or `local:`. `local:` is for developers iterating on their own app manifests on disk; most users want `docker://` (the default). Set to `none` to disable defaults entirely.

`PM_EXECUTOR`
:   *Default:* `local`

    Where jobs run. Either `local` or an `http://` URL pointing at a PlayMolecule HTTP backend.

`PM_NO_DEFAULT_REGISTRIES`
:   *Default:* `0`

    When `1`, skip the default Acellera registry. Use this if you only want the registries you set in `PM_REGISTRIES` and nothing else.
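
The `PM_REGISTRIES` format described above is simple enough to validate before starting Python. The sketch below is a hypothetical checker, not PlayMolecule's own parser; treating `none` as an empty list is an assumption based on the description of that value.

```python
ALLOWED_PREFIXES = ("docker://", "http://", "local:")


def parse_registries(value):
    """Split a PM_REGISTRIES string into (prefix, location) pairs, keeping priority order."""
    if value.strip().lower() == "none":
        return []  # assumption: "none" means no registries at all
    parsed = []
    for entry in (part.strip() for part in value.split(",")):
        if not entry:
            continue  # tolerate stray commas
        for prefix in ALLOWED_PREFIXES:
            if entry.startswith(prefix):
                parsed.append((prefix, entry[len(prefix):]))
                break
        else:
            raise ValueError(f"registry {entry!r} must start with one of {ALLOWED_PREFIXES}")
    return parsed


print(parse_registries("local:/opt/playmolecule, http://playmolecule.example.com"))
```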

## Container runtime

`PM_RUNTIME`
:   *Default:* `docker`

    `docker` or `apptainer` — which container engine to invoke for local execution.

`PM_SIF_CACHE_DIR`
:   *Default:* `~/.cache/playmolecule/apptainer/`

    Where Apptainer SIF files are cached. Move to shared storage for multi-user / multi-node installs (see [Install apps for a cluster](../howto/install-apps-for-a-cluster.md)).

## Job submission

`PM_QUEUE_CONFIG`
:   *Default:* unset

    JSON dict. When set, `ed.run()` with no arguments uses it. Expected keys: `queue`, `cpu_partition`, `gpu_partition`, plus any extra forwarded to {py:meth}`~playmolecule.ExecutableDirectory.run` as SLURM kwargs.

`PM_SYMLINK`
:   *Default:* unset

    When set (any value), input files are **symlinked** into the run directory instead of copied. Faster for large inputs; the run directory becomes non-self-contained (see [Pass input files to an app](../howto/pass-input-files-to-an-app.md)).

`PM_BLOCKING`
:   *Default:* `0`

    When `1`, app calls block until the job reaches `COMPLETED` or `ERROR` (HTTP-backend path).

`PM_JOB_DIR_PREFIX`
:   *Default:* `""`

    Prefix applied to job paths submitted through the HTTP backend.

`PM_WORKING_DIR`
:   *Default:* unset

    If set, used as the base for resolving relative paths.

## HTTP backend

`PM_BACKEND_HEADERS`
:   *Default:* `{}`

    JSON dict of extra HTTP headers attached to every backend request.

`PM_COOKIE_CACHE_DIR`
:   *Default:* `~/.cache/playmolecule/cookies/`

    Where login session cookies are persisted. Override in containerised CI runners where `$HOME` is ephemeral.

`PLAYMOLECULE_EMAIL` / `PM_EMAIL`
:   *Default:* unset

    Default email for `playmolecule login`.

`PLAYMOLECULE_PASSWORD` / `PM_PASSWORD`
:   *Default:* unset

    Default password for `playmolecule login`. Prefer prompting interactively in normal use.

## Logging

`PM_LOG_LEVEL`
:   *Default:* unset

    Sets the `playmolecule` logger to any standard level (`DEBUG`, `INFO`, `WARNING`, `ERROR`).

`PM_QUIET`
:   *Default:* `0`

    When `1`, equivalent to setting `PM_LOG_LEVEL=WARNING`; provided as a convenience alias.

## See also

- [Installation](../installation.md)
- [Architecture](../explanation/architecture.md)
- [CLI](cli.md)


# === reference/index.md ===

# Reference

Look-up material. The API pages are auto-generated by `sphinx-apidoc` from the source code; the configuration and CLI pages are hand-curated.

## Configuration & CLI

```{toctree}
:maxdepth: 1

environment-variables
cli
```

## Python API

```{toctree}
:maxdepth: 2

playmolecule
playmolecule.apps
```


# === reference/playmolecule.md ===

# `playmolecule`

The top-level `playmolecule` package re-exports the public API. Everything below is importable from `playmolecule` directly, for example `from playmolecule import describe_apps`.

## App discovery

```{eval-rst}
.. autofunction:: playmolecule.describe_apps
```

## Job execution

```{eval-rst}
.. autoclass:: playmolecule.ExecutableDirectory
   :members:
   :show-inheritance:
```

```{eval-rst}
.. autoclass:: playmolecule.JobStatus
   :members:
   :show-inheritance:
   :undoc-members:
```

## SLURM helpers

```{eval-rst}
.. autofunction:: playmolecule.slurm_mps
```

## HTTP backend authentication

```{eval-rst}
.. autofunction:: playmolecule.login
```

```{eval-rst}
.. autofunction:: playmolecule.logout
```

## Application management

```{eval-rst}
.. autofunction:: playmolecule.update_apps
```


# === tutorials/01-first-app-run.md ===

# First app run

**You will learn:** how to discover PlayMolecule apps, set up a job for one, run it locally, and find its outputs.

**Prerequisites:**
- [`playmolecule` installed](../installation.md) with a working registry (default GCloud Docker registry, or your own `PM_REGISTRIES`).
- Docker (or Apptainer) available on your machine.

We'll use `proteinprepare`, the app that protonates a protein at a chosen pH, as the running example. The recipe is the same for every app — they're all just Python functions.

## Setup

```python
from playmolecule import describe_apps
from playmolecule.apps import proteinprepare
```

{py:func}`~playmolecule.describe_apps` is the entry point for discovery. `playmolecule.apps` is the namespace where every installed app appears as an importable submodule.

## Step 1 — See what's installed

```python
describe_apps()
```

You'll see lines like:

```text
ProteinPrepare playmolecule.apps.proteinprepare.v1.proteinprepare
    ProteinPrepare despite it's name prepares proteins but also other systems including nucleic acids
DeepSite playmolecule.apps.deepsite.v1.deepsite
    Predict ligand binding pockets in your protein of interest using a neural network-based predictor.
...
```

The first line has the human-readable name followed by the **fully qualified import path**; the indented second line is the one-sentence description from the app's manifest. The qualified path looks like `playmolecule.apps.<appname>.<version>.<function>` — but you don't need to type the version; the latest one is aliased at `playmolecule.apps.<appname>` (see [Select an app version](../howto/select-an-app-version.md)).

If you want the data as a dict instead of printed output:

```python
apps = describe_apps(as_dict=True)
apps["playmolecule.apps.proteinprepare.v1.proteinprepare"]["description"]
```
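
Because the dict is keyed by qualified path, ordinary string handling is enough to summarise it. The sketch below groups versions by app name. It runs on a hand-written stand-in for the real dict, and `versions_by_app` is a hypothetical helper; the only thing it relies on is the `playmolecule.apps.<appname>.<version>.<function>` layout described above.

```python
from collections import defaultdict


def versions_by_app(paths):
    """Group paths of the form playmolecule.apps.<app>.<version>.<function> by app."""
    grouped = defaultdict(set)
    for path in paths:
        parts = path.split(".")
        if len(parts) != 5 or parts[:2] != ["playmolecule", "apps"]:
            continue  # ignore anything that does not follow the documented layout
        grouped[parts[2]].add(parts[3])
    return {app: sorted(versions) for app, versions in grouped.items()}


# Hand-written stand-in for describe_apps(as_dict=True); only the keys matter here.
apps = {
    "playmolecule.apps.proteinprepare.v1.proteinprepare": {"description": "..."},
    "playmolecule.apps.deepsite.v1.deepsite": {"description": "..."},
}
print(versions_by_app(apps))
```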

## Step 2 — Inspect a single app

Every app is a normal Python function with a generated signature and docstring:

```python
help(proteinprepare)
```

`help` shows the parameter list pulled from the app manifest — names, types, defaults, descriptions, expected outputs, and any built-in examples. In IPython, `proteinprepare?` is equivalent.

## Step 3 — Set up the job

Calling the app function **does not run anything**; it sets up an {py:class}`~playmolecule.ExecutableDirectory`:

```python
ed = proteinprepare(outdir="out", pdbid="3ptb")
```

After this call, `./out/` exists on disk with everything the container needs to execute: a `run_<timestamp>_<uuid>/` inputs directory, the rendered run script, and the manifest's input JSON.

`ed` is just a handle on that directory. Nothing has executed yet — you can still tweak files in `./out/` if you want to.

## Step 4 — Run the job

```python
ed.run()
```

This pulls the app's container image if it isn't cached, runs it against the inputs directory, and streams logs to stdout. When `run()` returns, the job is done. (Caveat: against an HTTP backend, `run()` submits asynchronously and returns immediately — see [Log in to the HTTP backend](../howto/log-in-to-the-http-backend.md) and set `PM_BLOCKING=1` if you want the call to wait.)

## Step 5 — Check status and outputs

```python
print(ed.status)
```

You'll see `JobStatus.COMPLETED` (or `JobStatus.ERROR` if something failed). The full list of states is in {py:class}`~playmolecule.JobStatus`.

The outputs are sitting in `./out/`. For `proteinprepare`:

```text
out/
├── output.pdb       # the protonated structure
├── details.csv      # residue-by-residue protonation report
├── pka_plot.png     # pKa plot
└── run_<id>/        # input manifest + logs + run script
```

What lands in `outdir` is defined by the app's manifest — `help(proteinprepare)` lists the `Outputs` section.
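
If you script around a run, a quick existence check on the expected files is often enough to decide whether to continue. A minimal sketch using only the standard library (`missing_outputs` is a hypothetical helper; the file names are the ones from the listing above):

```python
from pathlib import Path


def missing_outputs(outdir, expected):
    """Return the expected output names that are not present as files in outdir."""
    root = Path(outdir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_outputs("out", ["output.pdb", "details.csv", "pka_plot.png"])
print("all outputs present" if not missing else f"missing: {missing}")
```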

## Step 6 — Shorter syntax

You can chain the call and the run in one line:

```python
proteinprepare(outdir="out", pdbid="3ptb").run()
```

This is fine for ad-hoc work. Keep the `ed` binding when you want to query status later, submit to SLURM, or batch several jobs together.

## Recap

- {py:func}`~playmolecule.describe_apps` lists everything available.
- Every app is a Python function that accepts manifest-defined parameters and returns an {py:class}`~playmolecule.ExecutableDirectory`.
- Calling the app **sets up** a job; calling {py:meth}`~playmolecule.ExecutableDirectory.run` **executes** it.
- {py:attr}`ed.status <playmolecule.ExecutableDirectory.status>` tells you whether the run succeeded.

## Next

- [Using app versions and tests](02-using-app-versions-and-tests.md)
- [Pass input files to an app](../howto/pass-input-files-to-an-app.md)
- [What an {py:class}`~playmolecule.ExecutableDirectory` is](../explanation/executable-directory.md)


# === tutorials/02-using-app-versions-and-tests.md ===

# Using app versions and tests

**You will learn:** how to pin to a specific app version and run the test suite that ships with every app.

**Prerequisites:**
- [First app run](01-first-app-run.md) completed.

Every PlayMolecule app is versioned. The bare app name (`proteinprepare`) is always an alias for the latest installed version — convenient for exploring, brittle if you want reproducibility. This tutorial shows the pieces you need to lock down.

## Setup

```python
from playmolecule.apps import proteinprepare
```

## Step 1 — See available versions

The submodule for each version lives at `playmolecule.apps.<appname>.<version>`:

```python
import playmolecule.apps.proteinprepare as pp_module
print([name for name in dir(pp_module) if name.startswith("v")])
```

If two versions are installed you'll see something like `['v1', 'v2']`. The unqualified `proteinprepare(...)` symbol is aliased to whichever version sorts as latest. (See {py:func}`~playmolecule.describe_apps` for the discovery API.)
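
This page does not spell out how "latest" is determined. If you need to pick the newest version yourself, comparing the numeric part avoids the text-sorting trap where `v10` sorts before `v2`. The sketch below assumes version names have the form `v<N>`; `latest_version` is a hypothetical helper and may not match PlayMolecule's own ordering.

```python
import re


def latest_version(names):
    """Pick the highest `v<N>` name, comparing N numerically rather than as text."""
    numbered = []
    for name in names:
        match = re.fullmatch(r"v(\d+)", name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        raise ValueError("no names of the form v<N> found")
    return max(numbered)[1]


print(latest_version(["v1", "v2", "v10"]))
```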

## Step 2 — Pin to a specific version

To freeze your script against drift, reach into the version submodule and call its function explicitly:

```python
ed = proteinprepare.v1.proteinprepare(outdir="out", pdbid="3ptb")
```

`proteinprepare.v1` is the version submodule; `proteinprepare.v1.proteinprepare` is the dynamically generated function (same signature as the unqualified `proteinprepare(...)`, just pinned). Different versions can have different parameters, so pinning is the only safe way to reproduce a result across upgrades.

## Step 3 — Look at built-in tests

Every app ships a set of integration tests defined in its manifest:

```python
print(proteinprepare.tests)
```

You'll see entries like:

```text
[simple] 'Prepare 3PTB structure from RCSB'
- Arguments:
  pdbid = 3ptb
- Expected outputs:
  output.pdb
  pka_plot.png
  details.csv
```

Each named test exposes a `run()` method.

## Step 4 — Execute a test

```python
proteinprepare.tests.simple.run()
```

This runs the test in a temporary directory, waits for completion, and asserts that the expected output files exist. A `🎉 Test '<name>' succeeded in N seconds! 🎉` line means PlayMolecule and the app are correctly wired together — useful as a smoke test after installation or after {py:func}`~playmolecule.update_apps`.

To run a test on SLURM instead (handy for GPU-bound apps you can't run locally):

```python
proteinprepare.tests.simple.run(
    queue="slurm",
    dir="/shared/scratch/tests/",   # must be visible to every SLURM worker
    partition="normalCPU",
    ncpu=1,
    ngpu=0,
)
```

Pass `dir=` whenever you submit to SLURM. Without it, the test runs in `/tmp/`, which is node-local on most clusters — the worker won't be able to read what your login node wrote.

## Recap

- Each app exposes versioned submodules (`app.v1`, `app.v2`); the bare name aliases the latest.
- Pin versions in production scripts so an `update_apps()` upgrade can't silently change behaviour.
- `app.tests.<name>.run()` exercises the app end-to-end. Use it as a smoke test after install or upgrade.

## Next

- [Running on SLURM](03-running-on-slurm.md)
- [Select an app version](../howto/select-an-app-version.md)
- [Apps and manifests](../explanation/apps-and-manifests.md)


# === tutorials/03-running-on-slurm.md ===

# Running on SLURM

**You will learn:** how to submit a PlayMolecule job to a SLURM cluster, poll its status, and use the resources requested by the app's manifest.

**Prerequisites:**
- [First app run](01-first-app-run.md) completed.
- SSH access to a node that can `sbatch`.
- `outdir` on a path visible to **all** SLURM nodes (a shared filesystem) — local `/tmp/` will not work for a remote worker.

## Setup

```python
from playmolecule import JobStatus
from playmolecule.apps import proteinprepare
```

## Step 1 — Set up the job in a shared directory

```python
ed = proteinprepare(outdir="/shared/scratch/me/proteinprepare-3ptb", pdbid="3ptb")
```

Where you point `outdir` matters: SLURM workers will read the run script and write outputs through that exact path. Anything under `/tmp/`, `~/`, or any node-local path won't be visible to the worker.
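
A cheap guard before submission can flag the two most common mistakes named above. This is a heuristic sketch only: `looks_node_local` is a hypothetical helper, it assumes POSIX paths, and it cannot tell whether a given filesystem is really shared on your cluster.

```python
import os
from pathlib import PurePosixPath


def looks_node_local(outdir, home=None):
    """Heuristic: True if outdir sits under /tmp or the user's home directory."""
    home = os.path.expanduser("~") if home is None else home
    path = PurePosixPath(os.path.abspath(os.path.expanduser(outdir)))
    roots = (PurePosixPath("/tmp"), PurePosixPath(home))
    return any(path == root or root in path.parents for root in roots)


print(looks_node_local("/tmp/proteinprepare-3ptb", home="/home/me"))
print(looks_node_local("/shared/scratch/me/proteinprepare-3ptb", home="/home/me"))
```

A `False` result only means the path avoided the obvious traps; confirm with your cluster administrator which mount points are visible to every worker.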

## Step 2 — Submit

```python
ed.run(queue="slurm", partition="normalCPU", ncpu=1, ngpu=0)
```

Passing `queue="slurm"` to {py:meth}`~playmolecule.ExecutableDirectory.run` submits through SLURM instead of running the container locally. Every other keyword (`partition`, `ncpu`, `ngpu`, `memory`, `walltime`, `nodelist`, `exclude`, `envvars`, `prerun`, …) is forwarded to the SLURM queue.

The call returns immediately. The SLURM queue object is stored on `ed` and the job ID is in SLURM's normal accounting.

## Step 3 — Poll status

```python
print(ed.status)
```

You'll see one of the four states from {py:class}`~playmolecule.JobStatus`:

- `JobStatus.WAITING_INFO` — submitted but not yet running.
- `JobStatus.RUNNING`
- `JobStatus.COMPLETED`
- `JobStatus.ERROR`

A simple polling loop:

```python
import time

while ed.status not in (JobStatus.COMPLETED, JobStatus.ERROR):
    time.sleep(30)

print("Done:", ed.status)
```

For background — what each state means and how it's detected — see [Job lifecycle](../explanation/job-lifecycle.md).
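
The loop above waits forever if a job never leaves the queue. A variant with a timeout is sketched below. To keep it runnable without the package, it polls a generic callable and uses a stand-in `Status` enum whose values are invented for illustration; `wait_for` is a hypothetical helper, not a PlayMolecule API.

```python
import time
from enum import Enum


class Status(Enum):
    """Stand-in for playmolecule.JobStatus so the sketch runs without the package."""
    WAITING_INFO = "waiting"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


def wait_for(get_status, terminal, interval=30.0, timeout=3600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} s")
        sleep(interval)


# Simulated job: two non-terminal polls, then completion.
states = iter([Status.WAITING_INFO, Status.RUNNING, Status.COMPLETED])
final = wait_for(lambda: next(states), {Status.COMPLETED, Status.ERROR}, interval=0)
print(final)
```

With the real package you would presumably call it as `wait_for(lambda: ed.status, {JobStatus.COMPLETED, JobStatus.ERROR})`.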

## Step 4 — Use app-default resources

The manifest declares per-app resources (CPUs, GPUs) that you don't have to repeat. If you don't pass `ncpu` / `ngpu`, the app's defaults are used. For `proteinprepare`, whose manifest sets `ncpu=1, ngpu=0`, this is enough:

```python
proteinprepare(outdir="/shared/scratch/me/run", pdbid="3ptb").run(queue="slurm", partition="normalCPU")
```

Override only when you need to deviate from the manifest defaults.

## Step 5 — Preset the queue once

When `PM_QUEUE_CONFIG` is set, `ed.run()` with no arguments picks up the queue, partition, and resources from the environment and submits to SLURM automatically:

```bash
export PM_QUEUE_CONFIG='{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
```

```python
ed.run()    # picks gpu_partition if the manifest requests GPUs, cpu_partition otherwise
```

Useful in shared CI scripts and admin-managed environments where users shouldn't have to know which partition to use.
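
The partition choice described above amounts to a lookup keyed on whether the job requests GPUs. The sketch below restates that rule in plain Python; `pick_partition` is a hypothetical helper, and PlayMolecule's internal logic may handle more keys than the three shown.

```python
import json


def pick_partition(queue_config_json, ngpu):
    """Return (queue, partition) given PM_QUEUE_CONFIG text and the job's GPU count."""
    config = json.loads(queue_config_json)
    key = "gpu_partition" if ngpu > 0 else "cpu_partition"
    return config["queue"], config[key]


raw = '{"queue": "slurm", "cpu_partition": "normalCPU", "gpu_partition": "normalGPU"}'
print(pick_partition(raw, ngpu=0))
print(pick_partition(raw, ngpu=1))
```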

## Recap

- Always set `outdir` to a shared-filesystem path before submitting to SLURM.
- `ed.run(queue="slurm", ...)` returns immediately; query `ed.status` to see progress.
- The app manifest provides default `ncpu` / `ngpu`; override only when needed.
- `PM_QUEUE_CONFIG` lets `ed.run()` pick the partition automatically.

## Next

- [Run many jobs on one GPU](../howto/run-many-jobs-on-one-gpu.md)
- [Check job status](../howto/check-job-status.md)
- [Job lifecycle](../explanation/job-lifecycle.md)

## Side note: `ed.slurm(...)`

`ed.slurm(partition=..., ncpu=..., ...)` is a thin alias for `ed.run(queue="slurm", ...)` that pre-dates the unified `run(queue=...)` interface. New code should use `run(queue="slurm")` — it composes with `PM_QUEUE_CONFIG`, parallels the local-run call, and avoids a second method to remember. {py:meth}`~playmolecule.ExecutableDirectory.slurm` will continue to work.


# === tutorials/index.md ===

# Tutorials

Hands-on, ordered lessons. Read them in sequence if you're new to PlayMolecule.

```{toctree}
:maxdepth: 1

01-first-app-run
02-using-app-versions-and-tests
03-running-on-slurm
```


# === reference/playmolecule.apps.rst ===

playmolecule.apps package
=========================

Module contents
---------------

.. automodule:: playmolecule.apps
   :members:
   :show-inheritance:
   :undoc-members:

