Metadata-Version: 2.4
Name: br-scratch-keepalive
Version: 0.1.1
Summary: CLI for keeping large BR200 scratch datasets warm with resumable refresh jobs.
Author: Amit Subhash
License: MIT
Keywords: hpc,slurm,scratch,br200,keepalive
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file
Dynamic: requires-python

# br-scratch-keepalive

`br-scratch-keepalive` is a Python `3.9+` package that installs the `scratch-keepalive` CLI, meant to be run inside BR200 shell sessions. It manages large datasets under `/N/scratch/$USER/...`, refreshes them on a recurring schedule, and keeps resumable checkpoint state outside scratch.

This is a best-effort anti-purge tool. It lowers the risk that scratch datasets are purged; it does not turn scratch into archival storage.

## Shared cluster compliance

This package is designed to behave conservatively on BR200:

- the `br200` profile enforces a minimum refresh cadence of `14` days
- it keeps only one future refresh job queued in the chain
- recurring runs request a small allocation on the `general` partition
- it should not be used to evade explicit IU or BR200 storage policy

If IU or BR200 admins tell you not to use this workflow, stop using it.

## What it does

- registers large datasets under your BR200 scratch space
- keeps a `keep-until` policy per dataset
- refreshes datasets through metadata operations instead of rereading file contents byte by byte
- checkpoints partial refresh progress so the next run resumes instead of starting over
- stores logs, registry state, and checkpoint files outside scratch
- installs a recurring scheduler entry for the current BR200 user
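
To make "metadata-oriented" concrete, here is a minimal sketch that refreshes a dataset by updating access times without reading any file contents, and that stops once the dataset's `keep-until` date has passed. It assumes purge eligibility is driven by access time, and the function `refresh_metadata` is hypothetical; the package's real refresh logic may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the scratch-keepalive implementation.
# Assumes purge eligibility is based on access time (atime).

# refresh_metadata ROOT KEEP_UNTIL
#   ROOT        dataset directory under /N/scratch/$USER
#   KEEP_UNTIL  last day to keep refreshing, as YYYY-MM-DD
refresh_metadata() {
  local root=$1 keep_until=$2 today
  today=$(date +%F)
  # YYYY-MM-DD strings sort chronologically, so a string compare is enough.
  if [[ "$today" > "$keep_until" ]]; then
    echo "skip $root: keep-until $keep_until has passed" >&2
    return 0
  fi
  # touch -a sets only the access time and -c never creates files,
  # so no file contents are read even for very large datasets.
  find "$root" -type f -exec touch -a -c {} +
}
```

For example, `refresh_metadata /N/scratch/$USER/datasets/Forithmus/MR-RATE 2026-07-31` updates access times through the end of July 2026 and does nothing afterwards.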

## What it does not do

- it does not make scratch permanent
- it does not archive to Slate, Slate-Project, or SDA
- it does not run from your laptop
- it does not require or use a personal SSH alias
- it does not re-download missing data

## Install

From inside BR200:

```bash
python -m pip install br-scratch-keepalive
```

Or from a cloned repo:

```bash
python -m pip install .
```

## BR200 quickstart

```bash
python -m pip install br-scratch-keepalive   # or `python -m pip install .` from a cloned repo
scratch-keepalive init --profile br200
scratch-keepalive add \
  --name mr-rate \
  --path /N/scratch/$USER/datasets/Forithmus/MR-RATE \
  --keep-until 2026-07-31
scratch-keepalive refresh --name mr-rate
scratch-keepalive install-cron
scratch-keepalive status --name mr-rate
```

## Recommended workflow

1. Log into BR200 normally.
2. Install the package into your BR200 Python environment.
3. Run `scratch-keepalive init --profile br200`.
4. Add one or more datasets under `/N/scratch/$USER/...`.
5. Run one manual refresh to verify permissions and state layout.
6. Install the recurring scheduler entry.
7. Use `status` and `doctor` to inspect health.

## Commands

```text
scratch-keepalive init
scratch-keepalive add
scratch-keepalive list
scratch-keepalive status
scratch-keepalive refresh
scratch-keepalive extend
scratch-keepalive enable
scratch-keepalive disable
scratch-keepalive remove
scratch-keepalive install-cron
scratch-keepalive uninstall-cron
scratch-keepalive doctor
scratch-keepalive repair
```

## State layout

The BR200 profile keeps control-plane state outside scratch. The sentinel is the only file the tool writes inside a dataset:

- registry: persistent dataset state
- checkpoints: resumable partial-refresh state
- logs: per-run refresh logs
- sentinel: a small tool-owned file in the dataset root
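
A `doctor`-style check for this layout might confirm that the registry, checkpoint, and log paths resolve outside `/N/scratch`. The sketch below is hypothetical (the function name and the reliance on GNU `realpath -m` are assumptions, not the package's code) and only illustrates the rule; the sentinel is exempt because it lives in the dataset root by design.

```bash
#!/usr/bin/env bash
# Hypothetical doctor-style check; not part of scratch-keepalive.
# Requires GNU realpath (-m: resolve even if the path does not exist yet).

# is_outside_scratch PATH: succeed if PATH does not resolve under /N/scratch
is_outside_scratch() {
  local resolved
  resolved=$(realpath -m -- "$1") || return 1
  # Append "/" so /N/scratch itself matches but /N/scratchy does not.
  case "$resolved/" in
    /N/scratch/*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Resolving the path first means that `..` components and symlinked parent directories cannot hide a state directory that actually sits in scratch.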

## Resume semantics

Refreshes are split into deterministic units. If a refresh run fails or times out:

- completed units stay recorded in checkpoint state
- remaining units are retried on the next run
- the checkpoint is deleted only after the full dataset refresh completes
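
The checkpoint behaviour above can be modelled in a few lines of shell. This is a hypothetical sketch, not the package's code: it treats each top-level entry of the dataset as one unit, records finished units one per line in a plain-text checkpoint file, and the function name `refresh_with_checkpoint` is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of resumable refresh; not the scratch-keepalive code.
# Unit = one top-level entry of the dataset (names without newlines assumed).

# refresh_with_checkpoint ROOT CHECKPOINT_FILE
refresh_with_checkpoint() {
  local root=$1 ckpt=$2 unit
  touch "$ckpt"
  # Deterministic unit order: sorted top-level entries.
  while IFS= read -r unit; do
    # Skip units already recorded by an earlier, interrupted run.
    if grep -Fxq -- "$unit" "$ckpt"; then
      continue
    fi
    # Refresh one unit through metadata only (access times, no content reads).
    find "$root/$unit" -exec touch -a -c {} + || return 1
    # Record the unit only after it finished.
    printf '%s\n' "$unit" >> "$ckpt"
  done < <(find "$root" -mindepth 1 -maxdepth 1 -printf '%f\n' | LC_ALL=C sort)
  # Every unit is done: drop the checkpoint so the next cycle starts fresh.
  rm -f "$ckpt"
}
```

Because a unit is appended to the checkpoint only after its `find` pass completes, a job killed mid-unit redoes that one unit on the next run instead of skipping it.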

## Publishing

Published names:

- package: `br-scratch-keepalive`
- CLI command: `scratch-keepalive`

## Notes

- Run this from inside BR200, not from your laptop.
- The tool does not rely on a local SSH alias like `br200`.
- Default recurring cadence is every 14 days.
- The `br200` profile will not allow a cadence below 14 days.
- The default recurring job requests the `general` partition, 1 CPU, `2G` of memory, and a `2:00:00` wall-time limit.
- Logs and checkpoint state live outside scratch.
- When `scrontab` is disabled on BR200, `install-cron` falls back to a self-resubmitting Slurm job.
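
The fallback can be pictured as a batch script that queues its own next run with a delayed `--begin` before doing any work. The sketch below assumes the defaults listed above (`general`, 1 CPU, `2G`, `2:00:00`, 14 days). The job name, `CADENCE_DAYS`, the `queue_next_run` function, and the `SQUEUE_CMD`/`SBATCH_CMD` overrides (there only so the logic can be tried off-cluster) are hypothetical, and this is not the script `install-cron` actually writes.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=scratch-keepalive
#SBATCH --partition=general
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=2:00:00
# Hypothetical sketch of a self-resubmitting refresh job; not generated by install-cron.

CADENCE_DAYS=${CADENCE_DAYS:-14}
JOB_NAME=scratch-keepalive

# Queue the next run, refusing cadences under 14 days and never
# stacking a second pending job in the chain.
queue_next_run() {
  local script=$1
  if (( CADENCE_DAYS < 14 )); then
    echo "cadence ${CADENCE_DAYS}d is below the 14-day minimum" >&2
    return 1
  fi
  # -h: no header; -t PENDING: only jobs still waiting to start; -o %i: job IDs.
  if [[ -n "$(${SQUEUE_CMD:-squeue} -h -u "$USER" -n "$JOB_NAME" -t PENDING -o %i)" ]]; then
    echo "a pending $JOB_NAME job already exists" >&2
    return 0
  fi
  ${SBATCH_CMD:-sbatch} --begin="now+${CADENCE_DAYS}days" "$script"
}

# Only when running as a Slurm batch job: chain the next run first, then refresh.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  queue_next_run "$0"
  # Dataset name taken from the quickstart; adjust to your registry.
  scratch-keepalive refresh --name mr-rate
fi
```

Submitting the script once with `sbatch` starts the chain. Each run queues its successor before refreshing, so a refresh that hits its time limit does not break the chain.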
