Metadata-Version: 2.4
Name: dsimaging-admin
Version: 0.3.2
Summary: Admin CLI for managing medical imaging datasets in S3/MinIO for DataSHIELD
Project-URL: Homepage, https://github.com/isglobal-brge/dsimaging-admin
Project-URL: Repository, https://github.com/isglobal-brge/dsimaging-admin
Author-email: David Sarrat Gonzalez <david.sarrat@isglobal.org>, Juan R Gonzalez <juanr.gonzalez@isglobal.org>
License-Expression: MIT
License-File: LICENSE
Keywords: datashield,dicom,medical-imaging,minio,radiomics,s3
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.9
Requires-Dist: boto3>=1.28.0
Requires-Dist: click>=8.0.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pydicom>=3.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31.0
Description-Content-Type: text/markdown

# dsimaging-admin

Admin CLI for creating and operating `dsimaging-store` deployments and the
medical imaging datasets stored in them.

## Install

```bash
pip install dsimaging-admin
```

## Create a store

`dsimaging-admin store init` writes a Docker Compose project for MinIO plus the
`dsimaging-store` controller.

```bash
dsimaging-admin store init ./study-store \
  --controller-image davidsarratgonzalez/dsimaging-store:latest
dsimaging-admin store up ./study-store
dsimaging-admin store doctor ./study-store
```

For local controller development, build from a checked-out `dsimaging-store`
repo instead of using an image:

```bash
dsimaging-admin store init ./study-store \
  --store-source /path/to/dsimaging-store
dsimaging-admin store up ./study-store
```

Save the generated connection details as a reusable CLI profile:

```bash
dsimaging-admin init \
  --endpoint http://127.0.0.1:9000 \
  --controller-url http://127.0.0.1:8080 \
  --bucket imaging-data
```

## Dataset operations

```bash
# Publish with staging, publish lock, skip-if-hash-matches and DICOM checks.
dsimaging-admin publish \
  --dataset-id study_ct_v1 \
  --source /data/study_ct \
  --metadata /data/study_ct/clinical.csv \
  --modality ct

# Inspect and verify.
dsimaging-admin list
dsimaging-admin status study_ct_v1
dsimaging-admin verify study_ct_v1
dsimaging-admin doctor

# Rebuild artifacts from S3, copy, download or delete.
dsimaging-admin rescan study_ct_v1
dsimaging-admin copy study_ct_v1 study_ct_v2 --yes
dsimaging-admin download study_ct_v1 ./debug/study_ct_v1
dsimaging-admin delete study_ct_v1 --yes --purge-versions
```

The reporting commands accept `--output json` for use in scripts and automation:

```bash
dsimaging-admin list --output json
dsimaging-admin status study_ct_v1 --output json
dsimaging-admin verify study_ct_v1 --output json
dsimaging-admin doctor --output json
```
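For scripting, you can wrap these commands in a small helper that appends `--output json` and parses stdout. This is a minimal sketch that only assumes the flag shown above. The helper name `run_json` is hypothetical. The JSON structure of each report is not described here, so inspect the parsed value before depending on specific keys.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def run_json(args: Sequence[str], run: Callable[..., Any] = subprocess.run) -> Any:
    """Run a dsimaging-admin reporting command and parse its JSON output.

    Hypothetical helper: appends ``--output json`` to the given arguments.
    ``run`` is injectable so the helper can be exercised without the CLI.
    """
    cmd = ["dsimaging-admin", *args, "--output", "json"]
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed verify/doctor run is not silently parsed as a report.
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_json(["verify", "study_ct_v1"])` runs `dsimaging-admin verify study_ct_v1 --output json` and returns the decoded report.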

## What `publish` does

1. Scans your local image directory and optional masks under `source/masks/`,
   `masks/`, `source/labels/`, or `labels/`.
2. Runs basic DICOM sanity checks for series UID, modality and instance order.
3. Computes SHA-256 content hashes.
4. Skips files whose hashes already match the dataset's current hash indexes.
5. Uploads to a `datasets/<id>/.staging-*` prefix while holding a `.publish-lock`,
   then copies the staged objects into `datasets/<id>/source/...`.
6. Generates and uploads:
   - `manifest.yaml`
   - `indexes/content_hash_index.parquet`
   - `indexes/masks_content_hash_index.parquet` when masks exist
   - `metadata/sample_manifests.parquet`
   - `metadata/samples.parquet`

Use `--dry-run` to scan and show the upload plan without writing to S3,
`--no-skip` to upload files even when their hashes already match, or
`--no-atomic` to disable staging.

## Configuration

`~/.dsimaging.yaml` supports multiple profiles:

```yaml
default_profile: default
profiles:
  default:
    endpoint: http://127.0.0.1:9000
    controller_url: http://127.0.0.1:8080
    bucket: imaging-data
    access_key: minioadmin
    secret_key: minioadmin123
    region: ""
```

Environment variables override profile values:

| Variable | Default | Description |
|---|---|---|
| `DSIMAGING_PROFILE` | `default` | Config profile |
| `DSIMAGING_ENDPOINT` | `http://127.0.0.1:9000` | S3/MinIO endpoint |
| `DSIMAGING_CONTROLLER_URL` | (empty) | dsimaging-store controller URL |
| `DSIMAGING_ACCESS_KEY` | `minioadmin` | S3 access key |
| `DSIMAGING_SECRET_KEY` | `minioadmin123` | S3 secret key |
| `DSIMAGING_BUCKET` | `imaging-data` | Bucket name |
| `DSIMAGING_REGION` | (empty) | S3 region |
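The sketch below shows one way to combine the profile file with these variables. It assumes the precedence is built-in defaults, then the selected profile, then `DSIMAGING_*` variables, and that `DSIMAGING_PROFILE` takes priority over `default_profile`. These are assumptions drawn from the table, not confirmed internals. The function name `resolve_settings` is hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Built-in defaults, copied from the environment variable table.
DEFAULTS: Dict[str, str] = {
    "endpoint": "http://127.0.0.1:9000",
    "controller_url": "",
    "access_key": "minioadmin",
    "secret_key": "minioadmin123",
    "bucket": "imaging-data",
    "region": "",
}


def resolve_settings(
    config: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Merge defaults, the selected profile and DSIMAGING_* variables.

    Assumed precedence (lowest to highest): defaults, profile, environment.
    """
    env = os.environ if env is None else env
    # Assumed: DSIMAGING_PROFILE wins over default_profile in the file.
    name = env.get("DSIMAGING_PROFILE") or config.get("default_profile") or "default"
    profile = (config.get("profiles") or {}).get(name) or {}
    settings = dict(DEFAULTS)
    for key in DEFAULTS:
        # Ignore keys left empty in YAML (parsed as None).
        if profile.get(key) is not None:
            settings[key] = str(profile[key])
        # e.g. "controller_url" -> DSIMAGING_CONTROLLER_URL
        env_value = env.get("DSIMAGING_" + key.upper())
        if env_value is not None:
            settings[key] = env_value
    settings["profile"] = name
    return settings
```

With PyYAML (already a dependency), load the file with `yaml.safe_load(Path("~/.dsimaging.yaml").expanduser().read_text())` and pass the result as `config`.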

## Dataset layout in S3

```text
s3://<bucket>/datasets/<dataset_id>/
  manifest.yaml
  metadata/
    samples.parquet
    sample_manifests.parquet
  indexes/
    content_hash_index.parquet
    masks_content_hash_index.parquet
  source/
    images/
    masks/
  derived/
  qc/
```
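When inspecting a dataset directly in S3, it can help to build keys from this layout in code rather than by hand. The sketch below is a hypothetical helper. It only encodes the paths listed above, and its dataset ID check is a sanity check for this sketch, not a rule enforced by the CLI.

```python
from typing import Dict


def dataset_keys(dataset_id: str) -> Dict[str, str]:
    """Map the artifacts in the dataset layout to S3 keys or prefixes.

    Keys are relative to the bucket root; prefixes end with "/".
    """
    # Sketch-only sanity check: an empty ID or one containing "/" would
    # produce keys outside the expected datasets/<dataset_id>/ prefix.
    if not dataset_id or "/" in dataset_id:
        raise ValueError(f"invalid dataset id: {dataset_id!r}")
    base = f"datasets/{dataset_id}/"
    return {
        "manifest": base + "manifest.yaml",
        "samples": base + "metadata/samples.parquet",
        "sample_manifests": base + "metadata/sample_manifests.parquet",
        "content_hash_index": base + "indexes/content_hash_index.parquet",
        "masks_content_hash_index": base + "indexes/masks_content_hash_index.parquet",
        "images": base + "source/images/",
        "masks": base + "source/masks/",
        "derived": base + "derived/",
        "qc": base + "qc/",
    }


def s3_uri(bucket: str, key: str) -> str:
    """Format a bucket and key as an s3:// URI."""
    return f"s3://{bucket}/{key}"
```

With boto3, a prefix from this mapping can be passed to `list_objects_v2(Bucket=..., Prefix=...)` to list the source images of a dataset.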

Store creation and dataset management commands target `dsimaging-store`
deployments only.
