Metadata-Version: 2.4
Name: omnibioai-model-registry
Version: 0.1.0
Summary: Production-grade model registry for the OmniBioAI ecosystem.
Author: Manish Kumar
License: MIT
Keywords: model-registry,mlops,bioinformatics,ai,machine-learning,artifact-management,reproducibility
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"

# OmniBioAI Model Registry

**OmniBioAI Model Registry** is a **production-grade lifecycle management system for AI/ML models** within the **OmniBioAI ecosystem**.

It provides:

- **Immutable model versioning** (write-once versions)
- **Cryptographic integrity verification (SHA256)**
- **Provenance-friendly metadata capture**
- **Staged promotion workflows (latest → staging → production)**
- **Deterministic resolution by stable reference** (`model@alias` or `model@version`)
- **Local-first design** with a clean path to future backends (S3/Azure/on-prem)

The registry is implemented as a **standalone Python library** and includes:
- a **CLI** (`omr`)
- a **minimal REST service** (FastAPI)

---

## Why This Exists

Biomedical AI requires:

- **Reproducibility**
- **Auditability**
- **Governance**
- **Offline / air-gapped deployment**
- **Cross-infrastructure execution parity**

Traditional ML tooling often assumes:
- cloud-first infrastructure
- mutable artifacts
- weak provenance guarantees

**OmniBioAI Model Registry is designed differently.**

> It treats AI models as **scientific artifacts** that must be **immutable, verifiable, and reproducible** across environments.

---

## Role in the OmniBioAI Architecture

OmniBioAI follows a **four-plane architecture**:

| Plane             | Responsibility                         |
| ----------------- | -------------------------------------- |
| **Control Plane** | UI, registries, metadata, governance   |
| **Compute Plane** | Workflow execution, HPC/cloud adapters |
| **Data Plane**    | Artifacts, datasets, outputs           |
| **AI Plane**      | Reasoning, RAG, agents, interpretation |

The **Model Registry** belongs to the **Control Plane** and provides:

- AI artifact governance
- deterministic inference references
- promotion and audit workflows
- infrastructure-independent model resolution

---

## Core Design Principles

### 1) Immutability
Each model version is **write-once**:
- no overwrites
- no silent mutation
- full historical trace

This guarantees **scientific reproducibility**.

### 2) Integrity Verification
Every model package includes a SHA256 manifest:

- `sha256sums.txt` hashes the package contents (excluding itself)

This enables:
- bit-level reproducibility
- tamper detection
- trustworthy deployment in regulated environments

### 3) Provenance-Friendly Metadata
Each model stores structured metadata via `model_meta.json`, such as:
- training code version (git commit)
- dataset reference (e.g., DVC / object store ref)
- hyperparameters and preprocessing
- creator and timestamp

### 4) Promotion Workflow
Models move through controlled stages:

```

latest → staging → production

```

All promotions are:
- explicit
- append-only
- audited (`audit/promotions.jsonl`)

### 5) Storage Abstraction
v0.1.0 supports:
- **local filesystem backend** (`localfs`)

Planned:
- S3 / Azure Blob / enterprise on-prem backends

---

## Repository Structure

```

omnibioai-model-registry/
├── omnibioai_model_registry/
│   ├── api.py
│   ├── config.py
│   ├── refs.py
│   ├── storage/
│   ├── package/
│   ├── audit/
│   ├── cli/
│   └── service/
├── tests/
├── pyproject.toml
└── README.md

```

---

## Canonical Model Package Layout

Registered models follow a strict, portable structure:

```

<OMNIBIOAI_MODEL_REGISTRY_ROOT>/
tasks/<task>/models/<model_name>/
versions/<version>/
model.pt
model_genes.txt
label_map.json
model_meta.json
metrics.json
feature_schema.json
sha256sums.txt
aliases/
latest.json
staging.json
production.json
audit/
promotions.jsonl

````

This guarantees:
- deterministic loading
- integrity validation
- cross-environment portability

---

## Install, Build, and Use as a Python Package

### 1) Configure registry root
The registry requires a root directory:

```bash
export OMNIBIOAI_MODEL_REGISTRY_ROOT=~/Desktop/machine/local_registry/model_registry
````

### 2) Install (editable) for development

From this repository root:

```bash
pip install -e .
```

Verify:

```bash
python -c "import omnibioai_model_registry as m; print('OK', m.__file__)"
omr --help
```

### 3) Build a wheel (distribution)

Install build tooling:

```bash
pip install build
```

Build:

```bash
python -m build
```

Artifacts are written to `dist/`:

* `dist/omnibioai_model_registry-0.1.0-py3-none-any.whl`
* `dist/omnibioai_model_registry-0.1.0.tar.gz`

Install the wheel:

```bash
pip install dist/*.whl
```

---

## CLI Usage (`omr`)

### Register a model package

```bash
omr register \
  --task celltype_sc \
  --model human_pbmc \
  --version 2026-02-14_001 \
  --artifacts /tmp/model_pkg \
  --set-alias latest
```

### Resolve a model reference

```bash
omr resolve --task celltype_sc --ref human_pbmc@latest
```

### Promote a version to production

```bash
omr promote --task celltype_sc --model human_pbmc --version 2026-02-14_001 --alias production
```

### Verify integrity

```bash
omr verify --task celltype_sc --ref human_pbmc@production
```

### Show metadata

```bash
omr show --task celltype_sc --ref human_pbmc@production --json
```

---

## Python API Usage

```python
from omnibioai_model_registry import register_model, resolve_model, promote_model

register_model(
    task="celltype_sc",
    model_name="human_pbmc",
    version="2026-02-14_001",
    artifacts_dir="/tmp/model_pkg",
    metadata={
        "framework": "pytorch",
        "model_type": "classifier",
        "provenance": {
            "git_commit": "abc123",
            "training_data_ref": "s3://bucket/datasets/pbmc_v1",
            "trainer_version": "0.1.0",
        },
    },
    set_alias="latest",
    actor="manish",
    reason="initial training",
)

# Resolve by alias (or version)
path = resolve_model("celltype_sc", "human_pbmc@latest", verify=True)
print("Resolved model dir:", path)

# Promote to production
promote_model(
    task="celltype_sc",
    model_name="human_pbmc",
    alias="production",
    version="2026-02-14_001",
    actor="manish",
    reason="validated metrics",
)
```

---

## Minimal REST Service (FastAPI)

### Run locally

```bash
pip install -r omnibioai_model_registry/service/requirements.txt
uvicorn omnibioai_model_registry.service.app.main:app --host 0.0.0.0 --port 8095
```

Test:

```bash
curl -s http://127.0.0.1:8095/health | python -m json.tool
```

Endpoints:

* `POST /v1/register`
* `GET  /v1/resolve`
* `POST /v1/promote`
* `POST /v1/verify`
* `GET  /v1/show`

---

## Testing

```bash
pip install -e ".[dev]"
pytest -q
```

---

## Relationship to OmniBioAI Ecosystem

This registry is a **control-plane component** of OmniBioAI.

Companion repositories:

* **omnibioai** → AI-powered bioinformatics workbench
* **omnibioai-tes** → execution orchestration across local/HPC/cloud
* **omnibioai-rag** → reasoning and literature intelligence
* **omnibioai-lims** → laboratory data management
* **omnibioai-workflow-bundles** → reproducible pipelines
* **omnibioai-sdk** → Python client access

The **Model Registry** provides the **AI artifact governance layer** shared by all.

---

## Roadmap

### Near Term

* additional storage backends (S3 / Azure)
* expanded metadata validation + schemas
* model listing and metadata search APIs

### Mid Term

* RBAC and governance controls
* richer registry service APIs (auth, pagination, filtering)
* comparison and promotion policies

### Long Term

* enterprise biomedical AI governance platform
* regulatory-ready audit and lineage
* deeper integration with experiment tracking and clinical pipelines

---

## Status

* ✅ Immutable and verifiable model storage
* ✅ Audit-ready promotion workflow
* ✅ CLI + minimal REST service
* ✅ Local-first, cloud-ready design

**OmniBioAI Model Registry establishes the foundation for trustworthy, reproducible biomedical AI deployment.**

