Metadata-Version: 2.4
Name: modelstudio-sdk
Version: 0.0.0.dev0
Summary: Python SDK for the Model Studio REST API
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: httpx<1.0,>=0.25.0
Requires-Dist: pydantic<3.0,>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pandas>=1.5.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Provides-Extra: pandas
Requires-Dist: pandas>=1.5.0; extra == 'pandas'
Description-Content-Type: text/markdown

# Model Studio Python SDK

Python SDK for the Model Studio REST API. Provides typed access to dataset management, annotation tooling, metrics, and ML workflow operations.

## Installation

```bash
# From wheel (already pre-installed in notebook containers)
pip install modelstudio-sdk

# Development install
git clone https://gitlab.com/orbitalinsight/elements/model-studio/modelstudio-sdk.git
cd modelstudio-sdk
pip install -e ".[dev]"

# With pandas support
pip install "modelstudio-sdk[pandas]"
```

## Quick Start

```python
from modelstudio import ModelStudioClient

# Auto-configured inside notebooks (reads env vars)
client = ModelStudioClient.from_env()

# Or explicit
client = ModelStudioClient(
    base_url="http://localhost:8081",
    jwt_token="eyJhbG...",
)

# List datasets
for ds in client.datasets.list():
    print(f"{ds.name} ({ds.dataset_type})")
```

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `MODEL_STUDIO_API_URL` | Yes | API base URL |
| `MODEL_STUDIO_JWT` | No | JWT authentication token |
| `MODEL_STUDIO_ORG` | No | Organization name |
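
Outside a notebook container it can help to check these variables before constructing a client. The following is a minimal sketch of such a pre-flight check. `Settings` and `load_settings` are hypothetical helpers written for this README, not part of the SDK, and they do not describe how `ModelStudioClient.from_env()` is implemented.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three variables in the table above."""

    api_url: str
    jwt: Optional[str] = None
    org: Optional[str] = None


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables and fail early if the required one is missing."""
    env = os.environ if env is None else env
    api_url = env.get("MODEL_STUDIO_API_URL", "").strip()
    if not api_url:
        raise RuntimeError("MODEL_STUDIO_API_URL is required but not set")
    return Settings(
        api_url=api_url.rstrip("/"),          # normalise a trailing slash
        jwt=env.get("MODEL_STUDIO_JWT") or None,  # optional
        org=env.get("MODEL_STUDIO_ORG") or None,  # optional
    )
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in unit tests.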

## Usage Examples

### Dataset Operations

```python
ds = client.dataset("dataset-uuid")

# Get overview statistics
overview = ds.overview()
print(f"Images: {overview.summary.total_images}")
print(f"Annotations: {overview.summary.total_annotations}")

# List splits
for split in ds.splits():
    print(f"{split.name} ({split.split_type})")

# List categories
cats = ds.categories()
for cat in cats.categories:
    print(f"{cat.name}: {cat.annotation_count} annotations")
```

### Split Operations

```python
# Create algorithmic splits
ds.create_algorithmic_splits(
    splits={"train": 0.7, "val": 0.15, "test": 0.15},
    seed=42,
)

# Smart redistribution
result = ds.smart_redistribute(
    ratios={"train": 0.8, "val": 0.2},
    prevent_tile_leakage=True,
)

# Check for data leakage
leakage = ds.check_leakage()
if leakage.has_leakage:
    print(f"Found {leakage.leakage_count} leaked images")
```
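
To make the `splits` and `seed` arguments concrete, here is a small local sketch of a seeded ratio split. It only illustrates the general idea (ratios that sum to 1.0, a fixed seed giving a reproducible assignment). The function `split_by_ratio` is hypothetical, and the algorithm the API actually uses is not documented here and may differ.

```python
import random


def split_by_ratio(items, ratios, seed=42):
    """Deterministically partition items into named splits by ratio."""
    total = sum(ratios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"ratios must sum to 1.0, got {total}")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order

    names = list(ratios)
    result, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split takes the remainder so rounding never drops an item.
            end = len(shuffled)
        else:
            end = start + round(len(shuffled) * ratios[name])
        result[name] = shuffled[start:end]
        start = end
    return result
```

Calling it twice with the same items, ratios, and seed returns identical splits, which is the property a fixed `seed` is meant to provide.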

### Category Management

```python
# Merge categories
ds.merge_categories(
    source_categories=[1, 2, 3],
    target_category="vehicle",
)

# Rename a category
ds.rename_category(category_id=5, new_name="truck")

# Remove a category
ds.remove_category(category_id=10)

# Consolidate labels
ds.consolidate_labels({"car": "vehicle", "van": "vehicle"})
```

### Working with Split Data

```python
split = ds.split("split-uuid")

# List images
images = split.list_images()

# List annotations
annotations = split.list_annotations()

# Import from S3
queued = split.import_from_source("s3", {
    "connection_id": "conn-uuid",
    "bucket": "my-bucket",
    "prefix": "datasets/coco/",
})

# Wait for import to complete
poller = split.import_poller(interval=5.0)
result = poller.wait(callback=lambda r: print(f"Progress: {r.get('progress')}%"))
```

### Cloning & Async Operations

```python
# Clone a dataset
cloned = ds.clone(name="My Clone")

# Poll until complete
poller = ds.clone_poller(interval=2.0, max_wait=300.0)
result = poller.wait()
print(f"Clone status: {result['clone_status']}")
```
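
The `interval` and `max_wait` parameters follow the usual polling pattern: fetch the status, stop when it is terminal, otherwise sleep and retry until a deadline. The sketch below shows that pattern in isolation. It is not the SDK's poller implementation; `wait_until_done`, its parameters, and the `"COMPLETED"` status string in the usage note are assumptions made for illustration.

```python
import time


def wait_until_done(fetch, is_done, interval=2.0, max_wait=300.0,
                    callback=None, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() every `interval` seconds until is_done(result) or timeout."""
    deadline = clock() + max_wait
    while True:
        result = fetch()
        if callback is not None:
            callback(result)  # e.g. report progress to the user
        if is_done(result):
            return result
        if clock() + interval > deadline:
            # The next attempt would land after the deadline, so give up now.
            raise TimeoutError(f"not finished after {max_wait} seconds")
        sleep(interval)
```

A caller would pass something like `fetch=lambda: get_status()` and `is_done=lambda r: r["clone_status"] == "COMPLETED"`, where `get_status` and the status value are placeholders. The injectable `sleep` and `clock` arguments exist only so the loop can be tested without real delays.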

### Validation & Quality

```python
# Validate dataset
result = ds.validate()
print(f"Valid: {result.valid}")

# Check for duplicates
dupes = ds.check_duplicates()
if dupes.has_duplicates:
    print(f"{dupes.total_duplicate_images} duplicate images found")

# Detect temporal conflicts
conflicts = ds.detect_temporal_conflicts()
```

### Export

```python
# Export as COCO JSON
coco = ds.export_coco()
print(f"Exported {coco.image_count} images, {coco.annotation_count} annotations")

# Per-split export
split_coco = split.export_coco()
```
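
Once a COCO JSON document is available as a Python dict, the standard COCO keys (`images`, `annotations`, `categories`) make quick sanity checks straightforward. The sketch below counts annotations per category name. How to obtain the raw dict from the export result is not shown in this README, so the helper takes a plain dict, and the sample data is made up.

```python
from collections import Counter


def annotations_per_category(coco: dict) -> dict:
    """Map category name -> number of annotations in a COCO-format dict."""
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    counts = Counter(a["category_id"] for a in coco.get("annotations", []))
    # Fall back to the numeric id if an annotation references an unknown category.
    return {names.get(cid, str(cid)): n for cid, n in counts.items()}


# Made-up sample in COCO layout, for illustration only.
sample = {
    "images": [{"id": 1, "file_name": "a.png"}],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "truck"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 1},
        {"id": 12, "image_id": 1, "category_id": 2},
    ],
}
per_category = annotations_per_category(sample)
```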

### Filtering

```python
from modelstudio.models.filters import DatasetFilterRequest, CategoryFilter

# Filter to specific categories
result = ds.filter(DatasetFilterRequest(
    category_filter=CategoryFilter(keep_categories=["car", "truck"]),
    new_dataset_name="filtered-cars",
))
print(f"New dataset: {result.new_dataset_id}")
```

### Few-Shot & Oversampling

```python
from modelstudio.models.few_shot import FewShotRequest
from modelstudio.models.oversample import OversampleRequest

# Create few-shot dataset
result = ds.few_shot_create(FewShotRequest(
    num_images=100,
    method="MOST_CLASSES",
    new_dataset_name="few-shot-100",
))

# Oversample minority classes
result = ds.oversample_execute(OversampleRequest(
    target_ratio=0.5,
    strategy="PREFER_ANNOTATED",
))
```

### Dataset Merge

```python
from modelstudio.models.merge import MergeDatasetRequest

# Analyze conflicts before merging
analysis = client.datasets.merge_analyze(["ds-1", "ds-2"])

# Merge datasets
result = client.datasets.merge(MergeDatasetRequest(
    source_dataset_ids=["ds-1", "ds-2"],
    target_name="merged-dataset",
))
```

### Undo/Redo

```python
# View history
history = ds.history()
for entry in history.changes:
    print(f"{entry.operation_type}: {entry.short_description}")

# Undo
result = ds.undo()
print(f"Undid: {result.description}")
```

### DataFrame Integration

```python
# Requires: pip install "modelstudio-sdk[pandas]"

# Images as DataFrame
df = split.images_df()

# Annotations as DataFrame
df = split.annotations_df()

# Categories as DataFrame
df = ds.categories_df()

# Class distribution as DataFrame
df = ds.class_distribution_df()
```

## Error Handling

```python
from modelstudio.exceptions import NotFoundError, ConflictError, BadRequestError

try:
    ds = client.dataset("nonexistent")
    ds.overview()
except NotFoundError as e:
    print(f"Dataset not found: {e.message}")
except ConflictError as e:
    print(f"Operation conflict: {e.message}")
except BadRequestError as e:
    print(f"Invalid request: {e.message}")
```

## Architecture

### Related Repositories

| Repo | Purpose |
|------|---------|
| [`model-studio-sdk`](https://gitlab.com/orbitalinsight/elements/model-studio/modelstudio-sdk) | This repo — Python SDK + Jupyter Server Docker |
| [`frontend`](https://gitlab.com/orbitalinsight/frontend-2.0) | Model Studio React frontend (custom notebook UI lives here) |
| [`model-studio-api`](https://gitlab.com/orbitalinsight/elements/model-studio/model-studio-api) | Backend REST API the SDK wraps |
| [`model-studio-notebooks`](https://gitlab.com/orbitalinsight/elements/model-studio/model-studio-notebooks) | JupyterHub + KubeSpawner Helm chart (production multi-user) |
| [`model-studio-agent`](https://gitlab.com/orbitalinsight/elements/model-studio/model-studio-agent) | Agent chat backend |
| [`keycloak-config`](https://gitlab.com/orbitalinsight/elements/keycloak-config) | Keycloak realm/client configuration |

### Local Development Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  Browser (http://localhost:5173)                            │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Model Studio Frontend (Vite)                         │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌────────────┐  │  │
│  │  │ Dataset Pages │  │ Notebook     │  │ Agent Chat │  │  │
│  │  │              │  │ Panel        │  │ Panel      │  │  │
│  │  └──────────────┘  └──────┬───────┘  └─────┬──────┘  │  │
│  └───────────────────────────┼────────────────┼──────────┘  │
│                              │                │             │
│  Vite Dev Server Proxies:    │                │             │
│  /jupyter/* ─────────────────┘                │             │
│  /agent/* ────────────────────────────────────┘             │
└──────────────────────────────┼────────────────┼─────────────┘
                               │                │
              ┌────────────────┘                │
              ▼                                 ▼
┌──────────────────────────┐    ┌──────────────────────────┐
│  Jupyter Server (Docker) │    │  Agent API               │
│  localhost:8889          │    │  localhost:8080           │
│                          │    │  (model-studio-agent)     │
│  ┌────────────────────┐  │    └──────────────────────────┘
│  │ Python 3.10 Kernel │  │
│  │ + Model Studio SDK │  │              ▲
│  └────────┬───────────┘  │              │
└───────────┼──────────────┘              │
            │                             │
            ▼                             │
┌──────────────────────────────────────────────────────────┐
│  Model Studio API                                        │
│  https://model-studio-api.model-studio.privateer-dev.com │
│  (model-studio-api repo)                                 │
└──────────────────────────────────────────────────────────┘
```

**Data flow**: User writes Python in the Notebook Panel → CodeMirror editor sends code via WebSocket to Jupyter kernel → kernel executes using the SDK → SDK calls Model Studio API → results render in the panel.
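
For orientation, the WebSocket step can be pictured as a Jupyter messaging-protocol `execute_request`. The sketch below builds such a message in Python. It assumes the Notebook Panel speaks the standard Jupyter protocol over the kernel's `/api/kernels/<kernel_id>/channels` WebSocket; the frontend code itself is not shown in this README, and `make_execute_request` is a hypothetical helper.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session_id: str) -> dict:
    """Build a Jupyter `execute_request` message for the shell channel."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # messaging protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        "buffers": [],
        "channel": "shell",  # used by the WebSocket JSON framing
    }


# The message is sent as JSON text over the WebSocket.
payload = json.dumps(make_execute_request("print('hello')", session_id="demo"))
```

The kernel answers on the `iopub` channel with messages such as `stream` and `execute_result` whose `parent_header.msg_id` matches the request, which is how a UI can route output to the right cell.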

### Production Architecture

```
┌──────────────────────────────────────────────────────────┐
│  Browser                                                 │
│  Model Studio Frontend (static build on CDN/Nginx)       │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────┐ │
│  │ Dataset Pages │  │ Notebook     │  │ Agent Chat     │ │
│  │              │  │ Panel        │  │ Panel          │ │
│  └──────────────┘  └──────┬───────┘  └────────────────┘ │
└────────────────────────────┼─────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│  JupyterHub (model-studio-notebooks repo)                │
│  - Keycloak OIDC auth                                    │
│  - KubeSpawner → per-user Jupyter Server pods            │
│  - Helm chart for K8s deployment                         │
│                                                          │
│  ┌──────────────────────────────────────────────────┐    │
│  │  Per-User Jupyter Server (K8s Pod)               │    │
│  │  ┌────────────────────┐                          │    │
│  │  │ Python 3.10 Kernel │                          │    │
│  │  │ + Model Studio SDK │                          │    │
│  │  └────────┬───────────┘                          │    │
│  └───────────┼──────────────────────────────────────┘    │
└──────────────┼───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  Model Studio API (K8s service)                          │
└──────────────────────────────────────────────────────────┘
```

**Key difference**: In production, JupyterHub (from the `model-studio-notebooks` repo) manages the multi-user server lifecycle, authentication, and resource limits. The custom notebook UI replaces JupyterLab's frontend, but JupyterHub still spawns the per-user servers.

---

## Development

### Prerequisites

- [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or Anaconda
- Docker + Docker Compose (for notebook server)
- `jq` and `curl` (for Keycloak token fetch)

### Quick Start — SDK Development

```bash
./develop.sh --mode setup              # Create conda env, install deps
./develop.sh --mode test               # Run unit tests
./develop.sh --mode test-integration   # Fetch Keycloak token + run integration tests
./develop.sh --mode lint               # Run ruff + mypy
```

This creates a `model-studio-sdk` conda environment with Python 3.10 and installs the SDK in editable mode with all dev dependencies.

### Quick Start — Custom Notebook UI

The notebook UI spans two repos: the Jupyter Server backend (this repo) and the React frontend (`frontend` repo).

**Terminal 1 — Start Jupyter Server:**

```bash
# From this repo (model-studio-sdk)
./develop.sh --mode notebook-server
```

This prompts for Keycloak credentials, then starts a headless Jupyter Server on port 8889 with the SDK pre-installed. The `src/` and `notebooks/` directories are volume-mounted for live reloading.

**Terminal 2 — Start frontend:**

```bash
# From the frontend repo
cd ../frontend
source .go-privateer-dev.env
yarn build-consts

# First time only — install CodeMirror dependencies:
yarn add @codemirror/view @codemirror/state @codemirror/commands @codemirror/lang-python @codemirror/theme-one-dark

yarn start
```

**Verify the setup:**

1. Open http://localhost:5173
2. Click the **Notebook** button in the AppBar (next to Agent)
3. Type `print('hello')` in the cell and press **Shift+Enter**
4. Output should appear below the cell

**How the proxy works**: The frontend's `vite.config.ts` proxies `/jupyter/*` requests to `localhost:8889` (the Docker Jupyter Server). This includes both REST API calls and WebSocket connections for kernel communication. No environment variables are needed — the proxy is configured in code.

### JupyterLab Mode (Full Lab UI)

If you need the traditional JupyterLab interface (e.g., for notebook authoring):

```bash
./develop.sh --mode docker     # Starts full JupyterLab on port 8888
# or
make docker
```

Open http://localhost:8888 for the JupyterLab UI. Notebooks are in the `notebooks/` directory.

### Integration Tests

Integration tests run against the live dev API and require a Keycloak JWT:

```bash
./develop.sh --mode test-integration
```

This will prompt for your Keycloak credentials (same as your Model Studio login), fetch a JWT, and run the integration test suite.

To skip the auth prompt (e.g. if you already have a token):

```bash
export MODEL_STUDIO_JWT="eyJhbG..."
./develop.sh --mode test-integration --skip-auth
```

Or set credentials as env vars to skip the interactive prompts:

```bash
export KEYCLOAK_USERNAME="you"
export KEYCLOAK_PASSWORD="secret"
./develop.sh --mode test-integration
```
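
Before reusing a token with `--skip-auth`, it can be worth checking that it has not expired. The sketch below reads the standard `exp` claim from the JWT payload using only the standard library. It assumes the token carries an `exp` claim, does not verify the signature, and `jwt_seconds_remaining` is a hypothetical helper rather than part of the SDK or `develop.sh`.

```python
import base64
import json
import time


def jwt_seconds_remaining(token, now=None):
    """Return seconds until the token's `exp` claim; negative if already expired.

    This only decodes the payload. It does NOT verify the signature.
    """
    try:
        payload_b64 = token.split(".")[1]
        # JWT segments are base64url without padding; restore it before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
        exp = float(claims["exp"])
    except (IndexError, KeyError, ValueError, TypeError) as err:
        raise ValueError("not a JWT with an 'exp' claim") from err
    return exp - (time.time() if now is None else now)
```

For example, `jwt_seconds_remaining(os.environ["MODEL_STUDIO_JWT"]) > 60` (with `import os`) is a reasonable guard before starting a test run that takes about a minute.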

### Make Targets

If you prefer to manage your own environment, the Makefile targets still work:

```bash
make install              # pip install -e ".[dev]"
make test                 # Unit tests with coverage
make test-unit            # Unit tests only (no integration)
make lint                 # ruff + mypy
make build                # Build wheel
make notebook-server      # Start headless Jupyter Server (port 8889)
make notebook-server-down # Stop Jupyter Server
make docker               # Start full JupyterLab (port 8888)
make docker-down          # Stop JupyterLab
```

For integration tests without `develop.sh`:

```bash
eval "$(scripts/get-token.sh)" && make test-integration
```

### Docker Services

The `docker/docker-compose.yml` defines two services:

| Service | Port | Purpose |
|---------|------|---------|
| `notebook` | 8888 | Full JupyterLab with Lab UI (for notebook authoring) |
| `jupyter-server` | 8889 | Headless Jupyter Server (for custom notebook UI backend) |

Both use the same `Dockerfile.dev` base image (`jupyter/scipy-notebook:python-3.10`) with the SDK installed in editable mode. The `src/` directory is volume-mounted so SDK changes are picked up without rebuilding.

The `jupyter-server` service additionally configures:
- CORS headers for `http://localhost:5173` (Vite dev server)
- Disabled XSRF checks (local dev only — production uses JupyterHub auth)
- No authentication token (local dev only)
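
As a rough picture of what those three settings mean, the sketch below assembles an equivalent `jupyter server` command line from standard Jupyter Server options. It is an assumption about how such a configuration could be expressed, not a copy of `docker/docker-compose.yml`, and `jupyter_server_command` is a hypothetical helper. These flags are for local development only.

```python
def jupyter_server_command(port=8889, origin="http://localhost:5173"):
    """Build an argv list for a headless, unauthenticated local Jupyter Server."""
    return [
        "jupyter", "server",
        "--no-browser",                           # headless: no Lab UI is opened
        f"--ServerApp.port={port}",
        f"--ServerApp.allow_origin={origin}",     # CORS for the Vite dev server
        "--ServerApp.disable_check_xsrf=True",    # local dev only
        # Empty token disables token auth (local dev only). Newer Jupyter Server
        # releases prefer IdentityProvider.token for the same setting.
        "--ServerApp.token=",
    ]
```

The list form can be passed directly to `subprocess.run(...)` without shell quoting concerns.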

## Requirements

- Python >= 3.10
- httpx >= 0.25.0, < 1.0
- pydantic >= 2.0, < 3.0
- pandas >= 1.5.0 (optional, installed with the `pandas` extra)
