Metadata-Version: 2.4
Name: spatialcheckpoint
Version: 0.1.2
Summary: Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics
License-Expression: MIT
Keywords: spatial transcriptomics,immune checkpoint,bioinformatics,single-cell,machine learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: scanpy>=1.10
Requires-Dist: squidpy>=1.4
Requires-Dist: anndata>=0.10
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: scikit-learn>=1.4
Requires-Dist: lightgbm>=4.0
Requires-Dist: xgboost>=2.0
Requires-Dist: shap>=0.45
Requires-Dist: lifelines>=0.28
Requires-Dist: matplotlib>=3.8
Requires-Dist: seaborn>=0.13
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Requires-Dist: tqdm>=4.66
Requires-Dist: optuna>=3.0
Requires-Dist: scipy>=1.11
Requires-Dist: imbalanced-learn>=0.11
Requires-Dist: requests>=2.28
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# SpatialCheckpoint

[![PyPI version](https://img.shields.io/pypi/v/spatialcheckpoint.svg)](https://pypi.org/project/spatialcheckpoint/)
[![Python](https://img.shields.io/pypi/pyversions/spatialcheckpoint.svg)](https://pypi.org/project/spatialcheckpoint/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

**Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics data.**

SpatialCheckpoint is a bioinformatics pipeline that integrates spatial gene expression profiling, consensus clustering, ensemble ML classification, SHAP interpretability, and clinical survival analysis to characterize immune checkpoint heterogeneity across the tumor microenvironment.

---

## Installation

```bash
pip install spatialcheckpoint
```

**Requirements:** Python ≥ 3.10

---

## CLI

```bash
# Run the built-in demo (no data files needed)
spatialcheckpoint demo

# Download a registered dataset
spatialcheckpoint download BRCA_visium_10x

# Download all BRCA datasets
spatialcheckpoint download all --cancer-type BRCA

# Preprocess raw Visium output or H5AD
spatialcheckpoint preprocess path/to/spaceranger/  data/processed/
spatialcheckpoint preprocess sample.h5ad           data/processed/

# Run full spatial analysis on a preprocessed sample
spatialcheckpoint analyze sample01

# Discover archetypes from a feature matrix CSV
spatialcheckpoint discover results/sample01/features.csv --k-min 2 --k-max 8

# Train the archetype classifier
spatialcheckpoint classify features.csv archetype_labels.csv --model-dir models/

# Generate publication figures
spatialcheckpoint figures --results-dir results/ --output-dir paper/figures/
```

---

## Usage

### Gene Panel

```python
import spatialcheckpoint as scp

# 44 checkpoint genes across 6 functional categories
genes = scp.get_all_checkpoint_genes()

# Genes by category
co_inhibitory = scp.get_category_genes("co_inhibitory_receptors")
novel         = scp.get_category_genes("novel_checkpoints")
cell_markers  = scp.get_immune_cell_markers()    # {cell_type: [genes]}
lr_pairs      = scp.get_ligand_receptor_pairs()  # [{ligand, receptor, alias}]
```

### Data Preprocessing

```python
# From Space Ranger output directory
preprocessor = scp.SpatialDataPreprocessor(spaceranger_out_path="path/to/spaceranger/output")
adata = preprocessor.load_visium()
adata = preprocessor.quality_control(adata, min_genes=200, max_mt_pct=25.0)
adata = preprocessor.normalize(adata)
adata.write_h5ad("data/processed/sample01_preprocessed.h5ad")

# Or from an existing H5AD
preprocessor = scp.SpatialDataPreprocessor(h5_path="existing_data.h5ad")
```
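
For readers unfamiliar with the two QC thresholds, the sketch below shows what `min_genes` and `max_mt_pct` conventionally mean in spot-level filtering. It is a NumPy-only illustration, not the package's implementation: the helper name `qc_mask`, the `MT-` prefix convention for human mitochondrial genes, and the toy data are all assumptions made for this example.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_mt_pct=25.0):
    """Boolean mask of spots passing both thresholds (hypothetical helper).

    counts: (n_spots, n_genes) raw count matrix.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes carry the human "MT-" prefix.
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])
    n_genes_detected = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    # Guard against division by zero for empty spots.
    mt_pct = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1.0)
    return (n_genes_detected >= min_genes) & (mt_pct <= max_mt_pct)

# Toy data: 3 spots x 4 genes, with min_genes lowered to fit the toy size.
counts = np.array([
    [5, 3, 2, 0],   # 3 genes detected, 0% mitochondrial
    [1, 0, 0, 9],   # 2 genes detected, 90% mitochondrial
    [0, 0, 1, 0],   # 1 gene detected
])
genes = ["CD274", "PDCD1", "CTLA4", "MT-CO1"]
mask = qc_mask(counts, genes, min_genes=2, max_mt_pct=25.0)
print(mask)
```

Only the first spot passes: the second exceeds the mitochondrial fraction and the third has too few detected genes.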

### Spatial Profiling & Feature Extraction

```python
genes = scp.get_all_checkpoint_genes()

# Region-based expression (tumor_core, invasive_margin, stroma, …)
profiler    = scp.SpatialCheckpointProfiler(adata, genes)
region_expr = profiler.expression_by_region()
hotspots    = profiler.checkpoint_hotspot_detection()   # Moran's I per gene

# 80+ spatial features per slide
engineer = scp.SpatialFeatureEngineer(adata, genes)
features = engineer.extract_all_features(sample_id="sample01")
```
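
The hotspot step reports Moran's I per gene. As a rough guide to what that statistic measures, here is a minimal NumPy sketch of global Moran's I using binary k-nearest-neighbor weights. The function name `morans_i`, the choice of kNN weights, and the value of `k` are assumptions for illustration; the package may build its spatial graph differently.

```python
import numpy as np

def morans_i(values, coords, k=6):
    """Global Moran's I with binary k-nearest-neighbor weights (sketch)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    # Pairwise Euclidean distances; a spot is never its own neighbor.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    z = values - values.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return 0.0  # constant signal: autocorrelation is undefined, report 0
    return float((n / w.sum()) * (z @ w @ z) / denom)

# A smooth left-to-right gradient versus the same values shuffled in space.
gx, gy = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
gradient = coords[:, 0]
shuffled = np.random.default_rng(0).permutation(gradient)

i_gradient = morans_i(gradient, coords, k=4)
i_shuffled = morans_i(shuffled, coords, k=4)
print(f"gradient: {i_gradient:.2f}  shuffled: {i_shuffled:.2f}")
```

Values near 1 indicate that neighboring spots have similar expression (a spatially clustered gene), while values near 0 indicate no spatial structure.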

### Archetype Discovery

```python
# feature_matrix: DataFrame (n_samples × n_features)
# sample_metadata: DataFrame with 'cancer_type' column, same index
discovery = scp.SpatialArchetypeDiscovery(feature_matrix, sample_metadata)

cc     = discovery.consensus_clustering(k_range=(2, 8), n_iterations=100)
labels = cc["labels"]
char   = discovery.characterize_archetypes(labels)

nmf = discovery.run_nmf(k=cc["optimal_k"])
# nmf["W"]  →  (n_samples, k) soft membership weights
# nmf["H"]  →  (k, n_features) archetype profiles
```
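
The idea behind consensus clustering can be sketched in a few lines: cluster many random subsamples, then record how often each pair of samples lands in the same cluster. The code below is an illustrative NumPy version with a plain Lloyd's k-means. The names `kmeans_labels` and `consensus_matrix`, the 80% subsampling rate, and the toy data are assumptions, and the package's own procedure (including how it selects `optimal_k`) may differ.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_iterations=100, subsample=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shared a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_iterations):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs that were never drawn together get a consensus of 0.
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)

# Two well-separated synthetic groups of 20 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(50.0, 1.0, (20, 5))])
C = consensus_matrix(X, k=2, n_iterations=30)
within = C[:20, :20].mean()   # pairs from the same group
across = C[:20, 20:].mean()   # pairs from different groups
print(f"within-group: {within:.2f}  across-group: {across:.2f}")
```

A consensus matrix whose entries sit close to 0 or 1 indicates a stable partition at that `k`; entries spread across the middle of the range suggest the clustering is sensitive to which samples were drawn.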

### Classifier Training & SHAP

```python
trainer = scp.ArchetypeModelTrainer(
    feature_matrix=feature_matrix,
    archetype_labels=labels,
    output_dir="models/",
)
results = trainer.run(n_optuna_trials=30)

explainer = scp.ArchetypeExplainer(results["model"], feature_matrix)
shap_df   = explainer.global_feature_importance()
```

---

## Demo

The demo runs entirely on synthetic data, so no Visium files are required.

```python
import numpy as np
import pandas as pd
import scanpy as sc
import spatialcheckpoint as scp

print(f"SpatialCheckpoint v{scp.__version__}")

# ── Gene panel ───────────────────────────────────────────────────────────────
genes = scp.get_all_checkpoint_genes()
print(f"Checkpoint panel : {len(genes)} genes")
print(f"Co-inhibitory    : {scp.get_category_genes('co_inhibitory_receptors')}")

lr = scp.get_ligand_receptor_pairs()
print(f"LR pairs ({len(lr)}) e.g. {lr[0]}")

# ── Synthetic Visium slide ───────────────────────────────────────────────────
rng = np.random.default_rng(42)
cp8 = genes[:8]
dummy_genes = [f"GENE{i:04d}" for i in range(92)] + cp8

X = rng.negative_binomial(2, 0.5, size=(200, 100)).astype(float)
adata = sc.AnnData(X=X)
adata.var_names = pd.Index(dummy_genes)

gx, gy = np.meshgrid(np.arange(20), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
coords += rng.uniform(-0.1, 0.1, size=coords.shape)
adata.obsm["spatial"] = coords

region_map = []
for x, y in coords:
    if   x < 5  and y < 5:  region_map.append("tumor_core")
    elif x < 10 and y < 8:  region_map.append("invasive_margin")
    elif x >= 15:            region_map.append("immune_enriched")
    elif y >= 8:             region_map.append("necrotic")
    else:                    region_map.append("stroma")

adata.obs["region_type"] = pd.Categorical(
    region_map,
    categories=["tumor_core","invasive_margin","stroma","immune_enriched","necrotic"]
)

# ── Spatial feature extraction ───────────────────────────────────────────────
engineer = scp.SpatialFeatureEngineer(adata, cp8)
features = engineer.extract_all_features(sample_id="demo")
print(f"\nFeature matrix   : {features.shape[1]} features extracted")

# ── Multi-sample archetype discovery ────────────────────────────────────────
feat_mat = pd.DataFrame(
    rng.standard_normal((30, features.shape[1])),
    index=[f"sample_{i:03d}" for i in range(30)],
    columns=features.columns,
)
meta = pd.DataFrame(
    {"cancer_type": rng.choice(["BRCA","CRC","NSCLC"], 30)},
    index=feat_mat.index,
)

discovery = scp.SpatialArchetypeDiscovery(feat_mat, meta)
cc = discovery.consensus_clustering(k_range=(2, 4), n_iterations=20)
print(f"\nOptimal k        : {cc['optimal_k']}")

char = discovery.characterize_archetypes(cc["labels"])
print("\nArchetype summary:")
print(char[["archetype_name","n_samples"]].to_string())

# ── NMF soft membership ──────────────────────────────────────────────────────
nmf = discovery.run_nmf(k=cc["optimal_k"])
print(f"\nNMF explained variance : {nmf['explained_variance']:.3f}")
print("Membership weights (first 3 samples):")
print(nmf["W"].head(3).round(3).to_string())
```
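
To make the shapes of the NMF factors concrete, the following sketch factorizes a small non-negative matrix with the classic multiplicative-update rules. It is a NumPy-only illustration under stated assumptions: the function name `nmf`, the iteration count, and the synthetic input are invented for this example, and the input is constructed to be non-negative, which NMF requires. How the package prepares its feature matrix before factorization is not covered here.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factorize non-negative V (n x m) as W (n x k) @ H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic 30 samples x 12 features matrix with exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 12))

W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, f"relative error: {rel_err:.4f}")
```

Each row of `W` gives one sample's weights over the `k` components, and each row of `H` gives one component's profile over the features, matching the `(n_samples, k)` and `(k, n_features)` shapes described for `nmf["W"]` and `nmf["H"]`.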

---

## Gene Panel

| Category | Genes (examples) |
|----------|-----------------|
| Co-inhibitory receptors | `PDCD1` (PD-1), `CTLA4`, `LAG3`, `HAVCR2` (TIM-3), `TIGIT` |
| Co-inhibitory ligands | `CD274` (PD-L1), `PDCD1LG2` (PD-L2), `LGALS9` |
| Novel checkpoints | `VSIR` (VISTA), `CD276` (B7-H3), `VTCN1` (B7-H4) |
| Innate checkpoints | `CD47`, `SIRPA`, `LILRB1`, `LILRB2` |
| Immune enzymes | `IDO1`, `ENTPD1` (CD39), `NT5E` (CD73), `ARG1` |
| Co-stimulatory reference | `CD28`, `ICOS`, `TNFRSF4` (OX40), `TNFRSF9` (4-1BB) |

---

## Archetypes

| Archetype | Spatial signature |
|-----------|------------------|
| `Checkpoint-Hot` | High checkpoint + high immune + co-localized |
| `Checkpoint-Cold` | Low checkpoint + low immune infiltration |
| `Checkpoint-Excluded` | Checkpoint at margin, immune at periphery |
| `Checkpoint-Mismatch` | Checkpoint and immune spatially separated |
| `Innate-Dominant` | CD47/SIRPα axis dominant |
| `Novel-Enriched` | VISTA / B7-H3 / B7-H4 enriched |

---

## Pipeline Architecture

```
Raw Visium data (Space Ranger dir or H5AD)
  → SpatialDataPreprocessor      QC, normalize → 'counts' / 'log1p' layers
  → SpatialCheckpointProfiler    region expression (tumor_core, invasive_margin,
                                  stroma, immune_enriched, necrotic)
  → SpatialFeatureEngineer       80+ features: co-localization, gradients,
                                  Moran's I, region expression ratios
  → SpatialArchetypeDiscovery    consensus KMeans + delta-area k-selection + NMF
  → ArchetypeModelTrainer        LightGBM + XGBoost + MLP + RF ensemble,
                                  SMOTE, RFECV, Optuna HPO
  → ArchetypeExplainer           SHAP global / per-class feature importance
  → ClinicalAssociationAnalyzer  KM curves, Cox PH, logistic regression (OS/PFS)
```
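
The final stage mentions Kaplan-Meier (KM) curves. For orientation, here is a minimal NumPy sketch of the KM product-limit estimator on toy data. The function name `kaplan_meier` and the example durations are assumptions for illustration. `lifelines` is listed among the package's dependencies, but this sketch does not use it and does not show how `ClinicalAssociationAnalyzer` is called.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Return (event_times, survival_probabilities) for right-censored data.

    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])
    surv, s = [], 1.0
    for t in times:
        at_risk = (durations >= t).sum()             # still under observation
        deaths = ((durations == t) & events).sum()   # events exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

# Six toy patients; follow-up in months, two of them censored.
durations = [5, 8, 8, 12, 15, 20]
events    = [1, 1, 0, 1, 0, 1]
times, surv = kaplan_meier(durations, events)
for t, s in zip(times, surv):
    print(f"t={t:>4.0f}  S(t)={s:.3f}")
```

Censored patients contribute to the at-risk count up to their last follow-up time but never trigger a drop in the curve, which is why the estimate steps down only at the four observed event times.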

---

## Output Files

| Path | Contents |
|------|----------|
| `results/{sample_id}/features.csv` | 80+ spatial features |
| `results/{sample_id}/region_expression.csv` | Region × gene expression |
| `results/{sample_id}/hotspots.csv` | Moran's I per gene |
| `results/{sample_id}/colocalization.csv` | Ligand-receptor co-occurrence |
| `results/archetypes/archetype_labels.csv` | Sample → archetype assignment |
| `results/archetypes/nmf_W.csv`, `nmf_H.csv` | NMF sample membership weights (W) / archetype profiles (H) |
| `models/archetype_classifier.joblib` | Serialized ensemble model |
| `paper/figures/` | Publication-ready PDF/PNG plots |

---

## Citation

```bibtex
@article{spatialcheckpoint2025,
  title   = {SpatialCheckpoint: Spatial heterogeneity profiling of immune checkpoints
             in spatial transcriptomics},
  author  = {},
  journal = {},
  year    = {2025},
}
```

---

## License

MIT
