Metadata-Version: 2.4
Name: concadptr
Version: 0.1.0
Summary: Fuse multiple LoRA adapters into Mixture-of-Experts systems. Train → Compose → Serve.
Author-email: Irfan Ali <irfan.alee@protonmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/irfanalii/concadptr
Project-URL: Documentation, https://github.com/irfanalii/concadptr#readme
Project-URL: Repository, https://github.com/irfanalii/concadptr
Project-URL: Issues, https://github.com/irfanalii/concadptr/issues
Keywords: lora,mixture-of-experts,moe,fine-tuning,peft,llm,transformers,adapter,molora,concadptr
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.36.0
Requires-Dist: peft>=0.7.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: datasets>=2.14.0
Requires-Dist: accelerate>=0.25.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: training
Requires-Dist: bitsandbytes>=0.41.0; extra == "training"
Requires-Dist: trl>=0.7.0; extra == "training"
Provides-Extra: serving
Requires-Dist: fastapi>=0.104.0; extra == "serving"
Requires-Dist: uvicorn>=0.24.0; extra == "serving"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: httpx>=0.25.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Provides-Extra: hub
Requires-Dist: huggingface_hub>=0.19.0; extra == "hub"
Provides-Extra: all
Requires-Dist: concadptr[dev,hub,serving,training]; extra == "all"

# 🧪 ConcAdptr

**Concocting Adapters — Brew multiple LoRA adapters into Mixture-of-Experts systems.**

**Train → Concoct → Serve.**

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![PyPI version](https://img.shields.io/badge/pypi-v0.1.0-orange.svg)](https://pypi.org/project/concadptr/)

---

ConcAdptr takes independently trained LoRA adapters and concocts them into MoE-style expert systems with learned routing. Model-agnostic. Privacy-preserving. Built for production.

## The Problem

You fine-tune a base model with LoRA for your product. Then each customer/user-group needs their own specialization — but they can't share their data. You end up with multiple LoRA adapters trained in isolation. How do you combine them into something smarter than any individual adapter?

## The Solution

ConcAdptr takes your independently trained LoRA adapters and concocts them into experts within a Mixture-of-Experts system. A lightweight router learns which expert(s) to activate for each input — without needing access to any customer's original training data.

```
Base Model ─┬─ LoRA Adapter A (medical)    ──┐
            ├─ LoRA Adapter B (legal)      ──┼── Router ── Concocted Output
            └─ LoRA Adapter C (finance)    ──┘
```

## Key Features

- **Model-agnostic** — Works with any HuggingFace transformer (Qwen, LLaMA, Mistral, Gemma, etc.)
- **Privacy-preserving** — Customer data never leaves their environment; only adapters travel
- **3 routing strategies** — Soft merging (MoLoRA), Top-K sparse routing (MixLoRA), X-LoRA learned scaling
- **Full pipeline** — Train adapters → Concoct with router → Serve — one library
- **Production-ready** — FastAPI serving, adapter registry, compatibility validation
- **Consumer GPU friendly** — 4-bit quantization, runs on 16GB VRAM

## Installation

```bash
pip install concadptr
```

With optional dependencies:

```bash
pip install concadptr[training]   # + bitsandbytes, trl
pip install concadptr[serving]    # + fastapi, uvicorn
pip install concadptr[all]        # everything
```

## Quick Start

### 1. Define your configuration

```python
from concadptr import ConcAdptrConfig

config = ConcAdptrConfig(
    base_model="Qwen/Qwen2.5-7B-Instruct",
    adapters={
        "medical": "./adapters/medical_invoices",
        "legal": "./adapters/legal_contracts",
        "finance": "./adapters/financial_reports",
    },
    routing_strategy="xlora",  # or "soft_merging", "top_k"
    quantization="4bit",
)
```

Or load from YAML:

```python
config = ConcAdptrConfig.from_yaml("config.yaml")
```

### 2. Build the concocted model

```python
from concadptr import ConcAdptrModel

model = ConcAdptrModel.from_config(config)
print(model)
# ConcAdptrModel(
#   base_model='Qwen/Qwen2.5-7B-Instruct',
#   num_adapters=3,
#   adapter_names=['medical', 'legal', 'finance'],
#   routing=xlora,
#   trainable_params=263,168
# )
```

### 3. Train the router

```python
from concadptr import ConcAdptrTrainer
from datasets import load_dataset, concatenate_datasets

# Router training data: a MIX of domain samples (NOT customer data)
# The router needs to see examples from different domains
# so it learns which adapter helps with which type of input
medical_samples = load_dataset("medical_qa", split="train[:500]")
legal_samples = load_dataset("legal_docs", split="train[:500]")
finance_samples = load_dataset("finance_qa", split="train[:500]")

# Combine into one mixed dataset
router_dataset = concatenate_datasets([medical_samples, legal_samples, finance_samples])
# Use your general-purpose dataset (NOT customer data)
#dataset = load_dataset("your_dataset")
trainer = ConcAdptrTrainer(
    model=model,  # This already has all 3 adapters loaded + the router
    train_dataset=router_dataset,
    eval_dataset=eval_dataset,
    learning_rate=1e-4,
    num_epochs=3,
    batch_size=4,
)

results = trainer.train()
model.save_pretrained("./concocted_model")
```

### 4. Analyze routing patterns

```python
from concadptr.utils import print_routing_summary

model.router.enable_history(True)
# Run some inference...
stats = model.router.get_routing_stats()
print_routing_summary(stats, expert_names=["medical", "legal", "finance"])
```

Output:
```
ConcAdptr Routing Summary
========================================
Routing Entropy: 1.0234 / 1.0986 (max)
Uniformity: 93.2%

Expert Load:
  medical              0.3841 ███████████████████
  legal                0.3012 ███████████████
  finance              0.3147 ███████████████

Expert Utilization (top-2):
  medical              0.8234 █████████████████████████████████████████
  legal                0.6891 ██████████████████████████████████
  finance              0.7102 ███████████████████████████████████
```

### 5. Serve (optional)

```python
from concadptr.serving import serve

serve("./concocted_model", host="0.0.0.0", port=8000)
```

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Analyze this medical invoice...", "max_tokens": 256}'
```

## Routing Strategies

| Strategy | How It Works | Best For |
|----------|-------------|----------|
| `soft_merging` | Weighted average of ALL experts per token | Few experts (2-8), overlapping domains |
| `top_k` | Activate only top-k experts per token | Many experts (8+), distinct domains |
| `xlora` | Learned scaling with frozen adapters, layer-wise | Independent adapters, privacy-critical |

## Architecture

```
┌──────────────────────────────────────────────┐
│                 ConcAdptrModel                │
│                                              │
│  ┌──────────┐  ┌───────────────────────────┐ │
│  │   Base    │  │     Adapter Registry      │ │
│  │  Model    │  │  ┌─────┐ ┌─────┐ ┌─────┐ │ │
│  │ (frozen)  │  │  │LoRA │ │LoRA │ │LoRA │ │ │
│  │           │  │  │  A  │ │  B  │ │  C  │ │ │
│  │           │  │  │froze│ │froze│ │froze│ │ │
│  └──────────┘  │  └──┬──┘ └──┬──┘ └──┬──┘ │ │
│                │     │       │       │     │ │
│                └─────┼───────┼───────┼─────┘ │
│                      │       │       │       │
│                ┌─────▼───────▼───────▼─────┐ │
│                │         Router            │ │
│                │       (trainable)         │ │
│                │  ┌──────────────────────┐ │ │
│                │  │   Gating Network     │ │ │
│                │  └──────────┬───────────┘ │ │
│                └─────────────┼─────────────┘ │
│                              │               │
│                    ┌─────────▼─────────┐     │
│                    │ Concocted Output  │     │
│                    └───────────────────┘     │
└──────────────────────────────────────────────┘
```

## The Multi-Customer Use Case

ConcAdptr was designed for a specific real-world pattern:

1. **You** fine-tune a base model on your general training data → your product's foundation model
2. **Each customer** fine-tunes on their private data (on-premise) → produces a LoRA adapter
3. **The adapter** (50-200MB, no raw data) is transferred back to you
4. **ConcAdptr** concocts all customer adapters into a MoE system with learned routing

Customer data never leaves their environment. The router learns which expert(s) to activate without seeing the original training data. This is **federated expertise** — cross-customer knowledge transfer without data sharing.

## Project Structure

```
concadptr/
├── concadptr/
│   ├── __init__.py          # Public API
│   ├── config.py            # Configuration classes
│   ├── model.py             # ConcAdptrModel (core)
│   ├── trainer.py           # ConcAdptrTrainer (router training)
│   ├── router/
│   │   ├── base.py          # BaseRouter ABC
│   │   ├── soft_merging.py  # Dense/soft routing (MoLoRA)
│   │   ├── top_k.py         # Sparse top-k routing (MixLoRA)
│   │   └── xlora.py         # Learned scaling (X-LoRA)
│   ├── adapters/
│   │   └── __init__.py      # AdapterRegistry
│   ├── serving/
│   │   └── server.py        # FastAPI inference server
│   └── utils/
│       └── visualization.py # Routing analysis tools
├── tests/
│   └── test_core.py         # Unit tests
├── examples/
│   └── config.yaml          # Example configuration
├── pyproject.toml
├── LICENSE
└── README.md
```

## Development

```bash
git clone https://github.com/irfanalii/concadptr.git
cd concadptr
pip install -e ".[dev]"
pytest
```

## Roadmap

- [x] Core library architecture
- [x] 3 routing strategies (soft, top-k, X-LoRA)
- [x] Adapter registry with compatibility validation
- [x] Router training pipeline
- [x] FastAPI serving
- [x] Routing visualization and analysis
- [ ] Full generation loop with per-layer routing hooks
- [ ] Integration with vLLM for high-throughput serving
- [ ] Adapter merging (linear, TIES, DARE) as fallback
- [ ] Hugging Face Hub adapter upload/download
- [ ] Benchmarking suite across model families
- [ ] Distributed router training

## References

- Hu et al. (2021) — [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- Zadouri et al. (2023) — [Pushing Mixture of Experts to the Limit (MoLoRA)](https://arxiv.org/abs/2309.05444)
- Wu et al. (2024) — [Mixture of LoRA Experts (MoLE)](https://arxiv.org/abs/2404.13628)
- Li et al. (2024) — [MixLoRA](https://arxiv.org/abs/2404.15159)
- Buehler & Buehler (2024) — [X-LoRA: Mixture of Low-Rank Adapter Experts](https://github.com/EricLBuehler/xlora)
- Zhuang et al. (2025) — [LD-MoLE: Learnable Dynamic Routing for MoLE](https://arxiv.org/abs/2509.25684)

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.

## Author

**Irfan Ali** — [GitHub](https://github.com/irfanalee) · [HuggingFace](https://huggingface.co/irfanalee) · [LinkedIn](https://linkedin.com/in/irfanalii)



---

*ConcAdptr — because the best models are concocted, not just trained.*
