Metadata-Version: 2.4
Name: nfinitmonkeys-cortex-sdk
Version: 2.0.0
Summary: Official Python SDK for Cortex API — Secure LLM Inference Gateway by InfiniteMonkeys
Project-URL: Homepage, https://cortex.nfinitmonkeys.com
Project-URL: Status, https://status.nfinitmonkeys.com
Project-URL: Source, https://github.com/Seboj/Cortex/tree/main/sdks/python
Author-email: InfiniteMonkeys <hello@infinitemonkeys.com>
License: MIT
Keywords: ai,cortex,llm,ocr,openai,rag,sdk,vlm,whisper
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Description-Content-Type: text/markdown

# Cortex Python SDK

The friendly Python client for [Cortex](https://cortex.nfinitmonkeys.com) — InfiniteMonkeys' secure LLM gateway. Chat, vision, embeddings, speech, RAG, research — all in one typed client.

```bash
pip install nfinitmonkeys-cortex-sdk
```

```python
from cortex_sdk import Cortex

cortex = Cortex(api_key="sk-cortex-...")
print(cortex.chat("Hello, world!").text)
```

That's it. Keep reading for every capability.

---

## Table of contents

- [Setup](#setup)
- [Migrating from v1.x](#migrating-from-v1x)
- [Chat](#chat)
- [Streaming chat](#streaming-chat)
- [Response style presets](#response-style-presets)
- [JSON / structured output](#json--structured-output)
- [Embeddings](#embeddings)
- [Speech-to-text](#speech-to-text-transcription)
- [Text-to-speech](#text-to-speech)
- [Document extraction (Iris)](#document-extraction-iris)
- [OCR + form templates](#ocr--form-templates)
- [Deep Research](#deep-research)
- [RAG collections](#rag-collections)
- [Async](#async-asynccortex)
- [Error handling](#error-handling)
- [Checking service health](#checking-service-health)
- [Configuration](#configuration)
- [Support](#support)

---

## Setup

Install from PyPI:

```bash
pip install nfinitmonkeys-cortex-sdk
```

Or pin a specific version:

```bash
pip install "nfinitmonkeys-cortex-sdk>=2.0,<3"
```

The SDK reads `CORTEX_API_KEY` from your environment by default:

```python
from cortex_sdk import Cortex
cortex = Cortex()
```

Or pass it explicitly:

```python
cortex = Cortex(api_key="sk-cortex-...")
```

Close it when you're done (or use `with`):

```python
with Cortex() as cortex:
    ...
```

---

## Migrating from v1.x

v2 rewrote the client around a flatter, friendlier API. Your old imports still
resolve (`CortexClient` is now an alias for `Cortex`), but the method signatures
changed. Migrating is a one-time rewrite of your call sites; there is no
compatibility layer for the v1 method names.

| v1 (resource-group style) | v2 (flat style) |
|--------------------------|-----------------|
| `client.chat.completions.create(model="default", messages=[{"role":"user","content":"Hi"}])` | `cortex.chat("Hi")` |
| `client.chat.completions.create(..., stream=True)` → iterate raw chunks | `for c in cortex.chat_stream("Hi"): ...` (yields just content) |
| `client.embeddings.create(input="x", model="bge-m3")` | `cortex.embed("x")` |
| `client.audio.transcriptions.create(file=...)` | `cortex.transcribe("audio.wav")` |
| `client.audio.speech.create(input=..., voice=...)` | `cortex.speak("text", voice="james")` |
| `client.iris.extract(file=...)` | `cortex.extract("doc.pdf")` |

What's new in v2 that wasn't in v1:
- Style presets (`style="concise"`, `"markdown"`, etc.)
- Typed error subclasses (catch `CortexRateLimitError` specifically)
- `ChatResponse.parse_json()` — strips markdown fences from LLM JSON output
- Automatic retries on 429/5xx that honor the `Retry-After` header
- RAG Collections sub-client
- Deep Research sub-client with `wait()` helper
- Static `Cortex.status()` — no API key needed
- Full async client (`AsyncCortex`)

There is no timeline for removing the `CortexClient` alias; it is here to stay.

---

## Chat

Pass a plain string for the simplest case:

```python
r = cortex.chat("What is a vector database?")
print(r.text)
```

Multi-turn conversations use message dicts:

```python
r = cortex.chat([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three NoSQL databases."},
])
```

Route the request to a specific pool when you know which one you need:

```python
r = cortex.chat("Extract names from: Alice, Bob, Carol", pool="cortex-extract")
```

---

## Streaming chat

Get tokens as they're generated — great for interactive UIs:

```python
for chunk in cortex.chat_stream("Write a limerick about otters"):
    print(chunk, end="", flush=True)
```
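If you also need the full reply once streaming finishes (to log it or add it to a conversation), collect the chunks while you print them. This is a minimal sketch assuming `chat_stream()` yields plain `str` chunks, as the loop above implies; `stream_and_collect` is a hypothetical helper.

```python
import sys
from typing import Iterable, TextIO


def stream_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Echo each chunk as it arrives and return the concatenated reply."""
    parts: list[str] = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# Usage with the SDK:
# full_text = stream_and_collect(cortex.chat_stream("Write a limerick about otters"))
```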

---

## Response style presets

Instead of writing system prompts, pick a style:

```python
cortex.chat("Summarize RAG.", style="concise")       # one-sentence conclusion
cortex.chat("Show me a Python list", style="code-only")
cortex.chat("Compare Redis and MongoDB", style="markdown")
cortex.chat("Deploy nginx", style="technical")
cortex.chat("Help me pick a name", style="chat")
```

Combine with a custom system prompt:

```python
cortex.chat(
    "Find the bug",
    style="technical",
    system="You are a staff Python engineer reviewing a PR.",
)
```

---

## JSON / structured output

Force the model to return JSON matching an exact schema — guaranteed, no retries, no regex parsing:

```python
import json

r = cortex.chat(
    "John Doe, 42, lives in Boston.",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "age", "city"],
            },
        },
    },
)
data = json.loads(r.text)   # always valid
```
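Because the schema pins down the keys and types, you can map the parsed JSON straight onto a typed object. A minimal sketch using a stdlib dataclass; `Person` and `parse_person` are illustrative names, and the `raw` string stands in for `r.text` from the call above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str


def parse_person(raw: str) -> Person:
    """Turn the model's JSON text into a Person (KeyError if a field is missing)."""
    data = json.loads(raw)
    return Person(name=data["name"], age=int(data["age"]), city=data["city"])


# Stand-in for r.text from the call above
raw = '{"name": "John Doe", "age": 42, "city": "Boston"}'
print(parse_person(raw))  # Person(name='John Doe', age=42, city='Boston')
```

Since the SDK already depends on pydantic 2, defining `Person` as a `pydantic.BaseModel` subclass and calling `Person.model_validate_json(r.text)` is an equally reasonable choice with stricter type validation.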

---

## Embeddings

One string → a 1024-dim vector:

```python
v = cortex.embed("cortex is a gateway")
len(v)   # 1024
```

Many strings → batch:

```python
vectors = cortex.embed(["doc one", "doc two", "doc three"])
```
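A common next step is ranking documents by cosine similarity to a query vector. This sketch assumes `embed()` returns a `list[float]` for one string and a `list[list[float]]` for a batch, which matches the examples above but is not spelled out; `cosine_similarity` and `rank` are hypothetical helpers.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float], docs: dict[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Return (doc, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with the SDK:
# texts = ["doc one", "doc two", "doc three"]
# ranked = rank(cortex.embed("which doc?"), dict(zip(texts, cortex.embed(texts))))
```

For more than a few thousand vectors, a vector store (or the RAG collections below) will scale better than a pure-Python loop.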

---

## Speech-to-text (transcription)

Pass a file path, bytes, or a file-like object:

```python
t = cortex.transcribe("meeting.wav")
print(t.text)
```

With speaker diarization:

```python
t = cortex.transcribe("meeting.wav", diarize=True)
```

With a language hint:

```python
with open("interview-es.wav", "rb") as f:
    audio_bytes = f.read()
t = cortex.transcribe(audio_bytes, language="es")
```

---

## Text-to-speech

```python
audio = cortex.speak("Welcome to Cortex.")    # returns bytes
```

Render straight to disk:

```python
cortex.speak_to_file("Welcome to Cortex.", "hello.wav")
```

Expressive mode (automatically adds laughs, sighs, and pauses):

```python
cortex.speak_to_file(
    "Wow! That's amazing. I'm so glad you came.",
    "expressive.wav",
    expressive=True,
)
```

Voice selection:

```python
cortex.speak("Hello", voice="james")
```
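To sanity-check generated audio (for example, its length before you store it), you can inspect the returned bytes with the stdlib `wave` module. This assumes `speak()` returns a complete WAV file; the `.wav` filenames above suggest so, but verify it for your deployment. `wav_info` is a hypothetical helper.

```python
import io
import wave


def wav_info(audio: bytes) -> dict[str, float]:
    """Read channel count, sample rate, and duration from WAV-encoded bytes."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Usage with the SDK:
# print(wav_info(cortex.speak("Welcome to Cortex.")))
```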

---

## Document extraction (Iris)

Upload a PDF, image, or screenshot and get structured data back:

```python
inv = cortex.extract("invoice.pdf", type="invoice")
print(inv.result)
# {'vendor': 'Acme', 'total': 127.43, 'line_items': [...]}
```

Custom schemas for any document:

```python
medical = cortex.extract(
    "discharge-summary.pdf",
    schema={
        "patient_name": "string",
        "diagnosis_codes": "string[]",
        "discharge_date": "date",
    },
)
```

Submit a correction when the extraction was wrong (helps train the model):

```python
cortex.correct_extraction(inv.id, [
    {"field_name": "total", "original_value": "127.43", "corrected_value": "1274.30"},
])
```
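When a reviewer fixes several fields at once, you can build the correction list by diffing the extracted values against the reviewed ones. A minimal sketch, assuming `inv.result` is a flat dict of field name to value and that string values are expected, as in the example above; sending an empty `original_value` for fields the extraction missed is an assumption about what the endpoint accepts. `build_corrections` is hypothetical.

```python
from typing import Any, Mapping


def build_corrections(extracted: Mapping[str, Any], corrected: Mapping[str, Any]) -> list[dict[str, str]]:
    """List every field whose reviewed value differs from what was extracted."""
    corrections = []
    for name, new_value in corrected.items():
        old_value = extracted.get(name)
        if old_value != new_value:
            corrections.append({
                "field_name": name,
                "original_value": "" if old_value is None else str(old_value),
                "corrected_value": str(new_value),
            })
    return corrections


# Usage with the SDK:
# fixes = build_corrections(inv.result, {**inv.result, "total": 1274.30})
# if fixes:
#     cortex.correct_extraction(inv.id, fixes)
```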

---

## OCR + form templates

Raw OCR on any image or PDF:

```python
ocr = cortex.ocr("scan.png")
print(ocr.text)
```

For forms you see often, use a template to get structured fields in ~200ms:

```python
claim = cortex.ocr("claim.pdf", template="cms1500")
print(claim.fields["patient_name"])
```

Built-in templates: `cms1500`, `ub04`, `superbill`, `eob`.

Auto-learn a template from your own form:

```python
cortex.learn_template("blank-intake-form.pdf", template_id="my_intake")
# ...later:
fields = cortex.ocr("filled-intake.pdf", template="my_intake")
```

For unknown layouts, use the OCR-free model (slower but more flexible):

```python
cortex.ocr_understand("complex-doc.pdf")
```

---

## Deep Research

Kick off an autonomous research agent (web search, page fetching, vision, and summarization):

```python
job = cortex.research.submit(
    "What do you know about Acme Medical Group?",
    type="company_enrichment",
    depth="quick",   # 'quick' | 'standard' | 'deep'
)
```

Block until it's done (automatic polling):

```python
result = cortex.research.wait(job.job_id, timeout=600)
print(result.result["narrative"])
```

Or poll yourself:

```python
job = cortex.research.get(job.job_id)
if job.is_done:
    ...
```
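If you poll yourself, a deadline and a growing interval keep long `deep` jobs from hammering the API. This is a generic sketch, not the SDK's own `wait()` implementation; `poll_until_done` is hypothetical and takes callables so it does not depend on the SDK's types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 600.0,
    interval: float = 2.0,
    max_interval: float = 30.0,
) -> T:
    """Call fetch() until is_done(result) is true, backing off between polls."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        remaining = deadline - time.monotonic()
        time.sleep(max(0.0, min(interval, remaining)))  # never sleep past the deadline
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Usage with the SDK:
# job_id = job.job_id
# done = poll_until_done(lambda: cortex.research.get(job_id), lambda j: j.is_done)
```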

---

## RAG collections

Build a knowledge base:

```python
cortex.collections.create("company-kb")
cortex.collections.upload("company-kb", "handbook.pdf")
cortex.collections.upload("company-kb", "policies.md")
```

Ask questions (search + LLM in one call):

```python
a = cortex.collections.ask("company-kb", "What's our PTO policy?")
print(a.answer)
for s in a.sources:
    print(f"  · {s['filename']} ({s['score']:.0%} match)")
```
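If the same file shows up several times (for example, once per matching chunk), you may want one line per file with its best score. This sketch assumes each entry in `a.sources` is a dict with `filename` and a 0 to 1 `score`, as the loop above uses; `best_match_per_file` is a hypothetical helper.

```python
from typing import Any, Iterable, Mapping


def best_match_per_file(sources: Iterable[Mapping[str, Any]]) -> list[tuple[str, float]]:
    """Collapse hits to one (filename, best score) pair per file, best first."""
    best: dict[str, float] = {}
    for source in sources:
        name, score = source["filename"], float(source["score"])
        if score > best.get(name, float("-inf")):
            best[name] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Usage with the SDK:
# for filename, score in best_match_per_file(a.sources):
#     print(f"  · {filename} ({score:.0%} match)")
```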

Or just run a semantic search:

```python
hits = cortex.collections.search("company-kb", "parental leave", top_k=3)
```

---

## Async (`AsyncCortex`)

Same API, but the methods are coroutines that you `await`:

```python
import asyncio
from cortex_sdk import AsyncCortex

async def main():
    async with AsyncCortex() as cortex:
        r = await cortex.chat("Hello async")
        vec = await cortex.embed("async embedding")
        print(r.text, len(vec))

asyncio.run(main())
```
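The async client pays off when you fan out many requests at once. A minimal sketch that caps concurrency with `asyncio.Semaphore` so a large batch does not trip rate limits; `gather_limited` is a hypothetical helper, and it expects un-started coroutines (not already-scheduled tasks), since the semaphore only delays work that has not begun.

```python
import asyncio
from typing import Awaitable, Iterable, TypeVar

T = TypeVar("T")


async def gather_limited(aws: Iterable[Awaitable[T]], limit: int = 4) -> list[T]:
    """Await everything concurrently, at most `limit` at a time; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(aw: Awaitable[T]) -> T:
        async with semaphore:  # wait for a free slot before starting the request
            return await aw

    return list(await asyncio.gather(*(run(aw) for aw in aws)))


# Usage with the SDK, inside `async with AsyncCortex() as cortex:`
# prompts = ["Summarize RAG.", "Define OCR.", "What is diarization?"]
# replies = await gather_limited([cortex.chat(p) for p in prompts], limit=2)
```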

---

## Error handling

All SDK errors subclass `CortexError`. Catch specific subclasses first for targeted handling, then the base class as a fallback (for example, to drive your own retry logic):

```python
from cortex_sdk import Cortex, CortexRateLimitError, CortexAuthError, CortexError

cortex = Cortex()

try:
    r = cortex.chat("hello")
except CortexAuthError:
    # Your key is invalid or revoked
    ...
except CortexRateLimitError:
    # You hit a token/request quota — back off
    ...
except CortexError as e:
    print(f"Cortex failed: {e.status_code} {e.detail}")
```

The client already retries transient errors (429 and 5xx) with exponential backoff, so you only see one of these errors after the retries are exhausted (`retries=2` by default; see [Configuration](#configuration)).
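For batch jobs that should ride out longer rate-limit windows than the built-in retries cover, you can wrap calls in your own backoff loop. A minimal sketch: `call_with_backoff` is hypothetical, and in real use you would pass `retry_on=(CortexRateLimitError,)`. Random jitter keeps many workers from retrying in lockstep. Each wrapped call already includes the SDK's own retries, so total attempts multiply.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
) -> T:
    """Retry fn() on the given exceptions with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
    raise RuntimeError("attempts must be at least 1")


# Usage with the SDK:
# from cortex_sdk import CortexRateLimitError
# r = call_with_backoff(lambda: cortex.chat("hello"), retry_on=(CortexRateLimitError,))
```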

---

## Checking service health

Check the public status page from your code (no API key needed):

```python
from cortex_sdk import Cortex

status = Cortex.status()
print(f"Cortex is {status.overall}")
for pool in status.pools:
    print(f"  {pool.pool}: {pool.status}")
```

Or point your browser at [status.nfinitmonkeys.com](https://status.nfinitmonkeys.com).

---

## Configuration

| Parameter | Default | Description |
|-------|---------|------|
| `api_key` | `$CORTEX_API_KEY` | Your API key |
| `base_url` | `https://cortexapi.nfinitmonkeys.com` | Override for self-hosted Cortex |
| `timeout` | `120` seconds | Default request timeout |
| `retries` | `2` | Retries on 429/5xx (exponential backoff) |
| `default_pool` | `None` | Default `X-Cortex-Pool` header |

```python
cortex = Cortex(timeout=300, retries=5, default_pool="cortex-extract")
```
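To tune these settings per environment without code changes, you can read them from environment variables yourself. Only `CORTEX_API_KEY` is read by the SDK; the `CORTEX_TIMEOUT`, `CORTEX_RETRIES`, and `CORTEX_DEFAULT_POOL` names below are a hypothetical convention of your own, and the sketch assumes `timeout` accepts a float.

```python
import os
from typing import Any, Mapping


def cortex_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build Cortex(...) keyword arguments from optional, app-defined env vars."""
    kwargs: dict[str, Any] = {}
    if "CORTEX_TIMEOUT" in env:
        kwargs["timeout"] = float(env["CORTEX_TIMEOUT"])
    if "CORTEX_RETRIES" in env:
        kwargs["retries"] = int(env["CORTEX_RETRIES"])
    if "CORTEX_DEFAULT_POOL" in env:
        kwargs["default_pool"] = env["CORTEX_DEFAULT_POOL"]
    return kwargs


# Usage: cortex = Cortex(**cortex_kwargs_from_env())
```

Unset variables are simply omitted, so the SDK defaults in the table still apply.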

---

## Support

- Docs: this README plus inline docstrings on every method
- Status: [status.nfinitmonkeys.com](https://status.nfinitmonkeys.com)
- Issues: open one on [GitHub](https://github.com/Seboj/Cortex)

Happy building 🦧
