Metadata-Version: 2.4
Name: livekit-plugins-relay-speech
Version: 0.1.2
Summary: LiveKit Agents plugin for Relay Speech — universal TTS adapter with sentence-level caching
License: Apache-2.0
Requires-Python: >=3.10
Requires-Dist: aiohttp<4.0.0,>=3.9.0
Requires-Dist: livekit-agents>=1.5.4
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# livekit-plugins-relay-speech

LiveKit Agents plugin for **Relay Speech**, a universal TTS adapter. It routes
synthesis through a Relay Speech (OpenTTS) server, which fronts multiple
upstream TTS providers behind a single API, serves repeated sentences from
your reserved pool (a sentence-level cache), and tracks usage per API key.
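
To make the reserved-pool idea concrete, here is a minimal conceptual sketch of sentence-level caching. It is not the Relay Speech server's implementation: `SentencePool`, `split_sentences`, and `fake_synth` are hypothetical names, and keying entries by provider, voice, model, language, and sentence text is an assumption of this sketch.

```python
import hashlib
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class SentencePool:
    """Hypothetical in-memory pool keyed by voice settings plus sentence text."""

    def __init__(self, synth: Callable[[str], bytes]) -> None:
        self._synth = synth  # stand-in for a call to an upstream TTS provider
        self._audio: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(provider: str, voice_id: str, model: str, language: str, sentence: str) -> str:
        # Join with a separator that cannot appear in normal text, then hash.
        raw = "\x1f".join([provider, voice_id, model, language, sentence])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, provider: str, voice_id: str, model: str, language: str, sentence: str) -> bytes:
        key = self._key(provider, voice_id, model, language, sentence)
        if key in self._audio:
            self.hits += 1
        else:
            self.misses += 1
            self._audio[key] = self._synth(sentence)
        return self._audio[key]


def fake_synth(sentence: str) -> bytes:
    """Placeholder synthesizer: returns the UTF-8 text instead of real audio."""
    return sentence.encode("utf-8")


pool = SentencePool(fake_synth)
for sentence in split_sentences("Hello there. How are you? Hello there."):
    pool.get("cartesia", "voice-1", "sonic-3", "en", sentence)
```

In this toy run the repeated sentence is synthesized once and served from the pool the second time, which is the behaviour `reserve_pool=True` is meant to enable on the server side.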

## Supported Providers

The plugin itself is provider-agnostic — it forwards your `provider` choice
to the Relay Speech server. The server currently supports:

| `provider` value | Upstream | Notes |
|------------------|----------|-------|
| `cartesia`       | Cartesia | Models: `sonic-3`, `sonic-2`, etc. |
| `elevenlabs`     | ElevenLabs | Multilingual + monolingual models |
| `sarvam`         | Sarvam AI | Indic languages |


## Installation

```bash
pip install livekit-plugins-relay-speech
```

## Usage

```python
import os

from livekit.plugins import relay_speech

# Illustrative setup: load keys however your app manages secrets.
CARTESIA_API_KEY = os.environ["CARTESIA_API_KEY"]
RELAY_API_KEY = os.environ["RELAY_API_KEY"]     # env var name is illustrative

tts = relay_speech.TTS(
    provider="cartesia",
    voice_id="your-voice-id",                   # placeholder: provider-specific voice ID
    model="sonic-3",
    language="hi-IN",
    emotion=["positivity:high", "curiosity"],   # Cartesia sonic-3 emotion controls
    pronunciation_dict_id="my_dict_id",         # Cartesia pronunciation dictionary
    api_key=CARTESIA_API_KEY,                   # provider key (or set CARTESIA_API_KEY env var)
    relay_speech_api_key=RELAY_API_KEY,         # Relay Speech server key
    reserve_pool=True,                          # serve repeated sentences from your reserved pool
)
```

Pass `tts` to your LiveKit `AgentSession` like any other LiveKit TTS plugin.


## Constructor Parameters

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `provider` | str | `"cartesia"` | Upstream provider — `cartesia`, `elevenlabs`, `sarvam`. |
| `voice_id` | str | *(required)* | Provider-specific voice identifier. |
| `model` | str | `"sonic-3"` | TTS model name (provider-specific). |
| `language` | str | `"en"` | BCP-47 language code (`en`, `hi-IN`, …). |
| `speed` | float | `0.0` | -1.0 (slowest) → 1.0 (fastest); 0.0 = normal. |
| `volume` | float | `1.0` | 0.5 → 2.0; 1.0 = normal. |
| `emotion` | list[str] \| None | `None` | Provider voice-emotion tags (e.g. Cartesia sonic-3 emotion controls). Forwarded as-is. |
| `pronunciation_dict_id` | str \| None | `None` | Provider pronunciation-dictionary ID for custom pronunciations. |
| `duration` | float \| None | `None` | Target duration in seconds for the generated audio (Cartesia sonic-3). |
| `max_buffer_delay_ms` | int \| None | `None` | Max buffer delay (ms) before flushing a chunk (Cartesia sonic-3). |
| `add_timestamps` | bool \| None | `None` | Include word-level timestamps in the response (Cartesia sonic-3). |
| `add_phoneme_timestamps` | bool \| None | `None` | Include phoneme-level timestamps (Cartesia sonic-3). |
| `use_normalized_timestamps` | bool \| None | `None` | Return timestamps in normalized form (Cartesia sonic-3). |
| `sample_rate` | int | `24000` | PCM sample rate (Hz). Must match server config. |
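| `api_key` | str \| None | env | Upstream provider API key. When omitted, read from the environment (e.g. `CARTESIA_API_KEY` for Cartesia). |
| `relay_speech_api_key` | str \| None | env | Relay Speech server API key. When omitted, read from the environment. |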
| `reserve_pool` | bool | `False` | Serve repeated sentences from your reserved pool on the server. |
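
The ranges and the `env` defaults in the table can be checked before constructing the TTS object. The sketch below is one way to do that in application code; `check_ranges` and `resolve_key` are hypothetical helpers, not part of the plugin, and whether the plugin performs similar validation itself is not stated here.

```python
import os
from typing import Mapping, Optional


def check_ranges(speed: float = 0.0, volume: float = 1.0) -> None:
    """Raise ValueError if speed or volume is outside the documented range."""
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed must be in [-1.0, 1.0], got {speed}")
    if not 0.5 <= volume <= 2.0:
        raise ValueError(f"volume must be in [0.5, 2.0], got {volume}")


def resolve_key(
    explicit: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Prefer an explicitly passed key, else fall back to the named env var."""
    key = explicit or environ.get(env_var)
    if not key:
        raise ValueError(f"no API key passed and {env_var} is not set")
    return key
```

For example, `check_ranges(speed=0.2, volume=1.5)` passes silently, while `check_ranges(speed=1.5)` raises `ValueError` because the documented maximum speed is 1.0.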

`tts.synthesize(text)` uses a one-shot HTTP `POST /v1/tts` instead of the WebSocket path.
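
A minimal sketch of consuming the one-shot path follows. It assumes the plugin follows the usual LiveKit TTS interface, where `synthesize()` returns an async-iterable stream whose items expose a `.frame` with raw PCM in `.data`; it also assumes 16-bit mono audio. `synthesize_to_wav` is a hypothetical helper, not part of the plugin.

```python
import asyncio
import io
import wave


async def synthesize_to_wav(tts, text: str, sample_rate: int = 24000) -> bytes:
    """Collect one-shot synthesis output and wrap it in a WAV container.

    Assumption of this sketch: frames carry 16-bit mono PCM at `sample_rate`.
    """
    pcm = bytearray()
    async for chunk in tts.synthesize(text):
        pcm.extend(bytes(chunk.frame.data))  # raw PCM bytes of this chunk

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono (assumed)
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()
```

Inside an async context this would be called as `wav_bytes = await synthesize_to_wav(tts, "Hello")`. Keep `sample_rate` equal to the value passed to the constructor, otherwise playback speed will be off.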
