Metadata-Version: 2.4
Name: full-duplex-assistant
Version: 0.1.1
Summary: Real-time full-duplex voice assistant (FastAPI backend)
Project-URL: Homepage, https://github.com/leo007-htun/full_duplex_assistant
Project-URL: Issues, https://github.com/leo007-htun/full_duplex_assistant/issues
Author: Sithu Ye Htun (Leo)
License: MIT License
        
        Copyright (c) 2025 Sithu Ye Htun (Leo)
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# Real-Time Full-Duplex Voice Assistant <img alt="🎙️ Live Demo — com-cloud.cloud" src="https://img.shields.io/badge/%F0%9F%8E%99%EF%B8%8F%20Live%20Demo-com--cloud.cloud-orange?style=for-the-badge"/>

Low-latency, interruptible, **full-duplex** (talk & listen at the same time) voice assistant with a web UI, streaming ASR, TTS, and LLM orchestration. Built for real conversations, barge-in, and hands-free control.

<p align="center">
  <img alt="full duplex assistant banner" src="https://img.shields.io/badge/voice-full--duplex-4A90E2">
  <img alt="docker compose" src="https://img.shields.io/badge/docker-compose-0db7ed">
  <img alt="realtime" src="https://img.shields.io/badge/latency-~low-brightgreen">
</p>

---

## ✨ Features

- **Full-duplex audio**: talk and listen simultaneously (barge-in / interruption supported).
- **Streaming ASR**: incremental transcripts while you speak.
- **Streaming TTS**: the assistant starts speaking before the full text response has been generated.
- **LLM orchestration**: tool use/function calls and stateful dialog.
- **Web UI**: mic capture, waveforms, and live captions in-browser.
- **Production-ready stack**: Traefik reverse proxy + auto TLS, Nginx static hosting, FastAPI backend.
- **Single-command deployment**: bring the whole stack up with `docker compose up -d`.
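
The barge-in behaviour listed above can be illustrated with a small, standard-library-only sketch. It is not the project's implementation: `speak`, `run_turn`, and `demo` are hypothetical names, playback is simulated with `asyncio.sleep`, and an `asyncio.Event` stands in for whatever voice-activity signal the real backend uses.

```python
import asyncio


async def speak(chunks, played, delay=0.05):
    """Simulate playing TTS audio chunks one by one (stand-in for real playback)."""
    for chunk in chunks:
        await asyncio.sleep(delay)  # simulated playback time per chunk
        played.append(chunk)


async def run_turn(chunks, user_speech):
    """Play a reply, but stop as soon as the user starts speaking (barge-in)."""
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(user_speech.wait())
    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = playback not in done
    # Cancel whichever task is still running, then wait for both to settle.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted


async def demo():
    """Hypothetical scenario: the user starts talking about 120 ms into the reply."""
    user_speech = asyncio.Event()
    asyncio.get_running_loop().call_later(0.12, user_speech.set)
    return await run_turn(["a", "b", "c", "d", "e", "f"], user_speech)


if __name__ == "__main__":
    played, interrupted = asyncio.run(demo())
    print(f"played {len(played)} chunk(s), interrupted={interrupted}")
```

Racing the playback task against the "user is speaking" signal keeps both directions live at once, which is the essence of full duplex. A real pipeline would also need to flush audio already buffered on the client.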

---

## 🧭 Architecture

#### Application Flow

    Browser (Web UI)
    ├─ Mic capture (WebAudio) → WebSocket → Assistant (FastAPI)
    │ │
    │ partial transcripts
    │ ▼
    ├─ Live captions ← ASR (streaming via Assistant)
    │ │
    │ ▼
    ├─ TTS audio playback ← TTS (streaming chunks)
    │ ▲
    │ │
    └─ Controls/Events → LLM Orchestrator

#### 🐋 Docker Stack & Routing

               ┌───────────────────────────┐
               │        Internet           │
               └────────────┬──────────────┘
                            │  :80 / :443
                            ▼
                   ┌─────────────────┐
                   │     Traefik     │
                   │ (Reverse Proxy) │
                   └───────┬─────────┘
             ┌─────────────┼─────────────┐
             │             │             │
    ┌────────▼───┐   ┌─────▼─────┐   ┌───▼───────┐
    │   /        │   │   /api    │   │   /ws     │
    │   Web UI   │   │  Assistant│   │ Assistant │
    │ (Nginx)    │   │ (FastAPI) │   │ (FastAPI) │
    └────────────┘   └───────────┘   └───────────┘
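
The routing in the diagram can be read as a path-prefix lookup. The sketch below is a conceptual model only, not Traefik's matcher: `route` is a hypothetical helper, and the service names `web` and `assistant` are taken from the service list in this README.

```python
def route(path: str) -> str:
    """Return the service that would handle `path` in the diagram above."""
    # Specific prefixes are checked first; everything else falls through to the web UI.
    for prefix, service in (("/api", "assistant"), ("/ws", "assistant")):
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return "web"


if __name__ == "__main__":
    for example in ("/", "/api/tools/example", "/ws/asr"):
        print(example, "->", route(example))
```

This version matches on whole path segments, so `/apiary` would go to the web UI. A real proxy rule may treat prefixes differently.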

  
#### Services in this repo

- **traefik**: reverse proxy, automatic HTTPS via Let’s Encrypt.
- **web**: static frontend (served by Nginx).
- **assistant**: FastAPI backend (ASR, TTS, LLM orchestration, WebSockets).
- **init_letsencrypt**: bootstrap storage for ACME certificates.

---

#### 🚀 Quick Start

##### 1. Prerequisites
- Docker & Docker Compose
- A domain pointing to your server (the live demo runs at [`com-cloud.cloud`](https://com-cloud.cloud))
- DNS A/AAAA records configured
- API keys for ASR, TTS, and LLM providers

##### 2. Configure Environment

Create `src/assistant/.env` with your secrets:

    # LLM / Orchestrator
    LLM_PROVIDER=openai
    OPENAI_API_KEY=sk-...
    
    # ASR
    ASR_PROVIDER=openai_realtime
    ASR_API_KEY=...
    
    # TTS
    TTS_PROVIDER=openai_realtime
    TTS_API_KEY=...
    
    # CORS / ORIGINS
    ALLOWED_ORIGINS=https://com-cloud.cloud
    
    # Optional
    LOG_LEVEL=info
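
Before starting the stack, it can help to check that the file defines every key shown above. The following is a minimal sketch under assumptions: `parse_env`, `missing_keys`, and the `REQUIRED_KEYS` tuple are hypothetical, the list of required keys is inferred from the example rather than from the backend, and the project itself may load `.env` differently.

```python
# Assumption: these keys are treated as required because the example above sets them.
REQUIRED_KEYS = (
    "LLM_PROVIDER",
    "OPENAI_API_KEY",
    "ASR_PROVIDER",
    "ASR_API_KEY",
    "TTS_PROVIDER",
    "TTS_API_KEY",
    "ALLOWED_ORIGINS",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open("src/assistant/.env").read()))` returns an empty list when all of the keys above are set.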


##### 3. 🖥️ Local Development

###### Run backend directly:

    cd src/assistant
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    uvicorn assistant.app:app --reload --host 0.0.0.0 --port 8000

###### Frontend

    cd web
    npm install
    npm run dev

#### 🎙️ Using the Assistant

- Open https://com-cloud.cloud
- Click the orb to connect and establish a WebSocket session.
- Speak naturally; you can interrupt the assistant mid-sentence.
- Watch the live captions and hear real-time TTS playback.
- **Don't forget to close the tab when you are done.**

#### ⚙️ Configuration

Key options:

- **ASR**: model, language hints, VAD sensitivity.
- **TTS**: voice, speed, sample rate.
- **LLM**: model, temperature, tool schemas.
- **Traefik**: TLS challenge type, timeouts, rate limits.

#### 🔌 API

- `GET /healthz`: service health
- `WS /ws/asr`: audio in ↔ transcript out
- `WS /ws/assistant`: dialog orchestration (events + responses)
- `WS /ws/tts`: text in ↔ audio out
- `POST /api/tools/<name>`: trigger server-side tool functions
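
A client needs the WebSocket form of the site URL to reach the `WS` endpoints listed above. The helper below is a small sketch using only the standard library. `ws_url` is a hypothetical name, and the message format on each socket is not documented here, so it is deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit


def ws_url(base_url: str, path: str) -> str:
    """Turn an http(s) base URL into the matching ws(s) URL for `path`."""
    parts = urlsplit(base_url)
    # https pages must use wss; plain http (e.g. local development) uses ws.
    scheme = {"https": "wss", "http": "ws"}[parts.scheme]
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(ws_url("https://com-cloud.cloud", "/ws/assistant"))
    print(ws_url("http://localhost:8000", "/ws/asr"))
```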

#### 🔐 Security

- HTTPS enforced (TLS via Let’s Encrypt + Traefik).
- Strict CORS (limited to https://com-cloud.cloud).
- API rate limiting enabled (`/api`).
- Secrets kept in `.env` (not in the frontend).
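
The strict CORS rule amounts to an exact-match allowlist on the request's `Origin` header. The sketch below shows that check in isolation. It assumes `ALLOWED_ORIGINS` may hold a comma-separated list, and `parse_allowed_origins` and `is_origin_allowed` are hypothetical helpers. In a FastAPI app this is typically delegated to CORS middleware rather than hand-written.

```python
def parse_allowed_origins(raw: str) -> set:
    """Split a comma-separated ALLOWED_ORIGINS value (assumed format) into a set."""
    return {item.strip().rstrip("/") for item in raw.split(",") if item.strip()}


def is_origin_allowed(origin, allowed: set) -> bool:
    """Allow only exact scheme + host (+ port) matches; a missing Origin is rejected."""
    if not origin:
        return False
    return origin.rstrip("/") in allowed


if __name__ == "__main__":
    allowed = parse_allowed_origins("https://com-cloud.cloud")
    print(is_origin_allowed("https://com-cloud.cloud", allowed))
    print(is_origin_allowed("http://com-cloud.cloud", allowed))
```

Exact matching is intentional: the `http://` variant and look-alike subdomains are different origins and are refused.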

#### 📦 Deployment Notes

- Reverse proxy: Traefik v3 with ACME TLS challenge.
- Certificates stored in `./letsencrypt/acme.json`.
- Static frontend served by Nginx (`web` service).
- Backend served via `assistant` (FastAPI) behind Traefik.
- Scale with Docker Swarm / k8s if needed.
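
Traefik expects its ACME storage file to be readable and writable by the owner only (mode `600`) and rejects files with broader permissions. The sketch below prepares such a file in Python. It is an assumption about what a bootstrap step like `init_letsencrypt` needs to achieve, not a copy of it, and `ensure_acme_storage` is a hypothetical name.

```python
import os
from pathlib import Path


def ensure_acme_storage(path: str) -> Path:
    """Create the ACME storage file if needed and restrict it to mode 600."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # O_CREAT without O_TRUNC leaves any existing certificates untouched.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)
    # Enforce the mode even if the file already existed with wider permissions.
    os.chmod(target, 0o600)
    return target


if __name__ == "__main__":
    print(ensure_acme_storage("./letsencrypt/acme.json"))
```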

#### 🗺️ Roadmap

- Wake-word (hotword) detection
- Speaker diarization
- Plug-and-play tool registry
- Persistent transcripts
- Multi-voice TTS

#### 🤝 Contributing

- Fork this repo.
- Create a feature branch.
- Submit a PR, with screenshots or logs if the UI or backend is affected.
