Metadata-Version: 2.4
Name: echo_vector
Version: 1.0.4
Summary: Semantic text search over audio files without full transcription
Project-URL: Homepage, https://github.com/ahron-maslin/echo_vector
Project-URL: Documentation, https://github.com/ahron-maslin/echo_vector#readme
Project-URL: Repository, https://github.com/ahron-maslin/echo_vector
Project-URL: Issues, https://github.com/ahron-maslin/echo_vector/issues
Project-URL: Changelog, https://github.com/ahron-maslin/echo_vector/blob/main/CHANGELOG.md
Author: EchoVector Contributors
License-Expression: MIT
Keywords: CLAP,FAISS,audio,embeddings,search,semantic,vector
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: faiss-cpu<2,>=1.7
Requires-Dist: librosa<1,>=0.10
Requires-Dist: numpy<3,>=1.26
Requires-Dist: pydantic<3,>=2.5
Requires-Dist: pydub<1,>=0.25
Requires-Dist: rich<14,>=13.7
Requires-Dist: soundfile<1,>=0.12
Requires-Dist: torch<3,>=2.1
Requires-Dist: tqdm<5,>=4.66
Requires-Dist: transformers<5,>=4.36
Requires-Dist: typer[all]<1,>=0.12
Provides-Extra: all
Requires-Dist: fastapi<1,>=0.109; extra == 'all'
Requires-Dist: httpx2<3,>=2.0; extra == 'all'
Requires-Dist: hypothesis<7,>=6.92; extra == 'all'
Requires-Dist: mkdocs-gen-files<1,>=0.5; extra == 'all'
Requires-Dist: mkdocs-literate-nav<1,>=0.6; extra == 'all'
Requires-Dist: mkdocs-material<10,>=9.5; extra == 'all'
Requires-Dist: mkdocstrings[python]<1,>=0.24; extra == 'all'
Requires-Dist: mutmut<3,>=2.4; extra == 'all'
Requires-Dist: mypy<2,>=1.8; extra == 'all'
Requires-Dist: pre-commit<4,>=3.6; extra == 'all'
Requires-Dist: pytest-asyncio<1,>=0.23; extra == 'all'
Requires-Dist: pytest-cov<6,>=5.0; extra == 'all'
Requires-Dist: pytest-xdist<4,>=3.5; extra == 'all'
Requires-Dist: pytest<9,>=8.0; extra == 'all'
Requires-Dist: ruff<1,>=0.15; extra == 'all'
Requires-Dist: uvicorn[standard]<1,>=0.27; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi<1,>=0.109; extra == 'api'
Requires-Dist: uvicorn[standard]<1,>=0.27; extra == 'api'
Provides-Extra: dev
Requires-Dist: httpx2<3,>=2.0; extra == 'dev'
Requires-Dist: hypothesis<7,>=6.92; extra == 'dev'
Requires-Dist: mutmut<3,>=2.4; extra == 'dev'
Requires-Dist: mypy<2,>=1.8; extra == 'dev'
Requires-Dist: pre-commit<4,>=3.6; extra == 'dev'
Requires-Dist: pytest-asyncio<1,>=0.23; extra == 'dev'
Requires-Dist: pytest-cov<6,>=5.0; extra == 'dev'
Requires-Dist: pytest-xdist<4,>=3.5; extra == 'dev'
Requires-Dist: pytest<9,>=8.0; extra == 'dev'
Requires-Dist: ruff<1,>=0.15; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files<1,>=0.5; extra == 'docs'
Requires-Dist: mkdocs-literate-nav<1,>=0.6; extra == 'docs'
Requires-Dist: mkdocs-material<10,>=9.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]<1,>=0.24; extra == 'docs'
Description-Content-Type: text/markdown

# 🔊 EchoVector

> **Semantic text search over audio files — without full transcription.**

[![CI](https://github.com/ahron-maslin/echo_vector/actions/workflows/test.yml/badge.svg)](https://github.com/ahron-maslin/echo_vector/actions/workflows/test.yml)
[![Coverage](https://img.shields.io/badge/coverage-%3E95%25-brightgreen)](.)
[![Python](https://img.shields.io/badge/python-3.12%2B-blue)](.)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)

---

## What is EchoVector?

EchoVector indexes audio files by generating **semantic embeddings directly from audio waveforms**, then lets you search them with natural language text queries — all without transcribing a single word.

### Traditional approach (slow & expensive)

```
Audio → Full Transcription → Text Embeddings → Text Search
```

### EchoVector approach (fast & efficient)

```
Audio → Audio Chunks → Audio Embeddings ─┐
                                          ├─► ANN Search → Results
Text Query → Text Embedding ──────────────┘
```
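
The diagram relies on audio chunks and text queries being embedded into the same vector space, so that search reduces to a nearest-neighbor lookup. A minimal sketch of that last step follows. It uses hand-written toy vectors in place of real model embeddings and a brute-force cosine ranking in place of FAISS; the `cosine` and `top_k` helpers and the chunk labels are illustrative and not part of EchoVector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Rank (label, vector) chunks by similarity to the query vector."""
    scored = [(label, cosine(query, vec)) for label, vec in chunks]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-d vectors standing in for audio-chunk embeddings (assumed values).
chunk_vectors = [
    ("meeting.wav 0-10s", [0.9, 0.1, 0.0]),
    ("meeting.wav 10-20s", [0.1, 0.9, 0.1]),
    ("standup.mp3 0-10s", [0.0, 0.2, 0.9]),
]
# Toy vector standing in for the embedded text query.
query_vector = [0.8, 0.2, 0.1]

hits = top_k(query_vector, chunk_vectors, k=2)
for label, score in hits:
    print(f"{label}  score={score:.3f}")
```

An approximate index trades a small amount of ranking accuracy for speed on large collections; the ranking idea itself is the same as in this exhaustive version.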

## Features

- 🎵 **Multi-format support** — MP3, WAV, FLAC, M4A
- 🧠 **Direct audio embeddings** — No transcription needed
- 🔍 **Semantic search** — Query with natural language
- ⚡ **FAISS-powered** — Approximate nearest neighbor search
- 🔌 **Pluggable backends** — CLAP, Whisper, wav2vec2, HuBERT, AST
- 🧪 **Offline smoke backend** — `local` backend for CI/Kaggle tests without model downloads
- 📊 **Rich CLI** — Progress bars, colors, benchmarking mode
- 🌐 **REST API** — Optional FastAPI server
- 📦 **Production-ready** — Typed, tested, documented

## Quick Start

### Installation

```bash
pip install echo_vector
```

Or with uv:

```bash
uv add echo_vector
```

### CLI Usage

```bash
# One-time indexing: split audio into timestamped chunks and embed each chunk
echovector index ./meetings

# Fast repeated search: embed only the text query and search the saved FAISS index
echovector search "discussion about transformers"

# Search with options
echovector search "pricing strategy" --top-k 10

# View index statistics
echovector stats
```

For a no-download smoke test, use the deterministic local backend:

```bash
echovector index ./meetings --backend local --store-dir ./ev-index
echovector search "high alarm tone" --backend local --store-dir ./ev-index
echovector stats --backend local --store-dir ./ev-index
```

The search command does not reopen or scan the audio files. All expensive audio processing happens
during `index`; `search` loads the saved vector index, embeds the short text query, and returns the
nearest timestamped chunks.
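
To illustrate what "timestamped chunks" means during indexing, here is a small sketch that computes overlapping chunk boundaries for a clip of a given length. The `chunk_spans` helper, the 10-second chunk length, and the 5-second hop are assumptions chosen for the example; EchoVector's actual chunking parameters are not stated here.

```python
def chunk_spans(duration_s, chunk_s=10.0, hop_s=5.0):
    """Return (start, end) times in seconds that cover a clip of duration_s."""
    if chunk_s <= 0 or hop_s <= 0:
        raise ValueError("chunk_s and hop_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        # The final chunk is clipped to the end of the audio.
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start += hop_s
    return spans


spans = chunk_spans(23.0)
for start, end in spans:
    print(f"[{start:.1f}s - {end:.1f}s]")
```

Each span would be embedded once at index time and stored with its start and end times, which is why a later search can report timestamps without touching the audio again.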

### Python API

```python
from echovector import EchoVector

ev = EchoVector()

# Index audio files
ev.index("./meetings")

# Search with natural language
results = ev.search("conversation about CUDA kernels")

for r in results:
    print(
        f"{r.filepath} "
        f"[{r.timestamp_range.start:.1f}s - {r.timestamp_range.end:.1f}s] "
        f"score={r.score:.4f}"
    )
```
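
Because overlapping chunks from the same file can all match a query, it is sometimes handy to keep only the strongest hit per file. The sketch below does that with stand-in classes that model only the fields used above (`filepath`, `timestamp_range.start`, `timestamp_range.end`, `score`). The `Span`, `Hit`, and `best_hit_per_file` names are hypothetical, and the code assumes that a higher score means a better match.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: float
    end: float


@dataclass
class Hit:
    """Stand-in for a search result; only the fields shown above are modeled."""
    filepath: str
    timestamp_range: Span
    score: float


def best_hit_per_file(results):
    """Keep the highest-scoring hit for each file, best file first."""
    best = {}
    for r in results:
        if r.filepath not in best or r.score > best[r.filepath].score:
            best[r.filepath] = r
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


# Made-up results for illustration only.
sample = [
    Hit("a.wav", Span(0.0, 10.0), 0.61),
    Hit("b.wav", Span(5.0, 15.0), 0.74),
    Hit("a.wav", Span(10.0, 20.0), 0.82),
]
deduped = best_hit_per_file(sample)
for r in deduped:
    print(r.filepath, r.timestamp_range.start, r.score)
```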

## Testing on Kaggle

Kaggle is useful for GPU-backed CLAP tests, but first check the runtime Python version:

```python
import sys
print(sys.version)
```

EchoVector currently declares `Requires-Python: >=3.12`. If the Kaggle image runs an older Python,
install and test in an environment that provides Python 3.12 or newer, or relax the project
requirement only after the test suite passes on that older version.
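
The version check can also be done programmatically before attempting an install. This is a small sketch; the `python_ok` helper and the `MIN_PYTHON` constant are illustrative names that mirror the declared requirement.

```python
import sys

MIN_PYTHON = (3, 12)  # mirrors the declared Requires-Python: >=3.12


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_ok():
    print("Python version is new enough for EchoVector")
else:
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.12")
```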

### Notebook smoke test without internet/model downloads

Upload this repository as a Kaggle dataset, attach it to a notebook, then run:

```python
%cd /kaggle/input/<your-echo-vector-dataset>
!pip install -e . --no-deps
!pip install numpy soundfile librosa faiss-cpu typer rich pydantic
!python -m pytest tests/ -q
```

Create a tiny audio corpus and test the real CLI/index path:

```python
import os
import numpy as np
import soundfile as sf

audio_dir = "/kaggle/working/ev-audio"
index_dir = "/kaggle/working/ev-index"
os.makedirs(audio_dir, exist_ok=True)

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
sf.write(f"{audio_dir}/high_tone.wav", 0.25 * np.sin(2 * np.pi * 880 * t), sr)
sf.write(f"{audio_dir}/low_tone.wav", 0.25 * np.sin(2 * np.pi * 110 * t), sr)
```

```python
!echovector index /kaggle/working/ev-audio --backend local --store-dir /kaggle/working/ev-index --reset
!echovector search "high alarm tone" --backend local --store-dir /kaggle/working/ev-index --top-k 2
!echovector stats --backend local --store-dir /kaggle/working/ev-index
```

This validates packaging, audio loading, FAISS persistence, metadata storage, and the CLI without
depending on Hugging Face downloads.

### CLAP semantic test

For actual semantic text-to-audio search, enable internet in the notebook settings and use a GPU
runtime if available:

```python
!pip install transformers torch faiss-cpu librosa soundfile
!echovector index /kaggle/input/<audio-dataset> --backend clap --device cuda --store-dir /kaggle/working/clap-index --recursive --reset
!echovector search "people discussing pricing strategy" --backend clap --device cuda --store-dir /kaggle/working/clap-index --top-k 10
```

If no GPU is available, replace `--device cuda` with `--device cpu`; indexing and search will run
more slowly. Keep indexes under `/kaggle/working` so they are writable during the notebook session.
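
To avoid editing the commands by hand, the device can be chosen at runtime. The sketch below uses `torch.cuda.is_available()` and falls back to `cpu` when PyTorch is missing or no GPU is visible; `pick_device` is a hypothetical helper, not part of the EchoVector CLI.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"--device {device}")
```

In a notebook, the printed flag can be pasted into the `echovector index` and `echovector search` commands above.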

## Architecture

```
echovector/
├── audio/        # Audio loading, chunking, streaming, metadata
├── embeddings/   # Pluggable embedding backends (CLAP, Whisper, etc.)
├── indexing/     # Vector index backends (FAISS, with pluggable design)
├── search/       # Search engine, filtering, result hydration
├── cli/          # Typer-based CLI with Rich output
├── api/          # Optional FastAPI server
├── evaluation/   # Metrics (recall@k, throughput)
├── benchmarks/   # Reproducible benchmark harness
└── utils/        # Config, logging, helpers
```
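
The `evaluation/` entry mentions recall@k. For readers unfamiliar with the metric, this is a sketch of its usual definition: the fraction of relevant items that appear in the top k results. It is not the package's own implementation, and the `recall_at_k` name and chunk identifiers are assumptions for the example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items found among the first k retrieved items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


# Made-up chunk identifiers, ranked best first.
ranked = ["chunk-3", "chunk-7", "chunk-1", "chunk-9"]
ground_truth = {"chunk-3", "chunk-9"}

print(recall_at_k(ranked, ground_truth, k=2))
print(recall_at_k(ranked, ground_truth, k=4))
```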

## Supported Embedding Backends

| Backend | Text+Audio Aligned | Notes |
|---------|-------------------|-------|
| **CLAP** (default) | ✅ | Best for text→audio search |
| Whisper Encoder | ❌ | Audio-only embeddings |
| wav2vec2 | ❌ | Audio-only, good for speech |
| HuBERT | ❌ | Audio-only, self-supervised |
| Audio Spectrogram Transformer | ❌ | Audio-only, classification-focused |

## Development

```bash
# Clone and install
git clone https://github.com/ahron-maslin/echo_vector.git
cd echo_vector
uv sync --all-extras

# Run checks
make lint
make typecheck
make test
make coverage
```

## License

MIT
