Metadata-Version: 2.4
Name: voicepaste
Version: 0.1.0
Summary: Local Whisper voice dictation for any app or LLM — push-to-talk, auto-paste, offline.
License: MIT
Project-URL: Homepage, https://github.com/obsoul/voicepaste
Project-URL: Issues, https://github.com/obsoul/voicepaste/issues
Keywords: whisper,voice,dictation,transcription,speech-to-text,llm,voice-input,push-to-talk
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper>=1.0.0
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: pyperclip>=1.8.2
Provides-Extra: tray
Requires-Dist: pystray>=0.19.4; extra == "tray"
Requires-Dist: Pillow>=10.0.0; extra == "tray"
Requires-Dist: keyboard>=0.13.5; extra == "tray"
Requires-Dist: pyautogui>=0.9.54; extra == "tray"
Dynamic: license-file

# voicepaste 🎙

**Talk instead of type.** Hold a hotkey, say what you want, let go — your words appear instantly in any app.

Works with **ChatGPT, Claude, Cursor, Gemini, Copilot, Ollama, or any app** on Windows, Mac, and Linux. Runs 100% on your computer: no API keys, no subscriptions, and no internet connection once the model has been downloaded during install.

---

## What it does

- **Hold a hotkey → speak → release** — text is typed wherever your cursor is
- Works inside **Claude Code** as a `/voice` command
- Transcribes your voice using [Whisper AI](https://github.com/openai/whisper), running locally on your CPU
- Under **0.5 seconds** from when you stop talking to when text appears

---

## Before you start — what you need

You only need three things:

### 1. Python 3.10 or newer
If you're not sure whether you have it, open PowerShell (Windows) or Terminal (Mac/Linux) and type:
```
python --version
```
If you see `Python 3.10` or higher, you're good. If not, download it free from **[python.org](https://python.org)**. On Windows, make sure to check the box that says **"Add Python to PATH"** during installation.
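
If you'd rather check from inside Python, here is a minimal sketch of the same test. The helper name `python_is_supported` is made up for this example and is not part of VoicePaste.

```python
import sys

MIN_VERSION = (3, 10)  # matches the "Python 3.10 or newer" requirement


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is new enough for VoicePaste."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old, please upgrade"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```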

### 2. A microphone
Any microphone works — your laptop's built-in mic, a webcam mic, a USB mic, or a headset. If your computer can make video calls, it will work.
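
To see which microphones Python can find, you can ask `sounddevice` (one of VoicePaste's dependencies) for its device list. This is a small sketch, not VoicePaste's own code; the helper names `input_devices` and `list_microphones` are hypothetical.

```python
def input_devices(devices):
    """Keep only devices that can record (at least one input channel)."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones():
    """Ask sounddevice for every audio device, then keep the inputs."""
    import sounddevice as sd  # imported lazily; installed with VoicePaste
    return input_devices(sd.query_devices())
```

Calling `list_microphones()` after installation returns the names of devices that can record. An empty list suggests the operating system is not exposing any microphone to Python.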

### 3. An LLM or app to dictate into
VoicePaste works anywhere you can type — ChatGPT, Claude Code, Cursor, VS Code, Notepad, Word, your browser, anything.

---

## Installation — 3 steps

> **Windows, Mac, or Linux?** The steps are almost identical: just pick the right installer in Step 2.

### Step 1 — Download the project

**Windows** — open PowerShell (search "PowerShell" in your Start menu):
```powershell
git clone https://github.com/obsoul/voicepaste.git
cd voicepaste
```

**Mac / Linux** — open Terminal:
```bash
git clone https://github.com/obsoul/voicepaste.git
cd voicepaste
```

> Don't have `git`? **Windows:** download from [git-scm.com](https://git-scm.com). **Mac:** run `xcode-select --install` in Terminal. **Linux (Ubuntu/Debian):** run `sudo apt install git`; on other distributions, install `git` with your package manager.

### Step 2 — Run the installer

**Windows:**
```powershell
powershell -ExecutionPolicy Bypass -File install.ps1
```

**Mac / Linux:**
```bash
bash install.sh
```

> Linux requires `xdotool` and `xclip` for paste support — the installer handles this automatically on Ubuntu/Debian, Fedora, and Arch.

This will:
- Install all required Python packages (~2 minutes on first run)
- Download the Whisper AI model to your computer (~75 MB, one time only)
- Set up the `/voice` command in Claude Code (if you use it)
- Create a config file at `~/.voicepaste/config.yaml`
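
On Linux, if you want to confirm that the paste tools mentioned above (`xdotool` and `xclip`) really ended up on your system, a quick check might look like this. It is a sketch using only the standard library; `missing_tools` is a hypothetical helper, not something the installer provides.

```python
import shutil

REQUIRED_TOOLS = ("xdotool", "xclip")  # the Linux paste helpers named above


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing: " + ", ".join(missing) if missing else "All paste tools found.")
```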

### Step 3 — Start the background service

The background service keeps the AI model loaded in memory so transcription is fast:

```powershell
python main.py serve
```

Keep this window open. You can minimize it — it runs quietly in the background.

**That's it. You're ready.**

---

## How to use it

### Option A — Global hotkey (works in any app)

Open a terminal and run:

```bash
python main.py hotkey
```

Now you can dictate **anywhere on your computer**:

1. Click into any text field (ChatGPT, Claude, Cursor, Notepad, Word, browser — anything)
2. Hold **Ctrl + Shift + Space**
3. Speak
4. Release — your words appear

> **Note (Windows):** The global hotkey requires running PowerShell as Administrator, or enabling Windows Developer Mode. See [Troubleshooting](#troubleshooting) if this doesn't work.
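
The hold, speak, release flow above boils down to a tiny state machine. The sketch below is an illustration only, not VoicePaste's actual `hotkey.py`; the class name `PushToTalk` and its three callbacks are assumptions made for the example.

```python
class PushToTalk:
    """Start recording on key-down, stop and paste on key-up."""

    def __init__(self, start_recording, stop_and_transcribe, paste):
        self._start = start_recording
        self._stop = stop_and_transcribe
        self._paste = paste
        self.recording = False

    def on_press(self):
        # A held key usually fires repeated key-down events; ignore the repeats.
        if not self.recording:
            self.recording = True
            self._start()

    def on_release(self):
        if self.recording:
            self.recording = False
            text = self._stop()
            if text:  # nothing to paste if no speech was recognized
                self._paste(text)
```

Keeping the logic separate from the keyboard library makes it easy to test without pressing any keys: you call `on_press()` and `on_release()` yourself and watch which callbacks fire.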

### Option B — Inside Claude Code (the `/voice` command)

Open Claude Code, type `/voice`, and press Enter. VoicePaste will transcribe and paste your words directly into the chat.

```
/voice
```

Want more time to speak? Pass a longer duration in seconds:
```
/voice --dur 15
```

### Option C — Quick one-time recording

Don't want the background service? Use this for a single recording:

```bash
python main.py once --dur 8
```

---

## Where text gets pasted

Configure `paste_mode` in `~/.voicepaste/config.yaml`:

| Mode | What it does |
|------|-------------|
| `auto` | Pastes into whatever window is currently focused (works with any app) |
| `claude` | Focuses Claude Code specifically, then pastes |
| `clipboard` | Copies to clipboard only — you paste with Ctrl+V / Cmd+V |

Default is `auto` — works universally.
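
One way to picture the three modes is as a small dispatcher. This is a hedged sketch, not VoicePaste's `paster.py`: the function `deliver` and its callback parameters are hypothetical, and it assumes every mode goes through the clipboard first.

```python
def deliver(text, mode, copy, paste_keystroke, focus_claude):
    """Route transcribed text according to paste_mode."""
    if mode not in ("auto", "claude", "clipboard"):
        raise ValueError(f"unknown paste_mode: {mode!r}")
    copy(text)            # assumption: every mode fills the clipboard first
    if mode == "clipboard":
        return            # the user pastes manually
    if mode == "claude":
        focus_claude()    # bring Claude Code to the front before pasting
    paste_keystroke()     # send Ctrl+V / Cmd+V to the focused window
```

In a real script, `copy` could be `pyperclip.copy` and `paste_keystroke` could be `lambda: pyautogui.hotkey("ctrl", "v")`, since both libraries are among VoicePaste's dependencies. How to focus Claude Code is platform-specific and left out here.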

---

## The two pieces explained

VoicePaste has two parts that work together:

| Part | What it does | How to start it |
|------|-------------|-----------------|
| **Background service** (daemon) | Keeps the AI model loaded so transcription is instant | `python main.py serve` |
| **Hotkey listener** | Detects when you hold/release the hotkey | `python main.py hotkey` |

You start both once, minimize the windows, and forget about them.

**To start both with a single command:**
```bash
# Starts the hotkey listener and launches the background service if it isn't already running
python main.py hotkey --autostart
```
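
Why split things into a long-running service at all? The idea is to pay the cost of loading the model once and reuse it for every recording. The sketch below shows that load-once pattern in miniature; `TranscriptionService` and `load_model` are made-up names, and the real daemon is not shown in this README.

```python
class TranscriptionService:
    """Load the model on first use, then reuse it for every request."""

    def __init__(self, load_model):
        self._load_model = load_model  # the expensive step (reads the model from disk)
        self._model = None

    def transcribe(self, audio):
        if self._model is None:
            self._model = self._load_model()  # happens only once
        return self._model(audio)
```

Without a service, every recording would start a fresh Python process and load the model again before it could transcribe anything.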

---

## GPU Acceleration (optional)

If you have an NVIDIA GPU, VoicePaste will automatically use it — dropping transcription time from ~0.3s to **~0.03s** and unlocking larger, more accurate models.

### Check your hardware
```bash
python main.py detect
```

### Enable GPU (if not auto-detected)

**Step 1** — Install CUDA drivers from [nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

**Step 2** — Install CUDA Python packages:
```bash
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```

**Step 3** — Update `~/.voicepaste/config.yaml`:
```yaml
device: cuda
compute_type: float16
model: large-v3   # near-perfect accuracy at GPU speed
```

**Step 4** — Restart the background service: `python main.py serve`
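
For the curious, these three settings line up with the arguments of `WhisperModel` in `faster-whisper`, the library VoicePaste depends on. How VoicePaste itself passes them is not shown in this README, so treat this as a sketch: `model_settings` and `load_model` are hypothetical helpers, and the `int8` default on CPU is an assumption.

```python
def model_settings(config):
    """Translate config.yaml values into WhisperModel keyword arguments."""
    device = config.get("device", "cpu")
    # Assumption: int8 on CPU and float16 on GPU unless the config says otherwise.
    default_compute = "float16" if device == "cuda" else "int8"
    return {
        "model_size_or_path": config.get("model", "tiny"),
        "device": device,
        "compute_type": config.get("compute_type", default_compute),
    }


def load_model(config):
    """Build the model; requires faster-whisper to be installed."""
    from faster_whisper import WhisperModel  # imported lazily
    return WhisperModel(**model_settings(config))
```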

### GPU speed comparison

| Model | CPU time | GPU time | Quality |
|-------|-----------|-----------|---------|
| tiny | ~0.3s | ~0.03s | Great for everyday speech |
| base | ~2s | ~0.08s | Better accuracy |
| small | ~4s | ~0.15s | Strong accuracy |
| medium | too slow | ~0.3s | High accuracy |
| large-v3 | too slow | ~0.8s | Best available — handles accents, technical terms |

> **No NVIDIA GPU?** The default CPU + tiny setup is already fast. GPU is an enhancement, not a requirement.

---

## Customizing your settings

Your settings live at `~/.voicepaste/config.yaml`. Open it with any text editor.

```yaml
# How accurate vs how fast:
#   tiny   = fastest (~0.3s), still very accurate for most speech
#   base   = a bit slower (~2s), slightly more accurate
#   small  = slower (~4s), more accurate
model: tiny

# Language — "auto" detects it automatically, or set e.g. "en", "fr", "es"
language: auto

# The hotkey to hold while speaking
hotkey: ctrl+shift+space

# Where the text goes after transcription:
#   auto      = paste into whatever app you're typing in  ← recommended
#   claude    = paste into Claude Code specifically
#   clipboard = just copy to clipboard (you paste with Ctrl+V)
paste_mode: auto
```

After changing settings, restart the background service for them to take effect.
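
If you edit the file by hand, a typo is easy to miss. Here is a minimal sketch of how the file could be read and sanity-checked with PyYAML, which VoicePaste already installs. The names `normalize_config` and `load_config` are hypothetical, and the defaults simply mirror the example above.

```python
DEFAULTS = {
    "model": "tiny",
    "language": "auto",
    "hotkey": "ctrl+shift+space",
    "paste_mode": "auto",
}
PASTE_MODES = {"auto", "claude", "clipboard"}


def normalize_config(user_settings):
    """Merge user settings over the defaults and reject unknown paste modes."""
    config = {**DEFAULTS, **(user_settings or {})}  # an empty file parses to None
    if config["paste_mode"] not in PASTE_MODES:
        raise ValueError(f"paste_mode must be one of {sorted(PASTE_MODES)}")
    return config


def load_config(path):
    """Read a YAML config file; requires PyYAML."""
    import yaml  # imported lazily
    with open(path, encoding="utf-8") as f:
        return normalize_config(yaml.safe_load(f))
```

For example, `load_config(os.path.expanduser("~/.voicepaste/config.yaml"))` (after `import os`) returns a dictionary with any missing keys filled in from the defaults.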

---

## Troubleshooting

### "The hotkey isn't working" (Windows)
The `keyboard` library needs elevated permissions to intercept global key presses. Try one of these:

**Option 1** — Run PowerShell as Administrator:
Right-click PowerShell in the Start menu → "Run as administrator" → run `python main.py hotkey`

**Option 2** — Enable Windows Developer Mode:
Settings → System → For Developers → turn on "Developer Mode" → restart and try again

### "The hotkey isn't working" (Linux)
- **X11:** Should work out of the box after install. If not, try running with `sudo`.
- **Wayland:** Global hotkeys have limited support under Wayland. Log out and choose an X11 session at the login screen.
- Make sure `python-xlib` is installed: `pip3 install python-xlib`
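
Not sure whether you are on X11 or Wayland? Most Linux desktops set the `XDG_SESSION_TYPE` environment variable, so a quick check could look like this. The helper `session_type` is a made-up name for this sketch.

```python
import os


def session_type(env=os.environ):
    """Return 'wayland', 'x11', or 'unknown' for the current desktop session."""
    value = env.get("XDG_SESSION_TYPE", "").lower()
    return value if value in ("wayland", "x11") else "unknown"


if __name__ == "__main__":
    print(f"Session type: {session_type()}")
```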

### "The hotkey isn't working" (Mac)
Mac requires Accessibility permission for the hotkey listener:

1. Open **System Settings** → **Privacy & Security** → **Accessibility**
2. Click the **+** button
3. Add your Terminal app (Terminal, iTerm2, or whichever you use)
4. Restart the hotkey listener: `python3 main.py hotkey`

### "It's not picking up my voice"
- Check that your microphone is set as the default input device
- Try speaking louder or closer to the mic
- Run `python main.py once --dur 5` — if it works, just restart the hotkey listener

### "It says 'no speech detected'"
- Make sure you're speaking **during** the countdown, not after
- Check that your mic volume is turned up in system sound settings
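
To find out whether the microphone is delivering any signal at all, you can record a couple of seconds and look at the loudness. This is a rough diagnostic sketch using `sounddevice`; `rms_level` and `record_level` are hypothetical helpers, and the 16 kHz sample rate is an assumption.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of samples in the range -1.0 to 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_level(seconds=2, samplerate=16000):
    """Record briefly from the default microphone and return its RMS level."""
    import sounddevice as sd  # imported lazily; installed with VoicePaste
    audio = sd.rec(int(seconds * samplerate), samplerate=samplerate,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return rms_level(audio[:, 0].tolist())
```

Run `record_level()` while speaking. A value that stays at or very near zero suggests the mic is muted or the wrong input device is selected.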

### "I want to stop everything"
```bash
python main.py stop    # stop the background service
# Close the hotkey listener window, or press Ctrl+C
```

---

## Frequently asked questions

**Does my voice get sent to the internet?**
No. Everything runs on your computer. The Whisper model is downloaded once during install, then runs locally forever.

**Does it work on Mac and Linux?**
Yes — full Mac and Linux support is included. Run `bash install.sh` on either platform.

**Can I use it with ChatGPT / Cursor / Gemini / VS Code?**
Yes — set `paste_mode: auto` and it will paste into whatever window you're currently typing in.

**How accurate is it?**
Very accurate for clear speech in English. It handles accents well. Background noise can reduce accuracy — a quiet room or headset helps.

**Can I use a different language?**
Yes — set `language: fr` (or any language code) in your config, or pass `--language fr` to the CLI.

**Can I make it more accurate?**
Change `model: tiny` to `model: base` or `model: small`, then restart the background service.

---

## Commands reference

```bash
# Start the background service (keep this running)
python main.py serve

# Start the global push-to-talk hotkey
python main.py hotkey

# Start hotkey and auto-launch the background service if needed
python main.py hotkey --autostart

# Record once for 8 seconds (no hotkey needed)
python main.py trigger --dur 8

# Stop the background service
python main.py stop

# Detect your GPU/CPU and see recommended settings
python main.py detect

# Create a fresh config file
python main.py setup
```

---

## Project layout

```
voicepaste/
├── main.py               ← Start here — all commands go through this
├── SKILL.md              ← The /voice Claude Code skill definition
├── install.ps1           ← Windows one-click installer
├── install.sh            ← Mac/Linux one-click installer
├── config.yaml           ← Default settings (copy to ~/.voicepaste/)
├── requirements.txt      ← Python packages needed
└── voicepaste/
    ├── daemon.py         ← Background service (keeps AI model loaded)
    ├── hotkey.py         ← Push-to-talk hotkey listener
    ├── recorder.py       ← Microphone recording
    ├── transcriber.py    ← Whisper AI transcription
    ├── paster.py         ← Pastes text into the right window
    ├── client.py         ← Talks to the background service
    └── config.py         ← Loads and saves settings
```

---

## Contributing

Pull requests are welcome! If you run into a bug or want to suggest a feature, please [open an issue](https://github.com/obsoul/voicepaste/issues).

Ideas for future versions:
- Wake word activation ("Hey Claude...", "Hey Cursor...")
- Real-time streaming transcription (words appear as you speak)
- GUI config editor

---

## License

MIT — free to use, modify, and distribute.
