Metadata-Version: 2.4
Name: eye2byte
Version: 0.3.1
Summary: Screen-context sidecar for coding agents
Author: wolverin0
License-Expression: MIT
Project-URL: Homepage, https://github.com/wolverin0/Eye2byte
Project-URL: Changelog, https://github.com/wolverin0/Eye2byte/blob/claude/screen-context-sidecar-KDVSF/CHANGELOG.md
Project-URL: Issues, https://github.com/wolverin0/Eye2byte/issues
Keywords: screen-capture,mcp,coding-agent,vision,context
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: Pillow
Requires-Dist: fastmcp>=2.10
Provides-Extra: voice
Requires-Dist: openai-whisper; extra == "voice"
Provides-Extra: ui
Requires-Dist: customtkinter>=5.0; extra == "ui"
Requires-Dist: pystray; extra == "ui"
Provides-Extra: all
Requires-Dist: openai-whisper; extra == "all"
Requires-Dist: customtkinter>=5.0; extra == "all"
Requires-Dist: pystray; extra == "all"

<p align="center">
  <h1 align="center">Eye2byte</h1>
  <p align="center">Screen-context sidecar for coding agents</p>
</p>

<p align="center">
  <a href="#setup"><img src="https://img.shields.io/badge/python-3.10+-blue?logo=python&logoColor=white" alt="Python 3.10+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
  <a href="#platforms"><img src="https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux%20%7C%20Android-lightgrey" alt="Cross-platform"></a>
  <a href="CHANGELOG.md"><img src="https://img.shields.io/badge/changelog-CHANGELOG.md-orange" alt="Changelog"></a>
</p>

---

Eye2byte captures your screen, voice, and annotations, feeds them to a vision model of your choice, and produces structured **Context Packs** your coding agent can act on.

```
Screen / Voice / Annotations  -->  Vision Model + Whisper  -->  Context Pack  -->  Coding Agent
```

## Features

- **Multi-monitor capture** — active, specific (1/2/3), or all monitors at once
- **Voice narration** — record, clean (noise removal + normalization), transcribe locally
- **Annotations** — arrows, circles, rectangles, freehand, multi-line text on a frozen screenshot
- **Screen clips** — record short videos, extract keyframes, analyze the sequence
- **Image optimization** — auto resize + JPEG compression (~5x smaller with little visible quality loss)
- **MCP server** — coding agents query your screen directly via Model Context Protocol
- **Context Packs** — structured output: goal, environment, errors, signals, next steps

## Platforms

| Platform | Screenshot | Voice | Annotation | Hotkeys |
|----------|-----------|-------|------------|---------|
| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
| macOS | screencapture | ffmpeg | Pillow | - |
| Linux | scrot/maim/flameshot | ffmpeg | Pillow | - |
| Android | ADB (Termux) | Termux:API | - | - |
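
The Screenshot column can be read as a per-platform dispatch. The following is a minimal sketch of that idea, not Eye2byte's actual code: `pick_screenshot_backend` is a hypothetical helper, the returned labels are illustrative, and trying the Linux tools in table order is an assumption. Android (ADB via Termux) is left out.

```python
import platform
import shutil

# Tool names come from the Platforms table; the order they are tried in is an assumption.
LINUX_TOOLS = ("scrot", "maim", "flameshot")


def pick_screenshot_backend(system=None, which=shutil.which):
    """Return a label for the screenshot tool to use (hypothetical helper)."""
    system = system or platform.system()
    if system == "Windows":
        return "powershell"
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return "screencapture"
    if system == "Linux":
        for tool in LINUX_TOOLS:
            if which(tool):  # first tool found on PATH wins
                return tool
        raise RuntimeError("install one of: " + ", ".join(LINUX_TOOLS))
    raise RuntimeError(f"unsupported platform: {system}")
```

Passing `system` and `which` explicitly makes the choice easy to exercise without touching the real machine, for example `pick_screenshot_backend("Darwin")`.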

## Setup

### 1. Install

```bash
pip install eye2byte             # Core + MCP server (Pillow + fastmcp)
pip install "eye2byte[voice]"    # + local voice transcription (openai-whisper)
pip install "eye2byte[ui]"       # + control panel (customtkinter + pystray)
pip install "eye2byte[all]"      # Everything
# Quotes keep shells such as zsh from expanding the square brackets.
# ffmpeg is required for voice/clips; install it via your package manager.
```

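Or, to run from a source checkout, install the dependencies directly:

```bash
pip install Pillow "fastmcp>=2.10"           # Core + MCP server
pip install openai-whisper                   # Local voice transcription (optional)
pip install "customtkinter>=5.0" pystray     # Control panel (optional)
```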

### 2. Configure a vision provider

Eye2byte works with **any vision model** — local or cloud. Set your provider in `~/.eye2byte/config.json` or the Settings UI:

| Provider | Setup | Cost |
|----------|-------|------|
| **Ollama** (local) | [Install Ollama](https://ollama.com), `ollama pull qwen3-vl:8b` | Free |
| **Gemini** | Set `GEMINI_API_KEY` in `.env` | Free tier (1000 req/day) |
| **OpenRouter** | Set `OPENROUTER_API_KEY` in `.env` | Free models available |
| **Hyperbolic** | Set `HYPERBOLIC_API_KEY` in `.env` | Pay per use |

```bash
# .env file (project dir, cwd, or ~/.eye2byte/.env)
GEMINI_API_KEY=your-key-here
# or OPENROUTER_API_KEY=...
# or HYPERBOLIC_API_KEY=...
```
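
As a rough illustration of how a key could be resolved from the locations listed above, the sketch below checks the process environment first and then each `.env` file in turn. The helper names, the environment-first rule, and the search order are assumptions, not Eye2byte's documented behavior.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def default_search_dirs(project_dir):
    """Locations named in the .env comment; this precedence is an assumption."""
    return [Path(project_dir), Path.cwd(), Path.home() / ".eye2byte"]


def find_api_key(name, search_dirs):
    """Return the key from the environment, else from the first .env defining it."""
    if os.environ.get(name):
        return os.environ[name]
    for directory in search_dirs:
        env_file = Path(directory) / ".env"
        if env_file.is_file():
            value = parse_env_file(env_file).get(name)
            if value:
                return value
    return None
```

A call such as `find_api_key("GEMINI_API_KEY", default_search_dirs("."))` returns `None` when no location defines the key, which lets the caller report a clear setup error.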

### 3. Run

```bash
python eye2byte.py capture              # Screenshot + analysis
python eye2byte.py capture --voice      # + voice narration
python eye2byte.py capture --mode window # Active window only
python eye2byte_ui.py                    # Launch control panel
```

## Control Panel

```bash
python eye2byte_ui.py
```

A small always-on-top floating panel that you can drag anywhere. On Windows, global hotkeys work even when the panel isn't focused.

### Global Hotkeys (Windows)

These work system-wide — no need to focus the Eye2byte window:

| Hotkey | Action | Notes |
|--------|--------|-------|
| `Ctrl+Shift+1` | Capture screenshot | Uses current mode (Full/Window/Region) |
| `Ctrl+Shift+2` | Annotate | Freezes screen, opens drawing overlay |
| `Ctrl+Shift+3` | Toggle voice recording | Press once to start, again to stop |
| `Ctrl+Shift+5` | Grab clipboard image | Analyzes whatever image is on your clipboard |

All keyboard shortcuts are customizable from Settings > Keyboard Shortcuts.

### Panel Controls

| Control | Action |
|---------|--------|
| `Space` (hold) | Push-to-talk — hold to record, release to stop |
| Mode selector | Cycle between Full Screen / Window / Region |
| Settings | Configure provider, model, image quality, cleanup |
| Copy @path | Copy session path to clipboard for `@`-mentioning |

### Annotation Overlay

When you press `Ctrl+Shift+2` or click Annotate, the screen freezes and you can draw on it:

| Key | Tool | How to use |
|-----|------|-----------|
| `X` | Arrow | Click and drag to draw an arrow |
| `C` | Circle | Click and drag to draw an ellipse |
| `V` | Rectangle | Click and drag to draw a box |
| `B` | Freehand | Click and drag to draw freely |
| `T` | Text | Click to place, type your text |

| Action | How |
|--------|-----|
| **Save** | `Enter` (commits annotations and sends to vision model) |
| **Cancel** | `Escape` (discards all annotations) |
| **Undo** | Right-click near an annotation to remove it |
| **Newline in text** | `Shift+Enter` (Enter alone commits the text) |
| **Multi-line text** | Text box auto-grows up to 6 lines |

### Voice Recording

Three ways to record voice:

1. **Toggle** — `Ctrl+Shift+3` starts recording, press again to stop
2. **Push-to-talk** — Hold `Space` while panel is focused
3. **Mouse PTT** — Hold click on the Record button

While recording, any captures you take are automatically bundled with the voice note into a single session.

## MCP Server

Eye2byte exposes 6 tools via the [Model Context Protocol](https://modelcontextprotocol.io), letting coding agents capture and analyze your screen directly.

| Tool | Description |
|------|-------------|
| `capture_and_summarize` | Screenshot + vision analysis. Supports monitor selection, delay, window targeting |
| `capture_with_voice` | Screenshot + voice recording + transcription + analysis |
| `record_clip_and_summarize` | Screen clip with keyframe extraction and sequence analysis |
| `summarize_screenshot` | Analyze an existing image file |
| `transcribe_audio` | Local Whisper transcription of any audio file |
| `get_recent_context` | Retrieve recent Context Pack summaries |

### Local Setup (stdio)

Eye2byte runs on the machine whose screen you want to capture. For local agents like Claude Code on the same machine, use stdio transport:

**Claude Code** — add to your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["C:/path/to/eye2byte_mcp.py"]
    }
  }
}
```

Claude Code starts the server automatically. Use an absolute path to `eye2byte_mcp.py`; `C:/path/to/` above is a placeholder.
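
If you prefer to generate this file rather than edit it by hand, a small script can fill in the absolute paths. This is a sketch with a hypothetical helper, `build_stdio_config`. It uses `sys.executable` instead of a bare `python` so the server runs under the interpreter that has the dependencies installed; that is a design choice of this example, not something Eye2byte requires.

```python
import json
import sys
from pathlib import Path


def build_stdio_config(server_script):
    """Return an .mcp.json-style dict that launches the server via absolute paths."""
    script = Path(server_script).resolve()
    return {
        "mcpServers": {
            "eye2byte": {
                # Pin the interpreter currently running this script
                "command": sys.executable,
                # Forward slashes avoid backslash escaping on Windows
                "args": [script.as_posix()],
            }
        }
    }


# Print the config; redirect the output to .mcp.json in your project root.
print(json.dumps(build_stdio_config("eye2byte_mcp.py"), indent=2))
```

Run it from the directory that contains `eye2byte_mcp.py` so the relative name resolves to the right file.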

### Remote Setup (SSE)

When your coding agent runs on a **different machine** (cloud VM, SSH dev box, CI runner) but needs to see your local screen, use SSE transport:

**Step 1 — On your local machine** (the one with the screen):

```bash
# Install Eye2byte + dependencies
pip install Pillow fastmcp
pip install openai-whisper  # optional, for voice

# Start the SSE server
python eye2byte_mcp.py --sse                           # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123       # Bearer token auth
python eye2byte_mcp.py --sse --port 9000 --token abc   # Custom port + auth
```

The server stays running and accepts connections from any machine on your network. Use `--token` when the server is reachable beyond your trusted LAN.

**Step 2 — On the remote machine** (where the coding agent runs):

Nothing to install. Point the MCP client at the IP address of the machine running the SSE server:

```json
{
  "mcpServers": {
    "eye2byte": {
      "url": "http://YOUR_LOCAL_IP:8808/sse",
      "headers": {"Authorization": "Bearer mysecret123"}
    }
  }
}
```

Omit the `headers` field if the server was started without `--token`.
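
The same entry can be built programmatically, with `headers` included only when a token is set. `build_sse_config` is a hypothetical helper and `192.168.1.50` is a placeholder address; port 8808 is taken from the example above.

```python
import json


def build_sse_config(host, port=8808, token=None):
    """Return an MCP client config for a remote Eye2byte SSE server."""
    entry = {"url": f"http://{host}:{port}/sse"}
    if token:
        # Only send the header when the server was started with --token
        entry["headers"] = {"Authorization": f"Bearer {token}"}
    return {"mcpServers": {"eye2byte": entry}}


# Placeholder host and token; substitute your own values.
print(json.dumps(build_sse_config("192.168.1.50", token="mysecret123"), indent=2))
```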

Find your local IP: `ipconfig` (Windows), `ip addr` (Linux), or `ifconfig` (macOS).

**Firewall:** You may need to allow inbound TCP on port 8808. On Windows, run as admin:

```powershell
netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808
```
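
To check from the remote machine that the port is reachable before configuring the agent, a plain TCP connect is enough. This sketch uses only the standard library; `can_connect` is a hypothetical helper and the address in the usage comment is a placeholder. A successful connect shows only that something is listening on the port, not that it is Eye2byte or that the token is correct.

```python
import socket


def can_connect(host, port=8808, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example (placeholder address): can_connect("192.168.1.50")
```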

### Multi-monitor Examples

```
capture_and_summarize(monitor=0)    # active monitor (default)
capture_and_summarize(monitor=1)    # first monitor
capture_and_summarize(monitor=2)    # second monitor
capture_and_summarize(monitor=-1)   # ALL monitors at once
```
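
Read as a convention, `monitor` is 0 for the active display, a 1-based index for a specific display, and -1 for all of them. The sketch below spells that out; `select_monitors` and its arguments are hypothetical and do not reflect how Eye2byte enumerates displays.

```python
def select_monitors(monitor, monitors, active_index=0):
    """Map the `monitor` argument onto entries of `monitors` (a list in display order)."""
    if monitor == -1:
        return list(monitors)            # all monitors
    if monitor == 0:
        return [monitors[active_index]]  # the active monitor
    if 1 <= monitor <= len(monitors):
        return [monitors[monitor - 1]]   # 1-based index of a specific monitor
    raise ValueError(f"monitor must be -1, 0, or 1..{len(monitors)}, got {monitor}")


displays = ["left", "right"]
print(select_monitors(2, displays))   # ['right']
print(select_monitors(-1, displays))  # ['left', 'right']
```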

## Context Pack Format

Every analysis produces a structured Context Pack:

```markdown
## Goal         — what the user appears to be doing
## Environment  — OS, editor, repo, branch, language
## Screen State — visible panels, files, terminal output
## Signals      — verbatim errors, stack traces, warnings
## Likely Situation — what's probably happening
## Suggested Next Info — what a coding agent needs next
```
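
Because each section starts with a level-2 heading, a consumer can split a pack into a dictionary with a few lines of Python. This sketch assumes each heading sits on its own line with the section body below it; `parse_context_pack` is a hypothetical helper and the sample text is invented for illustration.

```python
import re

# Matches "## Title" but not "### Subtitle"
HEADING = re.compile(r"^##\s+(.+?)\s*$")


def parse_context_pack(markdown):
    """Split a Context Pack into {heading: body} using its level-2 headings."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = """## Goal
Fix a failing unit test

## Signals
AssertionError: expected 3, got 4
"""
pack = parse_context_pack(sample)
print(pack["Signals"])  # AssertionError: expected 3, got 4
```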

## Configuration

Config: `~/.eye2byte/config.json` (created on first run or via `python eye2byte.py init`)

| Setting | Default | Description |
|---------|---------|-------------|
| `provider` | `"ollama"` | Vision provider: ollama, gemini, openrouter, hyperbolic |
| `model` | `"auto"` | Model name or "auto" for auto-detection |
| `voice_clean` | `true` | Noise removal + pause trimming + volume normalization |
| `auto_cleanup_days` | `7` | Delete old captures/summaries after N days (0=disabled) |
| `image_max_size` | `1920` | Max image dimension in pixels; larger images are resized before being sent to the vision model |
| `image_quality` | `90` | JPEG quality (1-100) |
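
A sketch of how these defaults could be merged with the user's file, so that missing keys fall back to the table above. `load_config` is a hypothetical helper; the default values are copied from the table, while the merge behavior is an assumption.

```python
import json
from pathlib import Path

# Defaults copied from the settings table
DEFAULTS = {
    "provider": "ollama",
    "model": "auto",
    "voice_clean": True,
    "auto_cleanup_days": 7,
    "image_max_size": 1920,
    "image_quality": 90,
}


def load_config(path=None):
    """Return DEFAULTS overlaid with whatever the config file defines."""
    path = Path(path) if path else Path.home() / ".eye2byte" / "config.json"
    config = dict(DEFAULTS)  # copy so the defaults are never mutated
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```

With no config file present, `load_config()` returns the defaults unchanged, so `load_config()["provider"]` is `"ollama"`.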

## Files

| File | Purpose |
|------|---------|
| `eye2byte.py` | Core engine — capture, voice, clip, summarize, watch |
| `eye2byte_ui.py` | Control panel with hotkeys and annotation overlay |
| `eye2byte_mcp.py` | MCP server for coding agent integration |

## License

MIT
