Metadata-Version: 2.4
Name: webctl
Version: 0.4.0
Summary: Browser automation via CLI — for humans and agents
Project-URL: Homepage, https://github.com/cosinusalpha/webctl
Project-URL: Repository, https://github.com/cosinusalpha/webctl
Project-URL: Issues, https://github.com/cosinusalpha/webctl/issues
Author-email: cosinusalpha <42695699+cosinusalpha@users.noreply.github.com>
License: MIT
Keywords: agent,automation,browser,cli,web
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.0.0
Requires-Dist: lark>=1.1.0
Requires-Dist: markitdown>=0.1.0
Requires-Dist: playwright>=1.40.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.12.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-playwright>=0.5.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Requires-Dist: types-aiofiles>=24.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# webctl

**Browser automation for AI agents and humans, built on the command line.**

```bash
pip install webctl
webctl navigate "https://example.com"   # Auto-starts browser, returns page data
webctl click "Sign in"                  # Click by text description
webctl snapshot                         # See all elements with @refs
webctl stop                             # Closes browser and daemon
```

## Why CLI Instead of MCP?

MCP browser tools have a fundamental problem: **the server controls what enters your context**. With Playwright MCP, every response includes the full accessibility tree plus console messages. After a few page queries, your context window is full. This leads to degraded performance, lost context, and higher costs.

CLI flips this around: **you control what enters context**.

```bash
# Filter before context
webctl snapshot --interactive-only --limit 30      # Only buttons, links, inputs
webctl snapshot --within "role=main"               # Skip nav, footer, ads

# Pipe through Unix tools
webctl snapshot | grep -i "submit"                 # Find specific elements
webctl --format jsonl snapshot | jq '.data.role'   # Extract with jq
```
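
The same filtering can be done from a script. Below is a minimal Python sketch, assuming each line of `webctl --format jsonl snapshot` is a JSON object with the element's role at `data.role` (as the `jq` example above implies). The `name` field and the sample records are hypothetical, so check them against real output.

```python
import json
from typing import Iterable, Iterator


def iter_elements(lines: Iterable[str], roles: set[str]) -> Iterator[dict]:
    """Yield the `data` payload of every JSONL record whose role is in `roles`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        data = record.get("data") if isinstance(record, dict) else None
        if isinstance(data, dict) and data.get("role") in roles:
            yield data


# Hypothetical sample records; real output may carry more (or different) fields.
sample = [
    '{"data": {"role": "button", "name": "Submit"}}',
    '{"data": {"role": "heading", "name": "Welcome"}}',
    '{"data": {"role": "link", "name": "Sign in"}}',
]
interactive = list(iter_elements(sample, {"button", "link"}))
```

In practice, `lines` would be the stdout of a `subprocess.run([...], capture_output=True, text=True)` call split with `.splitlines()`.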

Beyond filtering, CLI gives you:

| Capability         | CLI                           | MCP                    |
|--------------------|-------------------------------|------------------------|
| **Filter output**  | Built-in flags + grep/jq/head | Server decides         |
| **Debug**          | Run same command as agent     | Opaque                 |
| **Cache & Cost**   | `webctl snapshot > cache.txt` | Every call hits server |
| **Script**         | Save to .sh, version control  | Ephemeral              |
| **Human takeover** | Same commands                 | Different interface    |

*See also: [MCP Considered Suboptimal](https://github.com/kb4ai/mcp-considered-suboptimal-pub-kb) — a community knowledge base collecting CLI-over-MCP patterns and alternatives.*

---

## Benchmarks

Head-to-head comparison of **webctl** vs **[agent-browser](https://github.com/vercel-labs/agent-browser)** (Vercel's browser CLI) across 4 real-world web tasks. Both tools use Claude Opus as the driving agent.

| Task                    |   webctl   |       |         |           | agent-browser |       |        |       |
|-------------------------|:----------:|:-----:|:-------:|:---------:|:-------------:|:-----:|:------:|:-----:|
|                         |   Score    | Turns | Tokens  |   Cost    |     Score     | Turns | Tokens | Cost  |
| Amazon product lookup   |  **9**/10  |  11   |  119k   |   $0.25   |     9/10      |  18   |  247k  | $0.28 |
| Spiegel.de headlines    |  **9**/10  |   7   |   62k   |   $0.14   |     8/10      |   5   |  47k   | $0.12 |
| Google Maps restaurants |  **8**/10  |   9   |  106k   |   $0.22   |     7/10      |  13   |  185k  | $0.29 |
| DuckDuckGo search       |  **8**/10  |   4   |   29k   |   $0.11   |     4/10      |  17   |  253k  | $0.36 |
| **Average**             | **8.5**/10 | **8** | **79k** | **$0.18** |    7.0/10     |  13   |  183k  | $0.26 |

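webctl matches or beats agent-browser's quality score on all 4 tasks (the two tie at 9/10 on Amazon) and costs less on 3 of them. Spiegel.de is the exception: agent-browser finished in fewer turns and at slightly lower cost ($0.12 vs $0.14). Landmark-aware snapshots collapse navigation/sidebars and prioritize content, while automatic fallbacks (cookie dismiss, scroll-to-find, overlay retry) handle complex sites without extra agent turns.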
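
The **Average** row can be re-derived from the per-task rows. The short Python check below copies the numbers from the table above (tokens in thousands, cost in USD) and computes the column means, along with the tasks on which webctl was cheaper.

```python
# (score, turns, tokens in thousands, cost in USD), copied from the table above
webctl = {
    "Amazon product lookup": (9, 11, 119, 0.25),
    "Spiegel.de headlines": (9, 7, 62, 0.14),
    "Google Maps restaurants": (8, 9, 106, 0.22),
    "DuckDuckGo search": (8, 4, 29, 0.11),
}
agent_browser = {
    "Amazon product lookup": (9, 18, 247, 0.28),
    "Spiegel.de headlines": (8, 5, 47, 0.12),
    "Google Maps restaurants": (7, 13, 185, 0.29),
    "DuckDuckGo search": (4, 17, 253, 0.36),
}


def column_means(rows: dict[str, tuple]) -> tuple[float, ...]:
    """Average each column (score, turns, tokens, cost) over all tasks."""
    count = len(rows)
    return tuple(sum(column) / count for column in zip(*rows.values()))


webctl_avg = column_means(webctl)                # table rounds turns to 8
agent_browser_avg = column_means(agent_browser)  # table rounds turns to 13

# Tasks where webctl's cost is strictly lower than agent-browser's
cheaper = [task for task in webctl if webctl[task][3] < agent_browser[task][3]]
```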

<details>
<summary>What makes it fast</summary>

- **Structured data first**: `navigate` extracts JSON-LD/Open Graph metadata (price, rating, author, etc.) before touching the accessibility tree — often enough to answer without a full snapshot
- **Landmark-aware filtering**: Collapses nav/footer/sidebar landmarks so agents see content, not chrome
- **Smart network idle**: Custom load detection that ignores media streams and websockets — pages with video/analytics don't block loading
- **Act + observe in one turn**: `--snapshot` flag on click/type returns the updated page state, saving a round-trip

</details>

<details>
<summary>Benchmark details</summary>

**Setup**: Each task runs Claude Opus with a single tool (webctl or agent-browser), a $1 budget cap, and no human intervention. Quality is scored 0–10 by a separate Claude evaluation call.

**Tasks**:
1. **Amazon product lookup**: Find price and shipping for a specific product on amazon.de
2. **Spiegel.de headlines**: Extract top 5 headlines from a German news site
3. **Google Maps restaurants**: Find vegan Chinese restaurants in Berlin rated >4 stars
4. **DuckDuckGo search**: Search for penguin fan sites and return top 3 results

Run benchmarks yourself: `bash benchmarks/bench_run.sh`

</details>

---

## Agent Integration

**Option A: Install the skill** (works across Claude Code, Cursor, Codex, Gemini CLI, Copilot, Goose, Windsurf, and OpenCode)

```bash
npx skills add cosinusalpha/webctl
```

This installs the skill file. Your agent will install the `webctl` package automatically on first use.

**Option B: Install via pip**

```bash
pip install webctl
webctl setup              # Downloads Chromium
webctl init               # Generate skills/prompts for your agents
webctl init --global      # Or install globally (works across all projects)
```

`webctl init` creates on-demand **skills** for Claude Code and Goose, and lean **prompts** for Gemini, Copilot, and Codex.

<details>
<summary>Supported agents and file locations</summary>

| Agent            | Format | Location (project)                | Location (global)                         |
|------------------|--------|-----------------------------------|-------------------------------------------|
| `claude`         | Skill  | `.claude/skills/webctl/SKILL.md`  | `~/.claude/skills/webctl/SKILL.md`        |
| `goose`          | Skill  | `.agents/skills/webctl/SKILL.md`  | `~/.config/agents/skills/webctl/SKILL.md` |
| `gemini`         | Prompt | `GEMINI.md`                       | `~/.gemini/GEMINI.md`                     |
| `copilot`        | Prompt | `.github/copilot-instructions.md` | -                                         |
| `codex`          | Prompt | `AGENTS.md`                       | `~/.codex/AGENTS.md`                      |
| `claude-noskill` | Prompt | `CLAUDE.md` (legacy)              | `~/.claude/CLAUDE.md`                     |

**Why skills?** Skills are loaded on-demand — your agent only reads webctl instructions when actually doing web automation. This keeps your context clean for other tasks.

**Select specific agents:**

```bash
webctl init --agents claude,gemini    # Only Claude and Gemini
webctl init --agents claude-noskill   # Legacy CLAUDE.md format
```

</details>

If your agent doesn't auto-detect the generated files, add this to your system prompt:

> For web browsing, use webctl CLI. Run `webctl agent-prompt` for instructions.

*Note: If a browser MCP is already configured, disable it to avoid conflicts.*

---

## Commands

### Navigation & Observation

```bash
webctl navigate "https://..."                    # Structured data + page summary
webctl navigate "https://..." --snapshot         # Full a11y snapshot with @refs
webctl navigate "https://..." --read             # Readable markdown content
webctl navigate "https://..." --search "query"   # Find search box, type, submit
webctl navigate "https://..." --grep "price"     # Filtered a11y snapshot
webctl back
webctl forward
webctl reload
webctl snapshot --interactive-only               # Buttons, links, inputs only
webctl snapshot --within "role=main"             # Scope to container
webctl query "role=button name~=Submit"          # Debug query
webctl screenshot --path shot.png
```

### Interaction

```bash
webctl click "Submit"                          # By text description
webctl click @e3                               # By @ref from snapshot
webctl click "Submit" --snapshot               # Click + return updated page state
webctl type "Email" "user@example.com"         # Smart targeting
webctl type "Country" "Germany"                # Auto-detects dropdowns
webctl type "Search" "query" --submit          # Type + press Enter
webctl press Enter
webctl do '[[...],[...]]' --snapshot           # Batch multiple actions
```

### Wait Conditions

```bash
webctl wait network-idle
webctl wait 'exists:role=button name~="Continue"'
webctl wait 'url-contains:"/dashboard"'
```

### Session & Console

```bash
webctl status                   # Current state & error counts
webctl save                     # Persist cookies now
webctl console --count          # Just counts by level (LLM-friendly)
webctl console --level error    # Filter to errors only
```

---

## Core Concepts

### Sessions

Browser stays open across commands. Cookies persist to disk.

```bash
webctl start                    # Visible browser
webctl start --mode unattended  # Headless (invisible)
webctl -s work start            # Named profile (separate cookies)
```

### Element Queries

Semantic targeting based on ARIA roles — stable across CSS refactors:

```bash
role=button                     # Any button
role=button name="Submit"       # Exact match
role=button name~="Submit"      # Contains text (preferred)
```

### Output Control

```bash
webctl snapshot                                    # Human-readable
webctl --quiet navigate "..."                      # Suppress events
webctl --result-only --format jsonl navigate "..." # Pure JSON
```

---

## Architecture

```
┌─────────────┐  Unix Socket   ┌─────────────┐
│   CLI       │ ◄────────────► │   Daemon    │
│  (webctl)   │   JSON-RPC     │  (browser)  │
└─────────────┘                └─────────────┘
      │                               │
      ▼                               ▼
  Agent/User                   Chromium + Playwright
```

- **CLI**: Stateless, sends commands to daemon
- **Daemon**: Manages browser, auto-starts on first command
- **Socket**: `$WEBCTL_SOCKET_DIR` or OS default (see below)
- **Profiles**: `~/.local/share/webctl/profiles/`
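
The wire format is not specified here beyond "JSON-RPC over a Unix socket". The Python sketch below shows what a stateless client round-trip could look like, assuming JSON-RPC 2.0 with one JSON object per line. The newline framing, the method name, and the parameter names are assumptions for illustration and may not match what the daemon actually speaks.

```python
import json
import socket


def encode_request(method: str, params: dict, request_id: int) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message).encode("utf-8") + b"\n"


def decode_response(line: bytes, request_id: int) -> object:
    """Return the `result` of a response; raise on an error or an id mismatch."""
    message = json.loads(line)
    if message.get("id") != request_id:
        raise ValueError("response id does not match request id")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


def call(sock: socket.socket, method: str, params: dict, request_id: int = 1) -> object:
    """Send one request over a connected socket and wait for one response line.

    The client keeps no state between calls, mirroring the stateless CLI.
    """
    sock.sendall(encode_request(method, params, request_id))
    with sock.makefile("rb") as reader:
        return decode_response(reader.readline(), request_id)
```

A real client would first connect an `AF_UNIX` socket to the daemon's socket path and then invoke `call(sock, ...)` once per command.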

<details>
<summary>Socket paths</summary>

| Platform | Default |
|----------|---------|
| Linux | `/run/user/<uid>/webctl-<session>.sock` |
| macOS | `/tmp/webctl-<session>.sock` |
| Windows | `%TEMP%\webctl-<session>.sock` |

Override directory with `WEBCTL_SOCKET_DIR` environment variable.
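
The table can be read as a lookup with one override. The Python sketch below mirrors it, assuming the override directory keeps the same `webctl-<session>.sock` file name (not confirmed here). The function name and the `sys.platform`-style keys are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_socket_path(platform: str, session: str, env: dict[str, str], uid: int = 0) -> str:
    """Resolve the socket path from the table above.

    `platform` uses sys.platform values ("linux", "darwin", "win32");
    `env` is an environment mapping such as os.environ.
    """
    name = f"webctl-{session}.sock"
    path_cls = PureWindowsPath if platform == "win32" else PurePosixPath

    override = env.get("WEBCTL_SOCKET_DIR")
    if override:
        # Assumption: only the directory changes, not the file name.
        return str(path_cls(override) / name)
    if platform.startswith("linux"):
        return str(path_cls(f"/run/user/{uid}") / name)
    if platform == "darwin":
        return str(path_cls("/tmp") / name)
    if platform == "win32":
        return str(path_cls(env["TEMP"]) / name)  # %TEMP% must be set
    raise ValueError(f"unsupported platform: {platform}")
```

For example, `default_socket_path("linux", "work", {}, uid=1000)` resolves to `/run/user/1000/webctl-work.sock`.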

</details>

---

## Security

webctl verifies that CLI commands come from the same user as the daemon:

| Platform | Mechanism | Strength |
|----------|-----------|----------|
| Linux | `SO_PEERCRED` | Kernel-enforced UID check |
| macOS | `LOCAL_PEERCRED` | Kernel-enforced UID check |
| Windows | `SIO_AF_UNIX_GETPEERPID` + process token | Kernel-enforced SID check |

All platforms use kernel-level credential verification. This prevents other users from controlling your browser session.
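
For readers unfamiliar with `SO_PEERCRED`, this is roughly what a same-user check looks like on Linux in Python. It is a generic sketch of the kernel mechanism named in the table, not webctl's own code, and the function names are hypothetical.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } as filled in by the kernel
_UCRED = "iII"


def peer_uid(conn: socket.socket) -> int:
    """Return the UID of the process at the other end of a Unix socket (Linux only)."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(_UCRED))
    _pid, uid, _gid = struct.unpack(_UCRED, raw)
    return uid


def is_same_user(conn: socket.socket) -> bool:
    """Accept the connection only if the peer runs as the same user as this process."""
    return peer_uid(conn) == os.getuid()
```

Because the kernel fills in the credentials, a client cannot forge them. A daemon would typically call such a check right after `accept()` and close the connection if it fails.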

---

<details>
<summary>Advanced Configuration</summary>

### Custom Browser

Use a custom Chromium binary (skips managed installs):

```bash
webctl config set browser_executable_path /path/to/chrome

# One-off override via environment:
WEBCTL_BROWSER_PATH=/path/to/chrome webctl start
```

Allow global Playwright even if versions mismatch (opt-in, use with care):

```bash
webctl config set use_global_playwright true
```

Clear overrides:

```bash
webctl config set browser_executable_path null
webctl config set use_global_playwright false
```

### Proxy Configuration

Configure HTTP/HTTPS proxy for corporate networks or CI environments.

**Via environment variables** (recommended for CI):

```bash
# Standard proxy env vars (auto-detected)
export HTTPS_PROXY=http://proxy.corp.com:8080
export NO_PROXY='localhost,*.internal.com'
webctl start

# Or use webctl-specific var (highest priority)
export WEBCTL_PROXY_SERVER=http://proxy.corp.com:8080
```

**Via config file** (persistent):

```bash
webctl config set proxy_server http://proxy.corp.com:8080
webctl config set proxy_bypass 'localhost,*.internal.com'   # Quoted so the shell does not expand *

# For authenticated proxies
webctl config set proxy_username myuser
webctl config set proxy_password mypass
```

**Priority order**: `WEBCTL_PROXY_SERVER` > `HTTPS_PROXY` > `HTTP_PROXY` > config file

Check and clear settings:

```bash
webctl config show              # View all settings
webctl config set proxy_server null   # Clear proxy
```

### Command Logging

Record all webctl commands and their output in shell-transcript format:

```bash
export WEBCTL_LOG=/tmp/webctl.log
webctl navigate "https://example.com"
webctl click "Submit"
cat /tmp/webctl.log   # Review transcript
```

Each command is logged with a `$ ` prefix followed by its output, appended to the file.
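
Since the transcript is plain text, it is easy to post-process. The Python sketch below splits a log into (command, output) pairs, assuming no output line itself starts with `$ `. The sample output lines are made up for illustration.

```python
def parse_transcript(text: str) -> list[tuple[str, str]]:
    """Split a shell-style transcript into (command, output) pairs."""
    entries: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        if line.startswith("$ "):
            entries.append((line[2:], []))  # a new command starts here
        elif entries:
            entries[-1][1].append(line)     # output belongs to the latest command
    return [(command, "\n".join(output).strip()) for command, output in entries]


# Hypothetical log content; real output depends on the page and the flags used.
sample_log = (
    '$ webctl navigate "https://example.com"\n'
    "Example Domain\n"
    '$ webctl click "Submit"\n'
    "clicked\n"
)
pairs = parse_transcript(sample_log)
```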

### Domain Policy

Restrict which domains the browser can navigate to. Edit your config file directly
(run `webctl config show` to find its path):

```json
{
  "domain_policy": {
    "enabled": true,
    "policy": {
      "mode": "allow",
      "allow": ["localhost", "*.mycompany.com", "github.com"],
      "deny": []
    }
  }
}
```

**Modes:**

| Mode | Behavior |
|------|----------|
| `allow` | Whitelist — only listed domains are permitted |
| `deny` | Blacklist — all domains except listed ones are permitted |
| `both` | Allow list checked first, then deny list |

Domain patterns support glob wildcards (e.g., `*.example.com`). A built-in default deny list
blocks known malicious patterns regardless of mode.
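
The three modes can be sketched in a few lines of Python. This is an illustration of the table above, not webctl's implementation. It uses `fnmatch` for the glob patterns, reads `both` as "must match the allow list and must not match the deny list" (an assumption), and does not model the built-in deny list.

```python
from fnmatch import fnmatchcase


def _matches_any(host: str, patterns: list[str]) -> bool:
    """Case-insensitive glob match of a hostname against a pattern list."""
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


def is_allowed(host: str, policy: dict) -> bool:
    """Evaluate the `policy` object from the config example above."""
    allowed = _matches_any(host, policy.get("allow", []))
    denied = _matches_any(host, policy.get("deny", []))
    mode = policy["mode"]
    if mode == "allow":
        return allowed
    if mode == "deny":
        return not denied
    if mode == "both":
        return allowed and not denied  # assumption, see the note above
    raise ValueError(f"unknown mode: {mode}")


policy = {
    "mode": "allow",
    "allow": ["localhost", "*.mycompany.com", "github.com"],
    "deny": [],
}
```

With plain glob matching, `*.mycompany.com` covers `api.mycompany.com` but not the bare `mycompany.com`. Whether webctl treats the bare domain the same way is not stated here, so list it explicitly if in doubt.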

### Container Deployment

Set `WEBCTL_SOCKET_DIR` to share the Unix socket between host and container (or between containers).

**Daemon in Container, Client on Host:**

```bash
mkdir -p /tmp/webctl-ipc

docker run -d --name webctl-daemon \
  -u $(id -u):$(id -g) \
  -v /tmp/webctl-ipc:/ipc \
  -e WEBCTL_SOCKET_DIR=/ipc \
  my-webctl-image python -m webctl.daemon.server

export WEBCTL_SOCKET_DIR=/tmp/webctl-ipc
webctl start && webctl navigate "https://example.com"
```

`-u $(id -u):$(id -g)` ensures the socket file is owned by your host user.

**Daemon and Client in Separate Containers:**

```bash
docker volume create webctl-ipc

docker run -d --name webctl-daemon \
  -v webctl-ipc:/ipc \
  -e WEBCTL_SOCKET_DIR=/ipc \
  my-webctl-image python -m webctl.daemon.server

docker run --rm \
  -v webctl-ipc:/ipc \
  -e WEBCTL_SOCKET_DIR=/ipc \
  my-webctl-image webctl navigate "https://example.com"
```

No UID matching is needed here: both containers are started from the same image, so they run as the same user.

</details>

<details>
<summary>Global installation with uv</summary>

```bash
uv tool install webctl
uv tool run webctl
```

</details>

<details>
<summary>Linux system dependencies</summary>

```bash
playwright install-deps chromium
# Or manually install libraries listed in Playwright documentation
```

</details>

---

## License

MIT
