Metadata-Version: 2.4
Name: argus-testing
Version: 0.3.0
Summary: AI-powered exploratory testing agent that discovers bugs like a real user
Project-URL: Homepage, https://github.com/chriswu727/argus
Project-URL: Repository, https://github.com/chriswu727/argus
Author-email: Yichen Wu <yichenwujob@gmail.com>
License: MIT
License-File: LICENSE
Keywords: agent,ai,claude-code,exploratory-testing,mcp,playwright,qa,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Requires-Dist: click>=8.0.0
Requires-Dist: jinja2>=3.0.0
Requires-Dist: litellm>=1.40.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: playwright>=1.40.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Description-Content-Type: text/markdown

# Argus

**AI-powered exploratory QA agent.** Give it a URL and it explores your app like a real user (clicking buttons, filling forms, trying edge cases) to find bugs that scripted tests miss.

Unlike scripted Playwright or Cypress suites, Argus needs no test scripts. It **discovers bugs you didn't think to test for.**

## What It Catches

Argus runs **16 types of detection** across every page it visits:

| Category | What it finds |
|----------|--------------|
| **Runtime Errors** | Console exceptions, HTTP 4xx/5xx, crashes |
| **Logic Bugs** | Fake delete/edit (says "Saved!" but data didn't persist), misleading success toasts |
| **Data Issues** | Count mismatches, broken dates ("1.52 days ago"), NaN, eternal "Loading..." |
| **Dead Links** | Crawls all internal links, finds 404s and 5xx |
| **Broken Images** | Images that failed to load |
| **SEO** | Missing meta description, OG tags, heading hierarchy issues |
| **Accessibility** | Missing alt text, unlabeled form inputs, no lang attribute |
| **Performance** | Slow page loads (>3s), large resources (>500KB), too many requests |
| **Security** | Mixed content (HTTP resources on HTTPS pages), XSS reflection |
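As an illustration of the "Data Issues" row, here is a minimal sketch of a text-based check for the symptoms listed there. The function name `find_data_issues` and the regular expressions are hypothetical and are not taken from the Argus source; the real detectors may work quite differently.

```python
import re

# Hypothetical patterns for illustration only; not Argus's actual rules.
# A relative date with a decimal count, e.g. "1.52 days ago".
FRACTIONAL_AGE = re.compile(
    r"\b\d+\.\d+\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)
# A literal NaN that leaked into rendered text.
NAN_TOKEN = re.compile(r"\bNaN\b")
# A loading placeholder that is still visible in the snapshot.
LOADING = re.compile(r"\bLoading\.\.\.", re.IGNORECASE)


def find_data_issues(page_text: str) -> list[str]:
    """Return human-readable descriptions of suspicious text on a page."""
    issues = []
    for match in FRACTIONAL_AGE.finditer(page_text):
        issues.append(f"broken relative date: {match.group(0)!r}")
    if NAN_TOKEN.search(page_text):
        issues.append("NaN rendered in page text")
    if LOADING.search(page_text):
        # One snapshot cannot prove the placeholder is permanent;
        # a real check would re-read the page after a delay.
        issues.append("possible stuck 'Loading...' placeholder")
    return issues


print(find_data_issues("Created 1.52 days ago. Total: NaN"))
```

A check like this is cheap enough to run on every visited page, which is presumably why text-level problems are handled passively rather than on demand.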

## Quick Start (MCP Server for Claude Code)

This is the recommended way to use Argus: Claude Code acts as the AI brain, so no API key is needed.

```bash
pip install argus-testing
playwright install chromium
claude mcp add argus -- argus-mcp
```

Then in Claude Code:

> "Test my app at http://localhost:3000, focus on the checkout flow"

Claude Code will explore your app, try edge cases, verify that actions persist, and generate an HTML bug report.

### MCP Tools (15)

| Tool | What it does |
|------|-------------|
| `start_session(url)` | Launch browser, navigate to URL |
| `get_page_state()` | See interactive elements + page text + counts + toasts + meta tags + a11y issues |
| `click(index)` | Click an element |
| `type_text(index, text)` | Type into an input field |
| `select_option(index, value)` | Select from dropdown |
| `navigate(url)` | Go to a URL |
| `go_back()` | Browser back |
| `scroll_down()` | Scroll the page |
| `screenshot(name)` | Capture the current page |
| `get_errors()` | Run all passive detectors (console, network, text, images, SEO, a11y, mixed content...) |
| `verify_action(type, text, url)` | Verify a delete/edit actually persisted |
| `check_links()` | Crawl all internal links, find dead ones |
| `check_performance()` | Measure load time, find large resources |
| `crawl_site(max_pages)` | **Auto-crawl entire site**: visit all internal pages, run all detectors, one command |
| `end_session()` | Close browser, generate HTML report |
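The idea behind `verify_action` can be sketched in a few lines: after a delete or edit, re-read the page and check whether its text matches what the UI claimed. The helper below is a hypothetical illustration, not the tool's implementation. It assumes that for a delete `text` is the removed item's label, and that for an edit `text` is the new value.

```python
def check_persistence(action_type: str, text: str, page_text_after_reload: str) -> dict:
    """Decide whether a delete/edit appears to have persisted.

    Assumptions (not confirmed by the Argus source):
    - "delete": `text` should no longer appear after reloading the page.
    - "edit":   `text` (the new value) should appear after reloading.
    """
    present = text in page_text_after_reload
    if action_type == "delete":
        persisted = not present
    elif action_type == "edit":
        persisted = present
    else:
        raise ValueError(f"unsupported action type: {action_type!r}")
    return {"action": action_type, "text": text, "persisted": persisted}


# A "fake delete": the UI said the item was removed, but it is still there.
result = check_persistence("delete", "Buy milk", "Tasks: Buy milk, Walk dog")
print(result["persisted"])
```

A `persisted` value of `False` after a success toast is the signature of the fake delete/edit bugs described under "Logic Bugs".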

## Alternative: Standalone CLI

Bring your own LLM API key. Argus has a built-in AI planner that decides what to explore.

```bash
pip install argus-testing
playwright install chromium

export DEEPSEEK_API_KEY=sk-...   # or OPENAI_API_KEY, ANTHROPIC_API_KEY

argus http://localhost:3000 --model deepseek/deepseek-chat -n 50
argus http://localhost:3000 -f "test login with edge cases" --headed
```

Supports 100+ models via [LiteLLM](https://github.com/BerriAI/litellm): OpenAI, Anthropic, DeepSeek, Gemini, Ollama (free/local), etc.

## Tested on Real Sites

| Site | Bugs found | Examples |
|------|-----------|----------|
| vanlifeyvr.com | 1 | Missing og:image |
| nalifex.com | 3 | Unlabeled search input, 1.5MB uncompressed image, missing og:image |
| BuggyTasks (test app) | 15+ | Fake delete, fake edit, broken dates, count mismatch, XSS, auth bypass |

## Bug Report

Each session generates a self-contained HTML report with:

- Bug cards with severity, type, description, and reproduction steps
- Embedded screenshots (base64 — no external files needed)
- Testing timeline showing every page visited
- Console and network error logs
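Embedding screenshots as base64 is what makes the report a single portable file. A minimal sketch of the technique, using only the standard library, is shown below; the function name `embed_screenshot` and the markup are illustrative assumptions, not the report template Argus ships.

```python
import base64
import html


def embed_screenshot(png_bytes: bytes, caption: str) -> str:
    """Return an HTML <figure> with the PNG inlined as a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # Escape the caption so page-derived text cannot break the markup.
    safe_caption = html.escape(caption, quote=True)
    return (
        f'<figure><img src="data:image/png;base64,{encoded}" alt="{safe_caption}">'
        f"<figcaption>{safe_caption}</figcaption></figure>"
    )


# Placeholder bytes stand in for a real screenshot here.
fragment = embed_screenshot(b"\x89PNG", "Cart <empty> after checkout")
print(fragment)
```

Base64 inflates each image by roughly a third, so this trades file size for the convenience of a report that can be emailed or attached to a ticket as one file.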

## How It Works

```
You give a URL
  -> Argus opens a real browser (Playwright)
  -> AI explores: clicks, types, navigates, tries edge cases
  -> 12 passive detectors analyze every page automatically
  -> On-demand: link crawling + performance metrics
  -> Generates HTML report with all bugs found
```
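The passive-detector step in the flow above can be pictured as a list of small functions that each inspect a page snapshot and return any bugs they find. The sketch below is a hypothetical model of that pattern: `PageSnapshot`, `Bug`, and both detector functions are invented for illustration and do not mirror Argus's internal classes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PageSnapshot:
    """Hypothetical record of what was observed on one visited page."""
    url: str
    console_errors: list[str] = field(default_factory=list)
    resource_urls: list[str] = field(default_factory=list)


@dataclass
class Bug:
    url: str
    kind: str
    detail: str


Detector = Callable[[PageSnapshot], list[Bug]]


def console_detector(page: PageSnapshot) -> list[Bug]:
    # Every console exception becomes a runtime-error bug.
    return [Bug(page.url, "runtime_error", msg) for msg in page.console_errors]


def mixed_content_detector(page: PageSnapshot) -> list[Bug]:
    # HTTP resources loaded on an HTTPS page count as mixed content.
    if not page.url.startswith("https://"):
        return []
    return [
        Bug(page.url, "mixed_content", res)
        for res in page.resource_urls
        if res.startswith("http://")
    ]


def run_detectors(pages: list[PageSnapshot], detectors: list[Detector]) -> list[Bug]:
    """Apply every detector to every page and collect the results."""
    return [bug for page in pages for detector in detectors for bug in detector(page)]


pages = [
    PageSnapshot(
        url="https://example.com/",
        console_errors=["TypeError: x is undefined"],
        resource_urls=["http://cdn.example.com/a.js", "https://example.com/b.css"],
    )
]
for bug in run_detectors(pages, [console_detector, mixed_content_detector]):
    print(bug.kind, bug.detail)
```

Keeping each detector a pure function of the snapshot makes it straightforward to add a new check without touching the exploration logic.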

## Requirements

- Python 3.10+
- Chromium (installed by running `playwright install chromium`)

## License

MIT
