Metadata-Version: 2.4
Name: analog-fetcher
Version: 0.2.0
Summary: JS-aware HTML fetcher for Analog that probes-then-renders only when needed. Uses Playwright.
Project-URL: Homepage, https://getanalog.io
Author-email: Marcus Campbell <marcus@getanalog.io>
License: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: analog-sdk<0.2.0,>=0.1.2
Requires-Dist: playwright>=1.45
Requires-Dist: pydantic>=2.6
Description-Content-Type: text/markdown

# analog-fetcher

JS-aware fetching for [Analog](https://getanalog.io). Two fetchers,
both implementing the `analog.fetcher.Fetcher` protocol:

- **`AnalogFetcher`** — the recommended default. Does a cheap HTTP
  fetch first, analyzes the HTML to decide if rendering is actually
  needed, and only escalates to a real browser when it is. Caches the
  decision per-domain so subsequent fetches skip the probe.
- **`PlaywrightFetcher`** — direct Playwright access. Use when you
  know you want JS rendering on every fetch and don't want the probe
  overhead.

When `analog-fetcher` is installed alongside `analog-sdk`,
`AnalogFetcher` becomes the SDK's default automatically — no code
change required on the caller side.

## Installation

```bash
pip install "analog-sdk[fetcher]"   # installs analog-sdk + analog-fetcher
# or
pip install analog-fetcher           # standalone (pulls analog-sdk too)
```

Then install the browser binaries Playwright uses:

```bash
playwright install chromium
```

## Usage

The simplest path — let the SDK use the smart default:

```python
from analog import analog

# AnalogFetcher is the default when analog-fetcher is installed.
result = analog("https://example.com")
```

Or construct one explicitly with custom behavior:

```python
from analog import analog
from analog_fetcher import AnalogFetcher, PlaywrightFetcher

# Customize the rich fetcher used during escalation.
fetcher = AnalogFetcher(
    rich=PlaywrightFetcher(wait_for=".loaded", headless=True),
)
result = analog("https://example.com", fetcher=fetcher)
```

Or force Playwright on every fetch:

```python
from analog_fetcher import PlaywrightFetcher

with PlaywrightFetcher(headless=True) as fetcher:
    r1 = analog("https://example.com/a", fetcher=fetcher)
    r2 = analog("https://example.com/b", fetcher=fetcher)
    # Browser launches once, is reused across both fetches.
```

See https://getanalog.io for full documentation.

## License

MIT — see [LICENSE](./LICENSE).
