Metadata-Version: 2.4
Name: extractforms
Version: 0.1.0
Summary: A python project to turn scanned forms into a list of key-value pairs.
Project-URL: Homepage, https://github.com/Guillaume-Lombardo/extractforms
Project-URL: Repository, https://github.com/Guillaume-Lombardo/extractforms
Project-URL: Issues, https://github.com/Guillaume-Lombardo/extractforms/issues
Author-email: Guillaume Lombardo <g1lom@later.day>
License: MIT
License-File: LICENSE
Keywords: automation,cli,package,python
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.13
Requires-Dist: certifi>=2026.1.4
Requires-Dist: httpx>=0.28.1
Requires-Dist: openai>=2.1.0
Requires-Dist: pydantic-settings>=2.13.0
Requires-Dist: pydantic>=2.12.0
Requires-Dist: pymupdf>=1.26.5
Requires-Dist: python-dotenv>=1.1.0
Requires-Dist: structlog>=25.5.0
Description-Content-Type: text/markdown

# ExtractForms

`extractforms` is a Python package and CLI to extract key/value fields from PDF forms.

## Quickstart

```bash
uv sync --group dev
uv run pre-commit install
uv run ruff format .
uv run ruff check .
uv run ty check src tests
uv run pytest
uv run pre-commit run --all-files
```

## CLI

```bash
extractforms extract --input form.pdf --output results/result.json --passes 2
```

Supported options include:
- `--no-cache`
- `--dpi`, `--image-format`, `--page-start`, `--page-end`, `--max-pages`
- `--chunk-pages`
- `--extra-instructions`
- `--schema-id`, `--schema-path`, `--match-schema`

## Environment

Copy `.env.template` to `.env` and configure:
- logging (`LOG_LEVEL`, `LOG_JSON`, `LOG_FILE`)
- enterprise network/TLS (`HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`, `CERT_PATH`)
- model endpoint (`OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`)

## Project Layout

- `src/extractforms`: package code
- `tests/unit`: fast default tests
- `tests/integration`: component-level tests
- `tests/end2end`: user-facing behavior tests
- `skills`: AI helper skills for coding workflows

## Release

1. Bump `version` in `pyproject.toml`.
2. Create and push a git tag: `vX.Y.Z`.
3. GitHub Action publishes to PyPI.

For manual validation, use workflow dispatch with `publish_target=testpypi`.
