Metadata-Version: 2.3
Name: autoarena
Version: 0.1.0b5
Author-email: Kolena Engineering <eng@kolena.com>
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Python: <4,>=3.9
Requires-Dist: anthropic<1
Requires-Dist: cohere<6,>=5
Requires-Dist: duckdb
Requires-Dist: fastapi<1
Requires-Dist: google-generativeai<1
Requires-Dist: loguru<1
Requires-Dist: ollama<1
Requires-Dist: openai<2,>=1
Requires-Dist: pandas<3,>=2
Requires-Dist: pyarrow
Requires-Dist: python-multipart
Requires-Dist: tenacity<10,>=9
Requires-Dist: together<2,>=1
Requires-Dist: torch<3,>=2
Requires-Dist: tqdm<5,>=4
Requires-Dist: transformers<5,>=4
Requires-Dist: uvicorn<1
Provides-Extra: dev
Requires-Dist: pre-commit<4,>=3; extra == 'dev'
Requires-Dist: pytest-cov<6,>=5; extra == 'dev'
Requires-Dist: pytest<9,>=8; extra == 'dev'
Requires-Dist: twine<6,>=5; extra == 'dev'
Description-Content-Type: text/markdown

# AutoArena

AutoArena helps you stack rank LLM outputs against one another using automated judge evaluation.

Install from [PyPI](https://pypi.org/project/autoarena/) and run with:

```shell
pip install autoarena
python -m autoarena
```

## Usage

Getting started with AutoArena is simple:

1. Run AutoArena via `python -m autoarena` and visit [localhost:8899](http://localhost:8899/) in your browser.
2. Create a project via the UI.
3. Add responses from a model by selecting a CSV file with `prompt` and `response` columns.
4. Configure an automated judge via the UI. Most judges require credentials, e.g. `X_API_KEY`, to be set in the
   environment where you're running AutoArena.
5. Add responses from a second model. This kicks off an automated judging task in which the judges you configured in
   the previous step decide which of the uploaded models provided the better `response` to each `prompt`.

That's it! After these steps you're fully set up for automated evaluation on AutoArena.
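
The CSV file described in step 3 can be produced with Python's standard library. The following is a minimal sketch, not part of AutoArena itself: the helper name `write_responses_csv` and the sample rows are hypothetical, and the only detail taken from the steps above is that the file needs `prompt` and `response` columns.

```python
import csv
from pathlib import Path


def write_responses_csv(path, rows):
    """Write (prompt, response) pairs to a CSV file with `prompt` and `response` columns.

    Hypothetical helper for illustration; AutoArena only needs the resulting file.
    """
    path = Path(path)
    # newline="" lets the csv module control line endings, as the csv docs recommend.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", "response"])  # header row with the two required columns
        writer.writerows(rows)  # fields containing commas or quotes are escaped automatically
    return path
```

For example, `write_responses_csv("model_a.csv", [("What is 2 + 2?", "4")])` writes a one-row file that can then be selected in the UI. Writing one such file per model keeps each upload separate.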

### Data Storage

Data is stored in `./data/<project>.duckdb` files in the directory where you invoked AutoArena. See
[`data/README.md`](./data/README.md) for more details on data storage in AutoArena.
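
Because each project lives in its own `<project>.duckdb` file, the projects present in a working directory can be listed by scanning `./data`. This is a small sketch under that assumption: the function name `list_projects` is hypothetical, and it only inspects file names without opening the databases.

```python
from pathlib import Path


def list_projects(data_dir="data"):
    """Return project names inferred from `<project>.duckdb` files in `data_dir`.

    Hypothetical helper for illustration; assumes the file stem is the project name.
    """
    root = Path(data_dir)
    if not root.is_dir():
        # No data directory yet, e.g. AutoArena has not been run from this location.
        return []
    return sorted(p.stem for p in root.glob("*.duckdb") if p.is_file())
```

Run it from the directory where you invoked AutoArena, e.g. `print(list_projects())`.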

## Development

AutoArena uses [uv](https://github.com/astral-sh/uv) to manage dependencies. To set up this repository for development,
run:

```shell
uv venv && source .venv/bin/activate
uv pip install --all-extras -r pyproject.toml
uv tool run pre-commit install
uv run python3 -m autoarena --dev
```

To run AutoArena for development, you will need to run both the backend and frontend services:

- Backend: `uv run python3 -m autoarena --dev` (the `--dev`/`-d` flag enables automatic service reloading when
    source files change)
- Frontend: see [`ui/README.md`](./ui/README.md)

To build a release tarball in the `./dist` directory:

```shell
./scripts/build.sh
```
