Metadata-Version: 2.4
Name: conjure-eval
Version: 0.2.0
Summary: Public-slice harness for the CONJURE transformative-creativity benchmark.
Author-email: Patrick Cooper <patrick.cooper@colorado.edu>
License: Apache-2.0
Keywords: benchmark,lean4,mathlib,llm,creativity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: verify

# conjure-eval

Public-slice harness for the CONJURE transformative-creativity benchmark.
It ships the 358-instance public split (about 70 percent of the 510-instance
Phase 4.6 frozen corpus across 17 Lakatos families, SHA-256
`33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9`) so
frontier-model developers can self-evaluate locally before submitting to the
hidden split.

This package contains:

- The frozen public-slice corpus JSON (`conjure_eval.data.public_corpus`).
- A CLI for inspecting the corpus, driving a model pass, and checking
  submission files before they are sent to the hidden-split adjudicator.
- The deterministic split provenance, so any third party can re-derive the
  public/hidden split byte-for-byte from the source corpus.
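
A byte-for-byte check usually comes down to comparing file digests. The sketch
below shows one way to compute a SHA-256 digest with the standard library and
compare it to the value quoted above. The helper names `sha256_of_file` and
`matches_expected` are hypothetical and not part of `conjure-eval`, and this
README does not say which file the quoted digest covers, so the path you pass
in is your own assumption.

```python
import hashlib
from pathlib import Path

# Digest quoted earlier in this README. Which file it covers is not stated
# here, so treat any path you check against it as an assumption.
EXPECTED_SHA256 = "33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True if the file's digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat regardless of corpus size. If the
digest does not match, the file differs from the one the digest was taken
over, or the digest covers a different file.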

## What this package is and isn't

`conjure-eval` is a self-service developer convenience: it lets a model team
inspect the public contracts, run their model against the public slice, and
smoke-test their submission format before sending results to the benchmark
author. It does not ship the hidden split, and it does not run the
kernel-verified tight-mode adjudicator that produces the headline accept rate.
Both live in the private `blanc` repository, where the benchmark author runs
the adjudicator against frozen model snapshots. The headline number reported in
the brief is the hidden-split rate.

## Install

```bash
pip install conjure-eval
```

## Usage

```bash
# List all 358 public-slice instance IDs
conjure-eval list-public

# Inspect a single instance
conjure-eval show C1-bv-001

# Drive a model pass (OpenAI-compatible endpoint)
conjure-eval run \
    --base-url https://your-endpoint/v1 \
    --api-key-env MY_API_KEY \
    --model your-model-name \
    --out submissions.jsonl

# Check submission file well-formedness before sending
conjure-eval verify-submission submissions.jsonl

# Print corpus provenance fields
conjure-eval provenance
```
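
If you want a quick local sanity check on a `.jsonl` file before handing it to
the CLI, a generic line-by-line parse is often enough to catch truncated
writes. The sketch below is not the logic of `conjure-eval verify-submission`,
and the submission schema is not described in this README. It assumes only
that each line should be a single JSON object, and the function name
`check_jsonl` is hypothetical.

```python
import json


def check_jsonl(path):
    """Return (line_number, problem) tuples; an empty list means no problems.

    Assumption: every line must be one JSON object. The real submission
    schema is not specified here, so no field names are checked.
    """
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "record is not a JSON object"))
    return problems
```

Calling `check_jsonl("submissions.jsonl")` returns an empty list when every
line parses as an object. Passing this check says nothing about whether the
records satisfy the benchmark's own format rules.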

## Provenance

The public corpus is a deterministic 70/30 axis-stratified slice of the
510-instance Phase 4.6 frozen corpus maintained in the private `blanc`
repository. Seed: `4317`. Anyone with the source corpus can reproduce both
slices via `scripts/build_conjure_split.py`.
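
To illustrate why a fixed seed makes such a split reproducible, here is a
minimal sketch of a seeded stratified split. It is not the contents of
`scripts/build_conjure_split.py`: the instance fields `id` and `axis`, the
per-stratum rounding rule, and the function name `stratified_split` are all
assumptions, so it should not be expected to reproduce the actual 358-instance
slice.

```python
import random
from collections import defaultdict


def stratified_split(instances, public_fraction=0.7, seed=4317):
    """Split instances into (public, hidden), stratified by their "axis" field.

    Assumptions (not taken from the real build script): each instance is a
    dict with "id" and "axis" keys, and each stratum's cut is rounded to the
    nearest integer.
    """
    groups = defaultdict(list)
    for instance in instances:
        groups[instance["axis"]].append(instance)

    rng = random.Random(seed)  # private RNG, so global random state is untouched
    public, hidden = [], []
    # Sort strata and members first so the result is independent of input order.
    for axis in sorted(groups):
        members = sorted(groups[axis], key=lambda item: item["id"])
        rng.shuffle(members)
        cut = round(len(members) * public_fraction)
        public.extend(members[:cut])
        hidden.extend(members[cut:])
    return public, hidden


# Toy corpus: 3 hypothetical axes with 10 instances each.
toy = [{"id": f"{axis}-{n:03d}", "axis": axis} for axis in "ABC" for n in range(10)]
public, hidden = stratified_split(toy)
```

Sorting before shuffling is the design choice that matters: with a fixed seed
and a canonical input order, two runs over the same source corpus select the
same instances even if the corpus is loaded in a different order.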
