Metadata-Version: 2.4
Name: citeformer
Version: 0.2.0
Summary: A bulletproof way to generate verifiably cited text from language models — structurally unforgeable citation markers via constrained decoding.
Project-URL: Homepage, https://citeformer.readthedocs.io
Project-URL: Repository, https://github.com/random-walks/citeformer
Project-URL: Documentation, https://citeformer.readthedocs.io
Project-URL: Issues, https://github.com/random-walks/citeformer/issues
Project-URL: Changelog, https://github.com/random-walks/citeformer/blob/main/CHANGELOG.md
Author-email: Blaise Albis-Burdige <blaise@ubik.studio>
License-Expression: Apache-2.0
License-File: AUTHORS.md
License-File: LICENSE
Keywords: agents,citations,constrained-decoding,csl,hallucination,llm,nli,rag,structured-output
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.11
Requires-Dist: diskcache>=5
Requires-Dist: httpx>=0.27
Requires-Dist: lark>=1.2
Requires-Dist: lxml>=5
Requires-Dist: pydantic>=2
Requires-Dist: pypdf>=5
Requires-Dist: readability-lxml>=0.8.1
Requires-Dist: rich>=13
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: accelerate>=1.0; extra == 'all'
Requires-Dist: anthropic>=0.40; extra == 'all'
Requires-Dist: google-genai>=0.7; extra == 'all'
Requires-Dist: llama-cpp-python>=0.3; extra == 'all'
Requires-Dist: llguidance>=0.5; extra == 'all'
Requires-Dist: mistralai>=2.0; extra == 'all'
Requires-Dist: openai>=1.40; extra == 'all'
Requires-Dist: sentence-transformers>=3; extra == 'all'
Requires-Dist: torch>=2.8; extra == 'all'
Requires-Dist: transformers>=4.46; extra == 'all'
Requires-Dist: xgrammar>=0.1.30; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: furo>=2024.8; extra == 'dev'
Requires-Dist: hatch; extra == 'dev'
Requires-Dist: hypothesis>=6; extra == 'dev'
Requires-Dist: matplotlib>=3.8; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: myst-nb>=1.1; extra == 'dev'
Requires-Dist: myst-parser[linkify]>=3; extra == 'dev'
Requires-Dist: pre-commit>=3; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-regressions>=2; extra == 'dev'
Requires-Dist: pytest-vcr; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Requires-Dist: sphinx-autobuild>=2024.10; extra == 'dev'
Requires-Dist: sphinx-autodoc2>=0.5; extra == 'dev'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'dev'
Requires-Dist: sphinx-design>=0.6; extra == 'dev'
Requires-Dist: sphinx-llms-txt>=0.7; extra == 'dev'
Requires-Dist: sphinx>=7; extra == 'dev'
Requires-Dist: sphinxcontrib-typer>=0.5; extra == 'dev'
Provides-Extra: docs
Requires-Dist: furo>=2024.8; extra == 'docs'
Requires-Dist: myst-nb>=1.1; extra == 'docs'
Requires-Dist: myst-parser[linkify]>=3; extra == 'docs'
Requires-Dist: sphinx-autobuild>=2024.10; extra == 'docs'
Requires-Dist: sphinx-autodoc2>=0.5; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'docs'
Requires-Dist: sphinx-design>=0.6; extra == 'docs'
Requires-Dist: sphinx-llms-txt>=0.7; extra == 'docs'
Requires-Dist: sphinx>=7; extra == 'docs'
Requires-Dist: sphinxcontrib-typer>=0.5; extra == 'docs'
Provides-Extra: examples
Requires-Dist: accelerate>=1.0; extra == 'examples'
Requires-Dist: jupyter>=1; extra == 'examples'
Requires-Dist: llguidance>=0.5; extra == 'examples'
Requires-Dist: matplotlib>=3.8; extra == 'examples'
Requires-Dist: torch>=2.8; extra == 'examples'
Requires-Dist: transformers>=4.46; extra == 'examples'
Requires-Dist: xgrammar>=0.1.30; extra == 'examples'
Provides-Extra: gemini
Requires-Dist: google-genai>=0.7; extra == 'gemini'
Provides-Extra: grobid
Requires-Dist: grobid-client-python>=0.0.9; extra == 'grobid'
Provides-Extra: hf
Requires-Dist: accelerate>=1.0; extra == 'hf'
Requires-Dist: llguidance>=0.5; extra == 'hf'
Requires-Dist: torch>=2.8; extra == 'hf'
Requires-Dist: transformers>=4.46; extra == 'hf'
Requires-Dist: xgrammar>=0.1.30; extra == 'hf'
Provides-Extra: llamacpp
Requires-Dist: llama-cpp-python>=0.3; extra == 'llamacpp'
Provides-Extra: mistral
Requires-Dist: mistralai>=2.0; extra == 'mistral'
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == 'openai'
Provides-Extra: verify
Requires-Dist: sentence-transformers>=3; extra == 'verify'
Requires-Dist: torch>=2.8; extra == 'verify'
Requires-Dist: transformers>=4.46; extra == 'verify'
Provides-Extra: vllm
Requires-Dist: vllm>=0.7; extra == 'vllm'
Description-Content-Type: text/markdown

# citeformer

[![PyPI](https://img.shields.io/pypi/v/citeformer?color=blue)](https://pypi.org/project/citeformer/)
[![Docs](https://readthedocs.org/projects/citeformer/badge/?version=latest)](https://citeformer.readthedocs.io/en/latest/)
[![License](https://img.shields.io/badge/license-Apache--2.0-green)](LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/citeformer)](https://pypi.org/project/citeformer/)
[![CI](https://github.com/random-walks/citeformer/actions/workflows/ci.yml/badge.svg)](https://github.com/random-walks/citeformer/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/random-walks/citeformer/branch/main/graph/badge.svg)](https://codecov.io/gh/random-walks/citeformer)

***A bulletproof way to generate verifiably cited text from language models.***

![Side-by-side: baseline HF generation happily emits [7] and [8] when only 6 sources are in scope; citeformer's grammar mask makes [7]/[8] token-impossible to sample](benchmarks/findings/figures/cover-annotated.png)

### What it does — one paragraph for everyone

Language models hallucinate citations. Give GPT-4, Claude, or an open-source model six sources and ask it to cite "source [7]", and a solid chunk of the time it will invent `[7]`, `[8]`, sometimes `[42]`. citeformer makes that **physically impossible**. Before generation starts, we compile a tiny grammar that only admits citation markers pointing at sources you actually supplied, and the decoder applies that grammar at every token. Fabricated citations aren't generated less often; they cannot be generated at all. The library, not the model, renders bibliographies deterministically in six academic styles (APA, MLA, Chicago, IEEE, Nature, Vancouver), and every emitted claim can be NLI-verified against its cited source after the fact. [Try the live demo](https://huggingface.co/spaces/random-walks/citeformer-demo) or `pip install citeformer`.

### What makes it interesting — for the applied-AI crowd

> If you've read the jsonformer source or thought about logit-layer structured output, skip to [Backends](#backends).

- **Logit-masked GBNF.** The `cite-id` terminal is compiled per call to `"[" ("1" | "2" | ... | "N") "]"` and handed to [XGrammar](https://github.com/mlc-ai/xgrammar) (default) or [llguidance](https://github.com/guidance-ai/llguidance). Out-of-scope tokens get masked to zero probability before sampling — the sampler *never sees them*. This is structural, not rejection-sampled.
- **Seven backends, three enforcement tiers, one `GenerationResult`.** HF + vLLM + llama.cpp enforce at the logit layer. OpenAI + Gemini + Mistral enforce at the schema layer (`enum`-bounded `citations` in `strict=true` JSON schema — server-side rejection of non-conforming payloads). Anthropic is adapted from its native Citations API. All collapse to the same typed output for downstream verify / render / streaming.
- **The model never touches the bibliography.** Six hand-written CSL formatters (~1 kLOC, no citeproc-py dependency — see [ADR-004](docs/decisions/004-citeproc-rewrite.md)) render references deterministically. 300 locked snapshots pin the formatter outputs.
- **Verify is real, not a hit rate.** `result.verify()` runs DeBERTa-v3-large-MNLI over every (source content, cited sentence) pair and returns a typed `VerificationReport` — with a coverage check for uncited-but-entailed sentences. Threshold calibration + the honest bimodal-score finding live in [benchmarks/README.md#finding-4](benchmarks/README.md#finding-4--nli-threshold-calibration-deberta-v3-large-is-bimodal).
- **0.0 ± 0.0 fabrication across 40 runs.** 4 prompt shapes × 2 models × 5 seeds in [`benchmarks/multiprompt_sweep.py`](benchmarks/multiprompt_sweep.py). The stds are identically zero because there's no variance to measure — the guarantee is a contract, not a mean.
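To make the per-call compilation in the first bullet concrete, here is a minimal sketch that emits only the `cite-id` rule as a GBNF string for `n` in-scope sources. The helper name `cite_id_rule` is hypothetical and not citeformer's API; the real grammar also defines the surrounding prose and cite-group rules.

```python
def cite_id_rule(n_sources: int) -> str:
    """Return a GBNF rule that admits only the markers [1] .. [n_sources].

    Hypothetical helper for illustration; not citeformer's public API.
    """
    if n_sources < 1:
        raise ValueError("need at least one source to cite")
    # One quoted literal per in-scope id. Multi-digit ids are whole literals,
    # so "[12]" can only parse when 12 is actually in scope.
    alternatives = " | ".join(f'"{i}"' for i in range(1, n_sources + 1))
    return f'cite-id ::= "[" ({alternatives}) "]"'


print(cite_id_rule(6))
# cite-id ::= "[" ("1" | "2" | "3" | "4" | "5" | "6") "]"
```

Because the rule is rebuilt for every call, the set of legal ids always matches the `sources` list passed to that call.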

### Hi, I'm [Blaise](https://blaiseab.com) — how this got built

I'm [Blaise Albis-Burdige](https://blaiseab.com) ([@blaiseab](https://github.com/blaiseab)). I wrote citeformer during and immediately around a trip to [Ramp's](https://ramp.com) NYC office. On the subway ride up I was rereading [jsonformer](https://github.com/1rgs/jsonformer) by [Rahul Sengottuvelu](https://github.com/1rgs), partly to sharpen my intuition for how the applied-AI folks at Ramp think about structured output, and partly because jsonformer is one of those projects whose core insight ("don't prompt it; constrain the token distribution") has aged extraordinarily well. By the time I got off the train I was convinced the same move applied to RAG citations, which 2026 benchmarks find are wrong [14–95% of the time depending on what you measure](https://arxiv.org/search/?query=RAG+citation+fabrication&searchtype=all).

jsonformer has been dormant since early 2024, and no successor had applied its insight to citation markers. This is that successor. The heavy lifting lives in dependencies I didn't write (XGrammar, transformers, vLLM, DeBERTa, httpx, pypdf, GROBID, readability); citeformer's contribution is the composition plus the three §10 contracts that keep the seams honest as the surface grows. Paper-shaped write-up: [PREPRINT.md](PREPRINT.md).

> **Status**: v0.2.0 on [PyPI](https://pypi.org/project/citeformer/). Seven backends (HF + vLLM + llama.cpp local; OpenAI + Anthropic + Gemini + Mistral API), six hand-written CSL styles, deterministic bibliography rendering, and claim-level NLI verification. See [CHANGELOG.md](CHANGELOG.md) for the full change history.

## Why structural, not statistical

LLM-generated citations are wrong 14–95% of the time depending on the benchmark. RAG systems still fabricate 3–13% of cited URLs. NeurIPS 2025 accepted ~50 papers with AI-generated fake references. Prompting doesn't fix it; post-hoc verification doesn't fix it. The only real fix is **structural** — make the invalid output token-impossible before the model reaches the decision point.

citeformer delivers that in three independent ways:

- **Citation markers can't be fabricated.** `[N]` where `N > len(sources)` is token-impossible to sample on local backends, and schema-rejected on the schema-tier API backends. Confirmed empirically across [40 multi-prompt runs](benchmarks/README.md#finding-5--multi-prompt-sweep-structural-guarantee-is-prompt-invariant): **0% fabrication on every prompt × model × seed triple**.
- **Bibliographies are rendered by the library, not the model.** Six styles, deterministic output, [300 locked snapshots](tests/unit/test_csl_suite/).
- **Every citation is claim-verifiable.** `result.verify()` runs NLI entailment per cite and returns a structured `VerificationReport` — not just a hit rate.
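To show what "token-impossible" means mechanically, here is a toy, pure-Python sketch of logit masking: disallowed token ids get a score of negative infinity before the softmax, so their sampling probability is exactly zero. The five-token vocabulary and the `masked_softmax` helper are made up for illustration; real backends (XGrammar, llguidance, llama.cpp) derive the allowed set from the grammar state at every decoding step.

```python
import math
import random


def masked_softmax(logits: list[float], allowed: set[int]) -> list[float]:
    """Softmax over `logits` after masking every token id not in `allowed`."""
    if not allowed:
        raise ValueError("grammar must allow at least one token")
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary: the model strongly "wants" to emit "7" (id 4).
vocab = ["1", "2", "3", "6", "7"]
logits = [0.1, 0.2, 0.3, 0.4, 5.0]
allowed = {i for i, tok in enumerate(vocab) if int(tok) <= 6}  # 6 sources in scope

probs = masked_softmax(logits, allowed)
print(dict(zip(vocab, probs)))

# Sampling never picks a masked id, however many draws we take.
rng = random.Random(0)
draws = rng.choices(vocab, weights=probs, k=10_000)
assert "7" not in draws
```

The contrast with rejection sampling is the point: nothing is drawn and thrown away, because the out-of-scope token never has nonzero mass to begin with.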

## Install

```bash
# Core only — no model backend, just the types + rendering + metadata adapters.
pip install citeformer

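# Local backends (logit-tier enforcement).
pip install 'citeformer[hf]'             # HuggingFace transformers + XGrammar / llguidance
pip install 'citeformer[llamacpp]'       # llama.cpp native GBNF
pip install 'citeformer[vllm]'           # vLLM guided decoding (Linux/CUDA only)

# API backends (schema-tier or provider-native enforcement).
pip install 'citeformer[openai]'         # Structured Outputs strict=true
pip install 'citeformer[anthropic]'      # Citations API adapter
pip install 'citeformer[gemini]'         # response_schema structured output
pip install 'citeformer[mistral]'        # strict JSON-schema response_format

# NLI verification (DeBERTa-v3-MNLI).
pip install 'citeformer[verify]'

# Optional GROBID PDF extraction (Source.from_pdf(..., extractor="grobid")).
pip install 'citeformer[grobid]'

# Cross-platform kitchen sink (HF + llama.cpp + all API SDKs + verify; excludes vLLM and GROBID).
pip install 'citeformer[all]'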
```

Python 3.11+ (tested through 3.14). Apache-2.0.

**Try it without installing.** The [HF Space demo](hf-space/) runs the adversarial "100% → 0% fabrication" swing on CPU; all you need is a browser. The [literature-review notebook](examples/08_literature_review.ipynb) walks end to end from arXiv fetch → grammar-constrained generation → NLI verification → APA-7 bibliography, using a laptop-friendly 500 MB model.

## Quickstart

```python
from citeformer import Citeformer, Policy, Source
from citeformer.backends.hf import HFBackend

sources = [
    Source.from_doi("10.1038/s41586-023-06221-2"),
    Source.from_arxiv("2305.14627"),
    Source(
        metadata={
            "id": "poe-raven",
            "type": "book",
            "title": "The Raven",
            "author": [{"family": "Poe", "given": "Edgar Allan"}],
            "issued": {"date-parts": [[1845]]},
        },
        content="Once upon a midnight dreary...",
    ),
]

cf = Citeformer(
    backend=HFBackend(model="microsoft/Phi-3.5-mini-instruct"),
    style="apa-7",
    citation_policy=Policy.REQUIRED,
)
result = cf.generate(prompt="Summarize the three works.", sources=sources)

print(result.text)               # e.g. "Poe's The Raven opens... [3] ..."
for ref in result.references:
    print(ref.rendered)          # APA-7, rendered by the formatter, not the LLM

report = result.verify()         # NLI entailment per citation; needs citeformer[verify]
print(f"{report.support_rate:.0%} of cites entailed by their source")
```

`result.text` cannot contain `[4]`: not "unlikely to", but *cannot*, by grammar construction. To try the API tier instead, swap in `OpenAIBackend` from `citeformer.backends.openai` or `AnthropicBackend` from `citeformer.backends.anthropic`; the other backends and styles are covered below.
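`report.support_rate` in the Quickstart reads as the share of citations whose source entails the cited sentence. A minimal sketch of that aggregation over per-citation NLI scores follows; the `CiteCheck` record, the 0.5 threshold, and the scores are illustrative assumptions, not citeformer's `VerificationReport` internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiteCheck:
    """One (cited sentence, source) pair scored by an NLI model (hypothetical record)."""

    sentence: str
    source_index: int  # 1-based, matching the [N] marker
    entailment: float  # P(entailment) from the NLI model, in [0, 1]


def support_rate(checks: list[CiteCheck], threshold: float = 0.5) -> float:
    """Fraction of citations whose source entails the cited sentence."""
    if not checks:
        return 0.0  # no citations means nothing is supported
    supported = sum(1 for c in checks if c.entailment >= threshold)
    return supported / len(checks)


checks = [
    CiteCheck("Poe's The Raven opens at midnight.", 3, 0.97),
    CiteCheck("The poem is set on a summer afternoon.", 3, 0.04),
    CiteCheck("A sentence citing the arXiv paper.", 2, 0.88),
]
print(f"{support_rate(checks):.0%} of cites entailed by their source")  # 67%
```

Returning 0.0 for an empty list is a design choice in this sketch; a real report might prefer to flag "no citations" separately.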

## Backends

Seven backends, three enforcement tiers, one `Backend` ABC (plus `MockBackend` for tests):

| Backend            | Extra      | Enforcement tier   | Where it lives               | Notes |
|--------------------|------------|--------------------|------------------------------|-------|
| `HFBackend`        | `hf`       | **Logit (XGrammar)** | `citeformer.backends.hf`    | Flagship. Grammar-level token masking. |
| `LlamaCppBackend`  | `llamacpp` | **Logit (GBNF)**     | `citeformer.backends.llamacpp` | Native GBNF via `llama-cpp-python`. CPU + Metal + CUDA. |
| `VLLMBackend`      | `vllm`     | **Logit (XGrammar/llguidance)** | `citeformer.backends.vllm` | vLLM guided decoding. Linux/CUDA only. |
| `OpenAIBackend`    | `openai`   | **Schema (strict JSON)** | `citeformer.backends.openai` | OpenAI Structured Outputs — live verified. |
| `AnthropicBackend` | `anthropic`| **Provider-native** | `citeformer.backends.anthropic` | Adapter over Anthropic's Citations API — live verified. |
| `GeminiBackend`    | `gemini`   | **Schema (response_schema)** | `citeformer.backends.gemini` | Gemini's OpenAPI-subset structured output. |
| `MistralBackend`   | `mistral`  | **Schema (strict JSON)** | `citeformer.backends.mistral` | Mistral's `response_format` strict JSON schema. |
| `MockBackend`      | (core)     | Scripted             | `citeformer.backends.mock`  | For tests. Honors policies + marker styles. |

All produce the same `GenerationResult`, so verify / render / streaming work identically across tiers. OpenAI + Anthropic are live-verified against production endpoints in [`tests/integration/test_api_backends_live.py`](tests/integration/test_api_backends_live.py); Gemini + Mistral ship with fake-client coverage and the same schema contract. Full tier discussion: [architecture.md](docs/reference/architecture.md#tiered-enforcement--local-vs-api).

### API backends (quickstart)

The OpenAI and Anthropic backends are live-tested against production endpoints; see [`tests/integration/test_api_backends_live.py`](tests/integration/test_api_backends_live.py).

```python
from citeformer import Citeformer, Policy, Source
from citeformer.backends.openai import OpenAIBackend          # pip install 'citeformer[openai]'
# from citeformer.backends.anthropic import AnthropicBackend  # pip install 'citeformer[anthropic]'

sources = [Source(metadata={"id": "poe", "type": "book", "title": "The Raven",
                            "author": [{"family": "Poe"}],
                            "issued": {"date-parts": [[1845]]}},
                  content="Once upon a midnight dreary...")]

# OpenAI strict JSON-schema mode needs a model that supports Structured Outputs
# (e.g. gpt-4o-mini, or gpt-4o-2024-08-06 and later).
# Reads OPENAI_API_KEY from env; pass `client=...` or `api_key=...` to override.
cf = Citeformer(backend=OpenAIBackend(model="gpt-4o-mini"),
                style="apa-7", citation_policy=Policy.REQUIRED)
result = cf.generate(prompt="Describe the opening in one sentence.", sources=sources)
print(result.text)
```

Honest about tiers: **logit-layer** (local) backends make out-of-scope citations *token-impossible to sample*. **Schema-layer** backends (OpenAI, Gemini, Mistral) bound citation ids with a JSON-schema `enum`, and non-conforming payloads are rejected server-side, so fabrication is structurally impossible in the returned payload. **Provider-native** (Anthropic) trusts the provider's own Citations system. All three collapse to the same `GenerationResult` for downstream verify / render.
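For the schema tier, here is a minimal sketch of what an `enum`-bounded `citations` field can look like in an OpenAI-style `strict` JSON schema. The field names (`sentences`, `text`, `citations`), the schema name `cited_answer`, and the helper `citation_schema` are assumptions for illustration; citeformer's actual schema may differ. OpenAI strict mode requires `additionalProperties: false` and every property listed in `required`, which the sketch follows.

```python
import json


def citation_schema(n_sources: int) -> dict:
    """JSON schema whose citation ids are limited to 1..n_sources (hypothetical shape)."""
    sentence = {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "citations": {
                "type": "array",
                # The enum plays the role of the cite-id grammar rule at the schema tier.
                "items": {"type": "integer", "enum": list(range(1, n_sources + 1))},
            },
        },
        "required": ["text", "citations"],
        "additionalProperties": False,
    }
    return {
        "type": "object",
        "properties": {"sentences": {"type": "array", "items": sentence}},
        "required": ["sentences"],
        "additionalProperties": False,
    }


# Shape of the Chat Completions `response_format` argument for strict structured output.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "cited_answer", "strict": True, "schema": citation_schema(6)},
}
print(json.dumps(response_format, indent=2))
```

Carrying citations as a separate integer array (rather than inline `[N]` text) is what lets the provider enforce the bound; the inline markers can then be re-rendered locally.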

## Citation policies

`Policy` controls where citations are grammatically required:

| Policy        | Shape of valid output | When to use |
|---------------|-----------------------|-------------|
| `REQUIRED`    | Every sentence ends `content cite-group sent-end`. Cite or can't close. | Literature reviews, survey papers, anything where every claim needs provenance. |
| `QUOTES_ONLY` | Only `"..."` quoted spans require a trailing `cite-group`. | Mixed analytical prose — narrative is uncited, direct quotations are tracked. |
| `AUTO`        | `cite-group` is allowed anywhere, never required. `verify()` flags uncited-but-entailed sentences post-hoc. | Open-ended generation; NLI coverage check does the policing. |

Pass via `Citeformer(citation_policy=Policy.REQUIRED)` or per-call `cf.generate(..., policy=Policy.AUTO)`. See [`Policy`](src/citeformer/core.py).
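The three policies differ mainly in where the grammar places `cite-group`. Here is a rough sketch, assuming a GBNF-style grammar with rule names borrowed from the table above (`content`, `cite-group`, `sent-end`); the local `Policy` enum is a stand-in, the `policy_grammar` helper is hypothetical, and citeformer's real grammar (the §10.1 contract) is more detailed than this.

```python
from enum import Enum


class Policy(Enum):
    """Local stand-in for citeformer.Policy so the sketch is self-contained."""

    REQUIRED = "required"
    QUOTES_ONLY = "quotes_only"
    AUTO = "auto"


def policy_grammar(policy: Policy, n_sources: int) -> str:
    """Return a GBNF-style grammar for `policy` over sources 1..n_sources (sketch)."""
    if n_sources < 1:
        raise ValueError("need at least one source")
    ids = " | ".join(f'"{i}"' for i in range(1, n_sources + 1))
    if policy is Policy.REQUIRED:
        # Every path to sent-end passes through cite-group: cite or can't close.
        sentence = "sentence   ::= (content | quote)+ cite-group sent-end"
    elif policy is Policy.QUOTES_ONLY:
        # A quoted span must be followed by a cite-group; plain prose need not be.
        sentence = "sentence   ::= (content | quote cite-group)+ sent-end"
    else:
        # AUTO: cite-group may appear anywhere in the sentence, never required.
        sentence = "sentence   ::= (content | quote | cite-group)+ sent-end"
    rules = [
        "root       ::= sentence+",
        sentence,
        r'content    ::= [^\[\]".!?]+',  # plain prose: no markers, quotes, or sentence ends
        r'quote      ::= "\"" [^"]+ "\""',
        'cite-group ::= cite-id (", " cite-id)*',
        f'cite-id    ::= "[" ({ids}) "]"',
        'sent-end   ::= [.!?] " "?',
    ]
    return "\n".join(rules)


print(policy_grammar(Policy.REQUIRED, 3))
```

Only the `sentence` rule changes between policies; the `cite-id` enum, and therefore the fabrication guarantee, is the same in all three.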

## Metadata adapters

Build `Source` objects from real-world inputs:

```python
Source.from_doi("10.1038/s41586-023-06221-2")      # Crossref → CSL-JSON
Source.from_arxiv("2305.14627")                     # arXiv API → CSL-JSON + abstract
Source.from_pdf("paper.pdf")                        # pypdf → title + body text
Source.from_pdf("paper.pdf", extractor="grobid")    # GROBID → author/abstract/section text
Source.from_url("https://example.com/article")      # readability-lxml + OpenGraph

# Bulk-load a library; each returns list[Source].
Source.from_bibtex("refs.bib")                      # BibTeX parser → CSL-JSON
Source.from_zotero("zotero-export.json")            # Zotero CSL JSON / Better BibTeX
```

All fetchers are cached on disk via `diskcache` (`~/.cache/citeformer/metadata/`, override with `CITEFORMER_CACHE_DIR`).
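BibTeX and CSL-JSON store names differently: BibTeX packs every author into one `and`-separated string, while CSL-JSON (as in the `Source` metadata in the Quickstart) expects a list of `family`/`given` objects. Here is a minimal sketch of that single mapping step, handling only the two common BibTeX forms (`Family, Given` and `Given Family`). It is a hypothetical helper, not citeformer's parser, and it ignores particles like "von", "Jr." suffixes, and braced corporate names.

```python
import re


def bibtex_authors_to_csl(field: str) -> list[dict[str, str]]:
    """Split a BibTeX author field into CSL-JSON name objects (simplified)."""
    names = []
    # BibTeX separates authors with the word "and" surrounded by whitespace.
    for raw in re.split(r"\s+and\s+", field.strip(), flags=re.IGNORECASE):
        raw = " ".join(raw.split())  # collapse internal whitespace and newlines
        if not raw:
            continue
        if "," in raw:
            # "Family, Given" form
            family, given = (part.strip() for part in raw.split(",", 1))
        else:
            # "Given Family" form: treat the last word as the family name
            *given_parts, family = raw.split(" ")
            given = " ".join(given_parts)
        name = {"family": family}
        if given:
            name["given"] = given
        names.append(name)
    return names


print(bibtex_authors_to_csl("Poe, Edgar Allan and Ada Lovelace and Turing"))
```

A single-word name such as `Turing` comes back with only a `family` key, which is still valid CSL-JSON.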

## Inline marker shapes

`[N]` collides with Markdown link syntax. Switch it out with `MarkerStyle`:

```python
from citeformer import MarkerStyle

cf = Citeformer(backend=backend, marker_style=MarkerStyle.PAREN)    # (1), (2) ...
cf = Citeformer(backend=backend, marker_style=MarkerStyle.CURLY)    # {1}, {2} ...
cf = Citeformer(backend=backend, marker_style=MarkerStyle.CARET)    # ^1, ^2 ...
```

The structural guarantee is identical across styles — the grammar's digit enum is bounded by `range(1, len(sources) + 1)` regardless of which delimiters surround it. See [ADR-011](docs/decisions/011-configurable-marker-styles.md).
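Once text comes back, markers have to be mapped to sources again whichever delimiters were used. Here is a minimal sketch of that parse step, assuming the delimiters are exactly those shown in the comments above; the `MARKERS` table (including the `BRACKET` key for the default `[N]` shape) and the `cited_ids` helper are hypothetical, not citeformer's implementation.

```python
import re

# Opening/closing delimiters per marker shape (assumed; mirrors the comments above).
MARKERS = {
    "BRACKET": ("[", "]"),
    "PAREN": ("(", ")"),
    "CURLY": ("{", "}"),
    "CARET": ("^", ""),
}


def cited_ids(text: str, style: str, n_sources: int) -> list[int]:
    """Return in-scope source ids referenced in `text`, in order of appearance."""
    opening, closing = MARKERS[style]
    pattern = re.escape(opening) + r"(\d+)" + re.escape(closing)
    ids = [int(m) for m in re.findall(pattern, text)]
    # Under grammar enforcement every id is already in range; the filter also
    # drops ordinary parentheticals such as "(2020)" in PAREN style.
    return [i for i in ids if 1 <= i <= n_sources]


print(cited_ids("Poe opens at midnight (3). The raven speaks (3).", "PAREN", 3))
print(cited_ids("See ^2 and ^7.", "CARET", 3))
```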

## Streaming

```python
stream = cf.stream(prompt="...", sources=sources)
for chunk in stream:
    print(chunk, end="", flush=True)
result = stream.finalize()    # full GenerationResult with parsed citations + refs
```

Grammar constraints apply to every chunk. HF and llama.cpp deliver true token-by-token streaming; the API backends chunk on sentence boundaries for UI progression.
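Sentence-boundary chunking on the API tier can be pictured as a small generator that buffers incoming text and yields only complete sentences, flushing the remainder at the end. This is an illustrative sketch with a deliberately naive boundary rule (`.`, `!`, or `?` followed by whitespace); `sentence_chunks` is hypothetical and not how citeformer's backends actually split text.

```python
import re
from collections.abc import Iterable, Iterator

# A sentence ends at ., !, or ? followed by whitespace (naive on purpose).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(deltas: Iterable[str]) -> Iterator[str]:
    """Re-chunk a stream of arbitrary text deltas into whole sentences."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        yield from parts[:-1]
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush the trailing, possibly unterminated, sentence


deltas = ["Once upon a mid", "night dreary [3]. The rav", "en speaks [3]. Nev", "ermore"]
for chunk in sentence_chunks(deltas):
    print(chunk)
```

The whitespace between sentences is consumed by the split, so a UI would re-insert a space when appending chunks.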

## Evidence

All numbers below come from running scripts in [`benchmarks/`](benchmarks/) — reproducible on a commodity laptop with `uv run python -m benchmarks.<script>`.

![Multi-prompt summary](benchmarks/findings/figures/multiprompt-summary.png)

| Finding | Result | Script |
|---------|--------|--------|
| [Adversarial](benchmarks/README.md#finding-1--adversarial-100--0-fabrication) | 100% → 0% fabrication swing when the prompt demands out-of-scope ids | `adversarial.py` |
| [Sweep](benchmarks/README.md#finding-2--sweep-aggregate-0--0-fabrication-across-all-models) | 0 ± 0 fabrication across 13 runs (3 models × up to 5 seeds) | `sweep.py` |
| [Full-text premise](benchmarks/README.md#finding-3--full-text-nli-premise-lifts-support-substantially-but-noisily) | Support rate lifts with full-text NLI premise — but the number is noisy, so we report that honestly | `sweep.py --premise fulltext` |
| [NLI calibration](benchmarks/README.md#finding-4--nli-threshold-calibration-deberta-v3-large-is-bimodal) | DeBERTa-v3-large is bimodal; threshold isn't the right knob | `threshold_calibration.py` |
| [Multi-prompt](benchmarks/README.md#finding-5--multi-prompt-sweep-structural-guarantee-is-prompt-invariant) | 0% fabrication across 40 runs (4 prompt shapes × 2 models × 5 seeds); the guarantee is prompt-invariant | `multiprompt_sweep.py` |
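The fabrication numbers above can be read as the share of emitted markers whose id falls outside `1..len(sources)`. Here is a minimal sketch, assuming that definition and the default `[N]` marker shape (the benchmarks README documents the exact metric each script uses); `fabrication_rate` and the example strings are hypothetical.

```python
import re
import statistics

MARKER = re.compile(r"\[(\d+)\]")


def fabrication_rate(text: str, n_sources: int) -> float:
    """Fraction of [N] markers in `text` whose N is not a supplied source."""
    ids = [int(m) for m in MARKER.findall(text)]
    if not ids:
        return 0.0
    fabricated = sum(1 for i in ids if not 1 <= i <= n_sources)
    return fabricated / len(ids)


# Toy outputs for a 6-source prompt that asks for sources [7] and [8].
baseline = "Claim A [2]. Claim B [7]. Claim C [8]."
constrained = "Claim A [2]. Claim B [5]. Claim C [6]."
print(fabrication_rate(baseline, 6), fabrication_rate(constrained, 6))

# Toy stand-in for five seeds: every constrained run scores 0, so the std is 0 too.
runs = [fabrication_rate(constrained, 6) for _ in range(5)]
print(f"{statistics.mean(runs):.1f} ± {statistics.stdev(runs):.1f}")
```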

## Composition, not reinvention

citeformer's value is the **composition**, not the parts. The heavy lifting lives in established dependencies:

| We piggyback on | For |
|---|---|
| **XGrammar** / **llguidance** | Token-level logit masking at generation time |
| **transformers** / **vLLM** / **llama-cpp-python** | Running local models |
| **openai** / **anthropic** / **google-genai** / **mistralai** SDKs | API-provider generation |
| **lark** | Authoring citation grammars before hand-off to the decoder |
| **pydantic** | Immutable output schemas with `extra="forbid"` |
| **httpx** + **diskcache** | Metadata fetchers (Crossref, arXiv) with caching |
| **pypdf** | PDF text extraction |
| **readability-lxml** | Main-content extraction from web pages (`Source.from_url`) |
| **DeBERTa-v3-MNLI** (via transformers) | NLI entailment for `verify()` |
| **typer** + **rich** | CLI + pretty output |

The parts citeformer owns: citation grammar shape ([§10.1](docs/reference/contracts.md)), CSL-JSON source contract ([§10.2](docs/reference/contracts.md)), output pydantic models ([§10.3](docs/reference/contracts.md)), marker-to-reference coupling, the six bundled style formatters (APA 7, MLA 9, Chicago author-date, IEEE, Nature, Vancouver — [ADR-004](docs/decisions/004-citeproc-rewrite.md)), the BibTeX parser, and the orchestration loop. Everything else is a composition.

## Examples

The [`examples/`](examples/) directory contains nine runnable examples (eight scripts and one notebook), each a living report:

| # | File | What it shows |
|---|------|---------------|
| 1 | [`01_quickstart_mock.py`](examples/01_quickstart_mock.py) | Shortest possible demo — no ML, no extras |
| 2 | [`02_rag_with_hf_and_verify.py`](examples/02_rag_with_hf_and_verify.py) | Full RAG pipeline with HF + NLI verify |
| 3 | [`03_standalone_rendering.py`](examples/03_standalone_rendering.py) | All six styles on the same CSL-JSON item |
| 4 | [`04_fetch_and_render.py`](examples/04_fetch_and_render.py) | DOI → Crossref → rendered reference |
| 5 | [`05_streaming.py`](examples/05_streaming.py) | Realtime chunk streaming via `cf.stream()` |
| 6 | [`06_langchain_rag.py`](examples/06_langchain_rag.py) | LangChain `Document` → `Source` → citeformer |
| 7 | [`07_llamaindex_rag.py`](examples/07_llamaindex_rag.py) | LlamaIndex `NodeWithScore` → `Source` |
| 8 | [`08_literature_review.ipynb`](examples/08_literature_review.ipynb) | Full academic workflow notebook (arXiv → review → verify → APA-7) |
| 9 | [`09_bibtex_source.py`](examples/09_bibtex_source.py) | BibTeX + Zotero ingest → APA-7 render (no network, no model) |

## Paper-shaped write-up

A longer design + evaluation document is in [`PREPRINT.md`](PREPRINT.md). Eight sections covering motivation, related work, design, structural-guarantee evaluation (40-run sweep), NLI calibration findings (bimodal large vs under-confident base), known limitations, and roadmap.

## Is this for you?

**Probably yes if:**

- You're building RAG and need citations that can't hallucinate.
- You run open-weight models locally (HF / vLLM / llama.cpp) and want grammar-level guarantees.
- You call an API (OpenAI / Anthropic / Gemini / Mistral) and want the same `GenerationResult` / `Citation` / `Reference` surface across your providers.
- You need APA / MLA / Chicago / IEEE / Nature / Vancouver bibliographies rendered deterministically.
- You care about claim-level NLI verification out of the box.
- You want to ingest from BibTeX / Zotero / DOI / arXiv / PDF / URL without glue code.

**Probably no if:**

- You want a full agent framework — use LangChain / LlamaIndex and compose citeformer as the generation step ([examples 6 & 7](#examples) show how).
- You need a TypeScript surface today — a sibling `citeformer-ts` may come later; not here yet.
- You need a citation style outside the six bundled ones. You can plug in `citeproc-py` yourself, or contribute a `CitationFormatter` subclass (see [`.claude/skills/add-citation-format`](.claude/skills/add-citation-format/)).

## Documentation

- **Getting started**: [getting-started](https://citeformer.readthedocs.io/en/stable/getting-started.html)
- **Guarantees**: [guarantees](https://citeformer.readthedocs.io/en/stable/guarantees.html) — what "bulletproof" actually covers.
- **Architecture**: [reference/architecture](https://citeformer.readthedocs.io/en/stable/reference/architecture.html) — layers + phase plan + tiered enforcement.
- **Contracts**: [reference/contracts](https://citeformer.readthedocs.io/en/stable/reference/contracts.html) — the three §10 invariants.
- **ADRs**: [docs/decisions/](docs/decisions/) — 11 short architecture-decision records documenting major design choices.
- **Benchmarks**: [benchmarks/README.md](benchmarks/README.md) — the five findings with reproduction commands.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Short version: bug-fix PRs welcome and bump patch; feature PRs should open an issue first. The three §10 contracts (grammar shape, CSL metadata, output schemas) are deliberate ceremonies — read [docs/reference/contracts.md](docs/reference/contracts.md) before touching them.

## License

Apache-2.0. See [LICENSE](LICENSE).
