Metadata-Version: 2.4
Name: inferencebench-vision
Version: 0.0.2
Summary: Vision-language understanding plugin for InferenceBench Suite (multimodal accuracy on bundled image+question fixtures)
Project-URL: Homepage, https://github.com/yobitelcomm/bench
Author-email: Yobitel Communications <bench@yobitel.com>
License: Apache-2.0
Keywords: ai,benchmark,ml,multimodal,ocr,vision,vlm
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: inferencebench-envelope
Requires-Dist: inferencebench-harness
Requires-Dist: pillow~=11.0
Requires-Dist: pydantic~=2.9
Requires-Dist: pyyaml~=6.0
Description-Content-Type: text/markdown

# inferencebench-vision

Vision-language understanding plugin for the InferenceBench Suite.

Scores vision-language model answers against bundled image+question fixtures
using one of three strategies: deterministic exact-match, deterministic
substring-match, or LLM-as-judge. Mirrors the `llm.quality` plugin contract
but exercises the OpenAI-style multimodal chat-completions request shape that
VLM endpoints such as vLLM, SGLang, and OpenAI accept.

Suite ID: `vision.understanding`

## Multimodal request shape

Each fixture row pairs an image with a natural-language question. The plugin
builds an OpenAI-compatible chat-completions request with the image inlined
as a base64 data URL:

```json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "How many bars are in this chart?"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
    ]
  }]
}
```

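vLLM, SGLang, and the OpenAI Chat Completions API accept this shape directly,
so a single plugin works against any of them. Anthropic's native Messages API
expects a different image block (`"type": "image"` with a `source` object), so
targeting it requires an OpenAI-compatible layer or a request translation step.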
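As a rough illustration of the request shape above, the following sketch builds the same payload from raw image bytes. The function name `build_vision_request` and its `mime_type` parameter are hypothetical and not part of the plugin's API; the sketch only shows how a base64 data URL can be assembled with the standard library.

```python
import base64
import json


def build_vision_request(
    image_bytes: bytes, question: str, mime_type: str = "image/png"
) -> dict:
    """Build a chat-completions payload with the image inlined as a data URL.

    Hypothetical helper for illustration; not the plugin's actual API.
    """
    # Base64-encode the raw bytes and wrap them in an RFC 2397 data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real PNG fixture.
    payload = build_vision_request(
        b"\x89PNG\r\n\x1a\n", "How many bars are in this chart?"
    )
    print(json.dumps(payload, indent=2))
```

A real request would typically also carry a `model` field (and any sampling parameters), which this sketch omits to match the example above.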

## Bundled benchmarks

- `vision.understanding.ocr-mini` — 5 short OCR-style read-text-from-image
  tasks against synthetic PNGs, substring-match scoring.
- `vision.understanding.chart-qa-mini` — 5 ChartQA-style numeric-extraction
  tasks against synthetic bar charts, exact-match scoring.
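
The two deterministic scoring modes named above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the function names and the normalization step (lowercasing and collapsing whitespace) are assumptions, and the plugin's real matching rules may differ.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization, for illustration)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, expected: str) -> bool:
    """True only if the whole normalized answer equals the expected string."""
    return _normalize(prediction) == _normalize(expected)


def substring_match(prediction: str, expected: str) -> bool:
    """True if the normalized expected string appears anywhere in the answer."""
    return _normalize(expected) in _normalize(prediction)


def accuracy(results: list[bool]) -> float:
    """Fraction of passing rows; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # OCR-style row: a verbose answer still passes under substring matching.
    print(substring_match("The sign says HELLO World.", "hello world"))
    # Chart-QA-style row: exact matching rejects extra words around the number.
    print(exact_match("There are 5 bars", "5"))
```

Substring matching tolerates verbose answers, which suits OCR-style prompts, while exact matching requires the model to return only the expected value, which suits numeric extraction.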
