Metadata-Version: 2.4
Name: diagram2code
Version: 0.1.8
Summary: Convert simple diagram images into runnable code (matplotlib/graphviz).
Author: Kazi Samiul Islam
License-Expression: MIT
Requires-Python: <3.14,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: opencv-python>=4.8
Requires-Dist: numpy>=1.26
Requires-Dist: matplotlib>=3.8
Requires-Dist: networkx>=3.0
Requires-Dist: platformdirs>=4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.5.0; extra == "dev"
Provides-Extra: hf
Requires-Dist: huggingface_hub>=0.23; extra == "hf"
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.13; extra == "ocr"
Provides-Extra: render
Requires-Dist: matplotlib>=3.8; extra == "render"
Requires-Dist: networkx>=3.0; extra == "render"
Dynamic: license-file

# diagram2code

Convert simple flowchart-style diagrams into runnable Python programs.

`diagram2code` takes a diagram image (rectangular steps + arrows), detects the flow, and generates:

- a graph representation (`graph.json`)
- a runnable Python program (`generated_program.py`)
- optional debug visualizations (`debug_nodes.png`, `debug_arrows.png`)
- an optional exportable bundle (`--export`)

> This project is designed for **learning, prototyping, and experimentation**, not for production-grade diagram parsing.

---

## Table of Contents

1. [Installation](#installation)
2. [Quick Start](#quick-start)
3. [Using Labels](#using-labels)
4. [Export Bundle](#export-bundle)
5. [Generated Files](#generated-files)
6. [Examples](#examples)
7. [Limitations](#limitations)
8. [Datasets, Predictors & Benchmarks](#datasets-predictors--benchmarks)
9. [Demo](#demo)

---

## Installation

Clone the repo and install in editable mode:

```bash
git clone https://github.com/Nimil785477/diagram2code.git
cd diagram2code

python -m venv .venv
```
Activate the environment:
```
# Linux / macOS
source .venv/bin/activate

# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
```
Install:
```
pip install -e .
```
### Basic install from PyPI
```bash
pip install diagram2code
```
With OCR support (optional):
```bash
pip install diagram2code[ocr]
```
You must also install Tesseract OCR on your system:
- Windows: https://github.com/UB-Mannheim/tesseract/wiki
- macOS:
```bash
brew install tesseract
```
- Ubuntu/Debian:
```bash
sudo apt install tesseract-ocr
```
Then run:
```bash
diagram2code image.png --extract-labels
```


---



## Quick Start

Run diagram2code on a simple diagram:
```bash
diagram2code tests/fixtures/branching.png --out outputs
```
This writes the output files to `outputs/` (see [Generated Files](#generated-files)).

### Inspect the detected graph (print summary)

You can inspect the detected nodes, edges, and labels using `--print-graph`.

```bash
diagram2code tests/fixtures/branching.png --out outputs --print-graph
```
This will:
- run the full detection pipeline
- write all normal output files
- print a human-readable graph summary to the console

Example Output:
```
Graph summary
Labels source: none
Nodes: 4
  - id=0 bbox=(40, 40, 76, 76) label=''
Edges: 4
  - 0 -> 1
```
### Dry-run mode
If you only want to inspect the result without writing any files, use:
```
diagram2code diagram.png --dry-run --print-graph
```

In dry-run mode:
- detection still runs fully
- no files are written
- OCR does not write labels.json
- export bundles are not created


## Using Labels
You can provide custom labels for nodes using a JSON file that maps node ids to label strings.

Example `labels.json`:
```
{
  "0": "Step_1_Load_Data",
  "1": "Step_2_Train_Model"
}
```
Run with labels:
```
python -m diagram2code.cli diagram.png --out outputs --labels labels.json
```
The exported program will then use labeled function names (sanitized into valid Python identifiers).
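
A minimal sketch of what such sanitization could look like. The helper name `sanitize_label`, the `fallback` prefix, and the exact rules are assumptions for illustration; the rules `diagram2code` actually applies may differ.

```python
import keyword
import re


def sanitize_label(label: str, fallback: str = "node") -> str:
    """Turn a free-form label into a valid Python identifier (hypothetical helper)."""
    # Collapse every run of non-identifier characters into one underscore.
    name = re.sub(r"\W+", "_", label.strip(), flags=re.ASCII).strip("_")
    if not name:
        # Nothing usable is left, e.g. for an empty label.
        return fallback
    # Identifiers cannot start with a digit or collide with a keyword.
    if name[0].isdigit() or keyword.iskeyword(name):
        name = f"{fallback}_{name}"
    return name


for raw in ["Step_1_Load_Data", "Train Model!", "2nd step", "class"]:
    print(f"{raw!r} -> {sanitize_label(raw)}")
```

Labels that are already valid identifiers, such as the ones in the `labels.json` example above, pass through unchanged in this sketch.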

### Label resolution order (important)

When multiple label sources are possible, `diagram2code` resolves labels in the following priority order:

1. **Explicit labels file**
   ```bash
   diagram2code diagram.png --labels labels.json
   ```
2. **Auto-detect `labels.json` inside export directory**
   ```bash
   diagram2code diagram.png --export export_out
   ```
   If `export_out/labels.json` exists, it is automatically loaded.
3. **OCR extraction**
   ```bash
   diagram2code diagram.png --extract-labels
   ```
4. **Fallback**
   If none of the above are provided, nodes have empty labels.

The active source is shown when using `--print-graph`:
```
Labels source: auto (export_out/labels.json)
```
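
The priority order above can be sketched as a small function. This is an illustration only: `resolve_labels` and its parameters are hypothetical names, and apart from `none` and `auto (...)`, which appear in the `--print-graph` output, the source strings are assumptions.

```python
import json
from pathlib import Path


def resolve_labels(labels_path=None, export_dir=None, ocr_labels=None):
    """Return (labels, source) using the documented priority order (sketch)."""
    # 1. An explicit labels file always wins.
    if labels_path is not None:
        path = Path(labels_path)
        return json.loads(path.read_text(encoding="utf-8")), f"file ({path.as_posix()})"
    # 2. Otherwise use labels.json if it exists inside the export directory.
    if export_dir is not None:
        candidate = Path(export_dir) / "labels.json"
        if candidate.is_file():
            labels = json.loads(candidate.read_text(encoding="utf-8"))
            return labels, f"auto ({candidate.as_posix()})"
    # 3. Otherwise fall back to labels produced by OCR, if any were extracted.
    if ocr_labels is not None:
        return dict(ocr_labels), "ocr"
    # 4. No source at all: nodes keep empty labels.
    return {}, "none"


print(resolve_labels())
```

Each branch returns immediately, so a lower-priority source is never consulted once a higher-priority one is found.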

### Generate a labels template (no OCR)
If you want to label nodes manually, generate a template file:

```bash
diagram2code path/to/diagram.png --out outputs --labels-template
```
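
The exact contents of the generated template are not documented here. As a rough sketch, assuming it uses the same id-to-label mapping as the `labels.json` example above with empty strings to fill in, a template could be built like this (`build_labels_template` is a hypothetical helper, not part of the CLI):

```python
import json


def build_labels_template(node_ids):
    """Map each node id to an empty label, ready for manual editing (sketch)."""
    # JSON object keys are strings, so ids are stringified like in labels.json.
    return {str(node_id): "" for node_id in sorted(node_ids)}


template = build_labels_template([2, 0, 1])
print(json.dumps(template, indent=2))
```

After filling in the labels, the file can be passed back with `--labels`.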

## Export Bundle
The **--export** flag creates a self-contained, runnable bundle that is easy to share. If `labels.json` exists inside the export directory, it is automatically used on subsequent runs.

```
python -m diagram2code.cli diagram.png --out outputs --export export_bundle
```

When using --export, the following files are copied:
```
export_bundle/
├── generated_program.py
├── graph.json
├── labels.json            (if provided)
├── debug_nodes.png        (if exists)
├── debug_arrows.png       (if exists)
├── render_graph.py        (if exists)
├── run.ps1
├── run.sh
└── README_EXPORT.md
```
### Running the exported bundle

Windows (PowerShell):
```
cd export_bundle
.\run.ps1
```
Linux/macOS:
```
cd export_bundle
bash run.sh
```
or directly:
```
python generated_program.py
```

## Generated Files
After a normal run (`--out outputs`):

| File                   | Description                          |
| ---------------------- | ------------------------------------ |
| `preprocessed.png`     | Binary image used for detection      |
| `debug_nodes.png`      | Detected rectangles overlay          |
| `debug_arrows.png`     | Detected arrows overlay (if enabled) |
| `graph.json`           | Graph structure (nodes + edges)      |
| `render_graph.py`      | Script to visualize the graph        |
| `generated_program.py` | Generated executable Python program  |

## Examples

### CLI Usage Examples
Basic run (writes outputs to `outputs/`):
```bash
python -m diagram2code path/to/image.png
```
Export a runnable bundle:
```bash
python -m diagram2code path/to/image.png --export out
```
Render the detected graph (top-down layout):
```bash
python -m diagram2code path/to/image.png --export out --render-graph --render-layout topdown
```
Render the graph as SVG:
```bash
python -m diagram2code path/to/image.png --export out --render-graph --render-format svg
```
Run without writing debug artifacts:
```bash
python -m diagram2code path/to/image.png --no-debug
```
### Diagram Examples
Simple linear flow:
```
[ A ] → [ B ] → [ C ]
```
Branching flow:
```
      → [ B ]
[ A ]
      → [ C ]
```
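
To make the two flows concrete, here is a sketch of how a directed edge list could be turned into an execution order with a topological sort from the standard library. It illustrates the idea only; the source does not state how `diagram2code` orders steps in `generated_program.py`, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def execution_order(edges):
    """Order nodes so that every step runs after its predecessors (sketch)."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        # dst depends on src, so src must come first.
        sorter.add(dst, src)
    return list(sorter.static_order())


linear = [("A", "B"), ("B", "C")]
branching = [("A", "B"), ("A", "C")]

print(execution_order(linear))
print(execution_order(branching))
```

In the branching case `A` always comes first, while the relative order of `B` and `C` is not constrained by the graph.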

### OCR (Optional)
`diagram2code` can extract text labels using Tesseract OCR.

Requirements:
- System: `tesseract-ocr`
- Python: `pytesseract`

If OCR is unavailable, the pipeline still works and labels default to empty.
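
A sketch of this kind of graceful fallback, assuming OCR counts as available when the `pytesseract` module can be imported and a `tesseract` executable is on `PATH`. The helper names are hypothetical, and the real pipeline may detect availability differently (for example, through a configured executable path).

```python
import importlib.util
import shutil


def ocr_available() -> bool:
    """Check for both the Python wrapper and the system binary (assumed criteria)."""
    has_module = importlib.util.find_spec("pytesseract") is not None
    has_binary = shutil.which("tesseract") is not None
    return has_module and has_binary


def extract_label(image_path, ocr=None) -> str:
    """Return the OCR text for an image, or an empty label if OCR is unavailable."""
    if ocr is None:
        if not ocr_available():
            return ""  # Same outcome as the documented default: an empty label.
        import pytesseract  # Imported lazily so the module stays optional.

        ocr = pytesseract.image_to_string
    return ocr(str(image_path)).strip()


print("OCR available:", ocr_available())
```

The optional `ocr` argument lets the function be exercised with a stub, without Tesseract installed.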

## Limitations
- The current image-to-code parser is optimized for simple rectangular flowchart steps
- Arrow detection is heuristic-based
- Complex curves, diagonals, or overlapping arrows may fail
- No text extraction from inside shapes unless OCR is enabled (`--extract-labels`)
- Not intended for UML, BPMN, or hand-drawn diagrams

## Datasets, Predictors & Benchmarks

Beyond image-to-code conversion, `diagram2code` includes a **dataset-backed benchmarking system** for reproducible evaluation of graph predictors.

This functionality is optional and intended for:
- experimentation
- research
- comparative evaluation

### Core concepts
- **Datasets** define inputs, ground-truth graphs, and splits
- **Predictors** generate graph predictions from inputs
- **Benchmarks** compute aggregate metrics and export structured JSON results

### Included predictors
- `oracle` — upper bound using ground-truth graphs
- `heuristic` — deterministic non-ML baseline
- `naive` — weak baseline returning a single centered node and no edges
- `vision` — legacy image-based CV pipeline

### Benchmark metrics
Benchmarks report:
- node precision / recall / f1
- edge precision / recall / f1
- `direction_accuracy`
- `exact_match_rate`
- `node_count_error`
- `edge_count_error`
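
For reference, precision, recall, and F1 over directed edges can be computed with the standard set-based definitions, as sketched below. How `diagram2code` matches predicted nodes to ground-truth nodes and how it treats empty sets are not covered here, so this is not necessarily the benchmark's exact implementation.

```python
def precision_recall_f1(predicted, truth):
    """Standard set-based precision, recall, and F1 (illustrative sketch)."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Assumption: an empty prediction or ground-truth set scores 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth_edges = {(0, 1), (0, 2), (1, 3)}
predicted_edges = {(0, 1), (2, 0), (1, 3)}  # (2, 0) has the wrong direction

print(precision_recall_f1(predicted_edges, truth_edges))
```

Because edges are compared as ordered pairs, the reversed edge `(2, 0)` counts as both a false positive and a missed edge in this sketch.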

### Minimal local dataset example

Build a small deterministic synthetic dataset:

```powershell
python -m diagram2code dataset build synthflow --out outputs\synthflow --split test --num-samples 5 --seed 0
```
Run an oracle benchmark:
```powershell
python -m diagram2code benchmark --dataset outputs\synthflow --split test --predictor oracle --limit 3 --json outputs\oracle_result.json
```
Run a weak baseline benchmark:
```powershell
python -m diagram2code benchmark --dataset outputs\synthflow --split test --predictor naive --limit 3 --json outputs\naive_result.json
```
Inspect a benchmark result:
```powershell
python -m diagram2code benchmark info outputs\oracle_result.json
```

### Additional documentation

**Datasets**
- Overview: `docs/datasets/OVERVIEW.md`
- Fetching & cache layout: `docs/datasets/FETCHING.md`
- Adapter authoring: `docs/datasets/ADAPTER_GUIDE.md`
- FlowLearn reference: `docs/datasets/FLOWLEARN.md`
- Dataset contract: `docs/datasets/PHASE_3_CONTRACT.md`

**Predictors**
- Predictor interface: `docs/predictors/PREDICTOR_CONTRACT.md`
- Oracle predictor: `docs/predictors/ORACLE.md`
- Heuristic baseline: `docs/predictors/HEURISTIC.md`

**Benchmarks**
- Result schema: `docs/benchmarks/RESULT_SCHEMA.md`
- Reproducibility checklist: `docs/benchmarks/REPRODUCIBILITY.md`
- Leaderboard format: `docs/benchmarks/LEADERBOARD_FORMAT.md`

### Minimal evaluation example

```bash
diagram2code dataset list
diagram2code dataset fetch tiny_remote_v1 --yes
diagram2code benchmark --dataset tiny_remote_v1 --predictor oracle
```

### Inspect Benchmark Results

```bash
diagram2code benchmark info outputs/result.json
```
Prints a concise summary of metrics and provenance.

### Strict Manifest Enforcement

To require a dataset to have a valid `manifest.json`:

```bash
diagram2code benchmark \
  --dataset flowlearn \
  --predictor oracle \
  --fail-on-missing-manifest
```

## Demo

Convert a simple diagram image into runnable Python code:

```bash
diagram2code tests/fixtures/simple.png --out demo_outputs --extract-labels
```







