Metadata-Version: 2.1
Name: browse-ocrd
Version: 0.4.2
Summary: An extensible viewer for OCR-D workspaces
Home-page: https://github.com/hnesk/browse-ocrd
Author: Johannes Künsebeck
Author-email: kuensebeck@googlemail.com
License: MIT License
Keywords: OCR,OCR-D,mets,PAGE Xml
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: X11 Applications :: GTK
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Description-Content-Type: text/markdown
Requires-Dist: ocrd
Requires-Dist: Pillow
Requires-Dist: numpy (<1.19.0,>=1.17.0)
Requires-Dist: opencv-python-headless
Requires-Dist: PyGObject
Requires-Dist: python-magic
Requires-Dist: wheel
Requires-Dist: setuptools
Requires-Dist: lxml

# OCR-D Browser

An extensible viewer for [OCR-D](https://ocr-d.de/) [mets.xml](https://ocr-d.de/en/spec/mets) files

## Screenshot

![OCRD Browser with two image and one xml view](docs/screenshot.png)

## Installation on Ubuntu 18.04

```
sudo make deps-ubuntu
pip install browse-ocrd
```


## Usage
```
browse-ocrd ./path/to/mets.xml # or open interactively
```

## Features

- Browse fileGrps and pages, arranging views next to each other for comparison
- Show original or derived images (`AlternativeImage` on any level of the structural hierarchy)
- Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
- Show raw [PAGE-XML](https://ocr-d.de/en/spec/page) with syntax highlighting, open with [PageViewer](https://github.com/PRImA-Research-Lab/prima-page-viewer)
- Show concatenated [PAGE-XML](https://ocr-d.de/en/spec/page) text annotation
- Show rendered HTML comparison from [dinglehopper](https://github.com/qurator-spk/dinglehopper) evaluations

## Configuration

### Configuration file locations

At startup the following directories a searched for a config file named `ocrd-browser.conf` 

```python
# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs()  # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir()     # '/home/jk/.config/ocrd-browser.conf'  
os.getcwd()                    # './ocrd-browser.conf'
```

### Configuration file syntax

The `ocrd-browser.conf` file is an ini-file with the following keys:
```ini
[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser 
# Comma seperated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"  
[Tool PageViewer]
# (ba)sh commandline to execute with placeholders  
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
```

The `commandline` string will be used as a python format string with the keyword arguments:

* `workspace` : The current `ocrd.Workspace`, all properties get shell escaped (by `shlex.quote`) automatically.
* `file` : The current `ocrd_models.OcrdFile`, all properties get shell escaped (by `shlex.quote`) automatically, also there is an additional property `path` with the properties `absolute` and `relative`, so `{file.path.absolute}` will be replaced by the shell quoted absolute path of the file. 

> Note: You can get PRImA's PageViewer at [Github](https://github.com/PRImA-Research-Lab/prima-page-viewer/releases).


