Metadata-Version: 2.1
Name: camelot-py
Version: 0.2.2
Summary: PDF Table Extraction for Humans.
Home-page: http://camelot-py.readthedocs.io/
Author: Vinayak Mehta
Author-email: vmehta94@gmail.com
License: MIT License
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: all
Requires-Dist: click (==6.7)
Requires-Dist: matplotlib (==2.2.3)
Requires-Dist: numpy (==1.15.2)
Requires-Dist: openpyxl (==2.5.8)
Requires-Dist: pandas (==0.23.4)
Requires-Dist: pdfminer.six (==20170720)
Requires-Dist: PyPDF2 (==1.26.0)
Provides-Extra: all
Requires-Dist: opencv-python (==3.4.2.17); extra == 'all'
Provides-Extra: dev
Requires-Dist: codecov (==2.0.15); extra == 'dev'
Requires-Dist: pytest (==3.8.0); extra == 'dev'
Requires-Dist: pytest-cov (==2.6.0); extra == 'dev'
Requires-Dist: pytest-runner (==4.2); extra == 'dev'
Requires-Dist: Sphinx (==1.7.9); extra == 'dev'
Requires-Dist: opencv-python (==3.4.2.17); extra == 'dev'

<p align="center">
   <img src="https://raw.githubusercontent.com/socialcopsdev/camelot/master/docs/_static/camelot.png" width="200">
</p>

# Camelot: PDF Table Extraction for Humans

[![Build Status](https://travis-ci.org/socialcopsdev/camelot.svg?branch=master)](https://travis-ci.org/socialcopsdev/camelot) [![codecov.io](https://codecov.io/github/socialcopsdev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/socialcopsdev/camelot?branch=master)
 [![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/)

**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!

---

**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/socialcopsdev/camelot/blob/master/docs/_static/pdf/foo.pdf).

<pre>
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
&lt;TableList n=1&gt;
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
&lt;Table shape=(7, 7)&gt;
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!
</pre>

| Cycle Name | KI (1/km) | Distance (mi) | Percent Fuel Savings |                 |                 |                |
|------------|-----------|---------------|----------------------|-----------------|-----------------|----------------|
|            |           |               | Improved Speed       | Decreased Accel | Eliminate Stops | Decreased Idle |
| 2012_2     | 3.30      | 1.3           | 5.9%                 | 9.5%            | 29.2%           | 17.4%          |
| 2145_1     | 0.68      | 11.2          | 2.4%                 | 0.1%            | 9.5%            | 2.7%           |
| 4234_1     | 0.59      | 58.7          | 8.5%                 | 1.3%            | 8.5%            | 3.3%           |
| 2032_2     | 0.17      | 57.8          | 21.7%                | 0.3%            | 2.7%            | 1.2%           |
| 4171_1     | 0.07      | 173.9         | 58.1%                | 1.6%            | 2.1%            | 0.5%           |

There's a [command-line interface](https://camelot-py.readthedocs.io/en/latest/user/cli.html) too!

**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

## Why Camelot?

- **You are in control.**: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).
- **Export** to multiple formats, including JSON, Excel and HTML.

See [comparison with other PDF table extraction libraries and tools](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

## Installation

After [installing the dependencies](https://camelot-py.readthedocs.io/en/latest/user/install.html) ([tk](https://packages.ubuntu.com/trusty/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can simply use pip to install Camelot:

<pre>
$ pip install camelot-py[all]
</pre>

### Alternatively

After [installing the dependencies](https://camelot-py.readthedocs.io/en/latest/user/install.html), clone the repo using:

<pre>
$ git clone https://www.github.com/socialcopsdev/camelot
</pre>

and install Camelot using pip:

<pre>
$ cd camelot
$ pip install ".[all]"
</pre>

**Note:** Use a [virtualenv](https://virtualenv.pypa.io/en/stable/) if you don't want to affect your global Python installation.

## Documentation

Great documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).

## Development

The [Contributor's Guide](https://camelot-py.readthedocs.io/en/latest/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

### Source code

You can check the latest sources with:

<pre>
$ git clone https://www.github.com/socialcopsdev/camelot
</pre>

### Setting up a development environment

You can install the development dependencies easily, using pip:

<pre>
$ pip install camelot-py[dev]
</pre>

### Testing

After installation, you can run tests using:

<pre>
$ python setup.py test
</pre>

## Versioning

Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/socialcopsdev/camelot/blob/master/HISTORY.md).

## License

This project is licensed under the MIT License, see the [LICENSE](https://github.com/socialcopsdev/camelot/blob/master/LICENSE) file for details.

