Metadata-Version: 2.0
Name: Wextracto
Version: 0.9.2
Summary: Web Data Extraction Library Written in Python
Home-page: https://github.com/eBay/wextracto
Author: Giles Brown
Author-email: gsbrown@ebay.com
License: BSD
Download-URL: https://github.com/eBay/wextracto/tarball/0.9.2
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Requires-Dist: six
Requires-Dist: requests
Requires-Dist: lxml (>=3)
Requires-Dist: cssselect
Requires-Dist: publicsuffix (>=1.1)

Wextracto: Web Data Extraction
==============================

.. image:: https://travis-ci.org/gilessbrown/wextracto.svg
    :target: http://travis-ci.org/gilessbrown/wextracto
    :alt: Build Status

Wextracto is a toolkit for command-line web data extraction.


Installation
~~~~~~~~~~~~

.. code-block:: bash

    $ pip install wextracto


Kicking the Tyres
~~~~~~~~~~~~~~~~~

.. code-block:: bash

    $ echo -e "[wex]\nsitemaps=wex.sitemaps:urls_from_sitemaps" > entry_points.txt
    $ wex "http://www.ebay.com/robots.txt"

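The ``entry_points.txt`` file written above follows the setuptools entry point
layout: a ``[wex]`` group whose entries map a name to a ``module:attribute``
specification. As a rough illustration of how such a specification can be
resolved to a callable, here is a minimal sketch using only the standard
library. The helper names ``parse_entry_points`` and ``resolve`` are
hypothetical, the example entry points at ``json:dumps`` so that it runs
without Wextracto installed, and none of this is taken from Wextracto's own
loading code.

.. code-block:: python

    import configparser
    import importlib


    def parse_entry_points(text, group):
        """Return {name: "module:attr"} for one group of entry_points.txt text."""
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # keep entry point names case-sensitive
        parser.read_string(text)
        if not parser.has_section(group):
            return {}
        return dict(parser.items(group))


    def resolve(spec):
        """Import the object named by a "module:attr" specification."""
        module_name, _, attr_path = spec.partition(":")
        obj = importlib.import_module(module_name)
        if attr_path:
            # Walk dotted attribute paths such as "Class.method".
            for attr in attr_path.split("."):
                obj = getattr(obj, attr)
        return obj


    # Assumption: same layout as the file created above, but pointing at a
    # standard-library function instead of a Wextracto extractor.
    EXAMPLE = "[wex]\nexample=json:dumps\n"

    specs = parse_entry_points(EXAMPLE, "wex")
    dumps = resolve(specs["example"])
    print(specs)
    print(dumps({"ok": True}))

Parsing the text that the ``echo`` command writes would, under the same
assumptions, give a single ``sitemaps`` entry whose specification is
``wex.sitemaps:urls_from_sitemaps``.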

Documentation
~~~~~~~~~~~~~

The documentation can be found here:

    http://wextracto.readthedocs.org/en/latest/index.html


.. :changelog:

Release History
---------------

0.9.2 (2016-05-13)
++++++++++++++++++

  * Fix bug in handling HTML comments when fixing numeric character references


0.9.1 (2016-04-26)
++++++++++++++++++

  * Fix bug when using nested Cache objects


0.9.0 (2016-04-16)
++++++++++++++++++

  * Add support for reading WARC response format


0.8.8 (2016-04-11)
++++++++++++++++++

  * Fix bug in handling of invalid numeric character references


0.8.5 (2015-12-07)
++++++++++++++++++

  * Allow UTF-8 in HTTP headers (only applies to Python 2)


0.8.3 (2015-09-23)
++++++++++++++++++

  * Fix bug in HTTP decode caused by magic bytes handling.


0.8.2 (2015-09-21)
++++++++++++++++++

  * Add ``magic_bytes`` to ``Response`` for more reliable ``wex.http:decode`` behaviour


0.7.9 (2015-08-18)
++++++++++++++++++

  * Re-worked encoding for HTML to pre-parse


0.7 (2015-06-04)
++++++++++++++++

  * Better proxy support


0.4 (2015-02-12)
++++++++++++++++

  * Now we flatten labels and values
  * ``href`` and ``src`` become ``href_url`` and ``src_url``


0.3 (2014-12-29)
++++++++++++++++

  * Some API changes + switch to "tab-separated JSON"


0.2.2 (2014-10-24)
++++++++++++++++++

  * Uploaded sdist to PyPI for "pip install wextracto" simplicity


0.1 (2014-10-16)
++++++++++++++++

  * Initial release as open source


