Metadata-Version: 2.0
Name: PyStanfordDependencies
Version: 0.3.0
Summary: Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies
Home-page: http://github.com/dmcc/PyStanfordDependencies
Author: David McClosky
Author-email: notsoweird+pystanforddependencies@gmail.com
License: Apache 2.0
Keywords: parsing,syntax,linguistics,language,artificial intelligence,machine learning,NLP,natural language,natural language processing,Stanford CoreNLP,Stanford Dependencies,Universal Dependencies
Platform: POSIX
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: JPype
Requires-Dist: JPype1; extra == 'JPype'
Provides-Extra: visualization
Requires-Dist: asciitree; extra == 'visualization'
Requires-Dist: graphviz; extra == 'visualization'

PyStanfordDependencies
======================

.. image:: https://travis-ci.org/dmcc/PyStanfordDependencies.svg?branch=master
    :target: https://travis-ci.org/dmcc/PyStanfordDependencies

.. image:: https://badge.fury.io/py/PyStanfordDependencies.png
   :target: https://badge.fury.io/py/PyStanfordDependencies

.. image:: https://coveralls.io/repos/dmcc/PyStanfordDependencies/badge.png?branch=master
   :target: https://coveralls.io/r/dmcc/PyStanfordDependencies?branch=master

Python interface for converting `Penn Treebank
<http://www.cis.upenn.edu/~treebank/>`_ trees to `Stanford Dependencies
<http://nlp.stanford.edu/software/stanford-dependencies.shtml>`_ and
`Universal Dependencies <http://universaldependencies.github.io/docs/>`_.

Example usage
-------------
Start by getting a ``StanfordDependencies`` instance with
``StanfordDependencies.get_instance()``::

    >>> import StanfordDependencies
    >>> sd = StanfordDependencies.get_instance(backend='subprocess')

``get_instance()`` takes several options. ``backend`` can currently
be ``subprocess`` or ``jpype`` (see below). If you have an existing
`Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml>`_ or
`Stanford Parser <http://nlp.stanford.edu/software/lex-parser.shtml>`_
jar file, use the ``jar_filename`` parameter to point to the full path of
the jar file. Otherwise, PyStanfordDependencies will download a jar file
for you and store it in locally (``~/.local/share/pystanforddeps``). You
can request a specific version with the ``version`` flag, e.g.,
``version='3.4.1'``. To convert trees, use the ``convert_trees()`` or
``convert_tree()`` method (note that by default, ``convert_trees()`` can
be considerably faster if you're doing batch conversion). These return
a sentence (list of ``Token`` objects) or a list of sentences (list of
list of ``Token`` objects) respectively::

    >>> sent = sd.convert_tree('(S1 (NP (DT some) (JJ blue) (NN moose)))')
    >>> for token in sent:
    ...     print token
    ...
    Token(index=1, form='some', cpos='DT', pos='DT', head=3, deprel='det')
    Token(index=2, form='blue', cpos='JJ', pos='JJ', head=3, deprel='amod')
    Token(index=3, form='moose', cpos='NN', pos='NN', head=0, deprel='root')

This tells you that ``moose`` is the head of the sentence and is
modified by ``some`` (with a ``det`` = determiner relation) and ``blue``
(with an ``amod`` = adjective modifier relation). Fields on ``Token``
objects are readable as attributes. See docs for additional options in
``convert_tree()`` and ``convert_trees()``.

Visualization
-------------

If you have the `asciitree <https://pypi.python.org/pypi/asciitree>`_
package, you can use a prettier ASCII formatter::

    >>> print sent.as_asciitree()
     moose [root]
      +-- some [det]
      +-- blue [amod]

If you have Python 2.7 or later, you can use `Graphviz
<http://graphviz.org/>`_ to render your graphs. You'll need the `Python
graphviz <https://pypi.python.org/pypi/graphviz>`_ package to call
``as_dotgraph()``::

    >>> dotgraph = sent.as_dotgraph()
    >>> print dotgraph
    digraph {
            0 [label=root]
            1 [label=some]
                    3 -> 1 [label=det]
            2 [label=blue]
                    3 -> 2 [label=amod]
            3 [label=moose]
                    0 -> 3 [label=root]
    }
    >>> dotgraph.render('moose') # renders a PDF by default
    'moose.pdf'
    >>> dotgraph.format = 'svg'
    >>> dotgraph.render('moose')
    'moose.svg'

The Python `xdot <https://pypi.python.org/pypi/xdot>`_
package provides an interactive visualization::

    >>> import xdot
    >>> window = xdot.DotWindow()
    >>> window.set_dotcode(dotgraph.source)

Both ``as_asciitree()`` and ``as_dotgraph()`` allow customization.
See the docs for additional options.

Backends
--------
Currently PyStanfordDependencies includes two backends:

- ``subprocess`` (works anywhere with a ``java`` binary, slow so
  batched conversions with ``convert_trees()`` are recommended)
- ``jpype`` (requires `jpype1 <https://pypi.python.org/pypi/JPype1>`_,
  faster than Subprocess, includes access to the Stanford CoreNLP
  lemmatizer)

By default, PyStanfordDependencies will attempt to use the ``jpype``
backend. If ``jpype`` isn't available or crashes on startup,
PyStanfordDependencies will fallback to ``subprocess`` with a warning.

Universal Dependencies status
-----------------------------
PyStanfordDependencies mostly supports `Universal Dependencies
<http://universaldependencies.github.io/docs/>`_ (see `issue #10
<https://github.com/dmcc/PyStanfordDependencies/issues/10>`_ for the
most up to date status). PyStanfordDependencies output matches Universal
Dependencies in terms of structure and dependency labels, but Universal
POS tags and features are missing. Currently, PyStanfordDependencies
will output Universal Dependencies by default (unless you ask for an
older version of Stanford CoreNLP).

More information
----------------
Licensed under `Apache 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`_.

Written by David McClosky (`homepage
<http://nlp.stanford.edu/~mcclosky/>`_, `code <http://github.com/dmcc>`_)

Bug reports and feature requests: `GitHub issue tracker
<http://github.com/dmcc/PyStanfordDependencies/issues>`_

Release summaries
-----------------
- 0.3.0 (2015.10.09): Support copy nodes, more input checking/debugging help
- 0.2.0 (2015.08.02): Universal Dependencies support (mostly),
  Python 3 support (fully), minor API updates
- 0.1.7 (2015.06.13): Bugfixes for JPype, handle version mismatches in IBM Java
- 0.1.6 (2015.02.12): Support for graphviz formatting, CoreNLP 3.5.1,
  better Windows portability
- 0.1.5 (2015.01.10): Support for ASCII tree formatting
- 0.1.4 (2015.01.07): Fix CCprocessed support
- 0.1.3 (2015.01.03): Bugfixes, coveralls integration, refactoring
- 0.1.2 (2015.01.02): Better CoNLL structures, test suite and Travis CI
  support, bugfixes
- 0.1.1 (2014.12.15): More docs, fewer bugs
- 0.1 (2014.12.14): Initial version


