Metadata-Version: 2.0
Name: TUPA
Version: 1.2.3
Summary: Transition-based UCCA Parser
Home-page: https://github.com/huji-nlp/tupa
Author: Daniel Hershcovich
Author-email: danielh@cs.huji.ac.il
License: UNKNOWN
Description-Content-Type: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: ucca (>=1.0.14)
Requires-Dist: numpy
Requires-Dist: spacy (>=1.9.0)
Requires-Dist: cython (>=0.27.3)
Requires-Dist: penman (>=0.6.2)
Requires-Dist: nltk (>=3.2.5)
Requires-Dist: parsimonious (>=0.8.0)
Requires-Dist: word2number (>=1.1)
Requires-Dist: pyspotlight (>=0.7.1)
Requires-Dist: requests (>=2.8.14)
Requires-Dist: logbook (>=1.1.0)
Requires-Dist: tqdm (>=4.19.4)
Requires-Dist: dynet (>=2.0.1)
Provides-Extra: server
Requires-Dist: Flask (>=0.12.2); extra == 'server'
Requires-Dist: Flask-Assets (>=0.12); extra == 'server'
Requires-Dist: Flask-Compress (>=1.4.0); extra == 'server'
Requires-Dist: Jinja2 (>=2.9.6); extra == 'server'
Requires-Dist: matplotlib (>=2.0.2); extra == 'server'
Requires-Dist: networkx (>=1.11); extra == 'server'
Requires-Dist: webassets (>=0.12.1); extra == 'server'

Transition-based UCCA Parser
============================

TUPA is a transition-based parser for `Universal Conceptual Cognitive
Annotation (UCCA) <http://github.com/huji-nlp/ucca>`__.

Requirements
~~~~~~~~~~~~

-  Python 3.x
-  All `dependencies for
   DyNet <http://dynet.readthedocs.io/en/latest/python.html>`__

Install
~~~~~~~

Create a Python virtual environment:

::

    virtualenv --python=/usr/bin/python3 venv
    . venv/bin/activate              # on bash
    source venv/bin/activate.csh     # on csh

Install the latest release:

::

    pip install tupa

Alternatively, install the latest code from GitHub (may be unstable):

::

    git clone https://github.com/danielhers/tupa
    cd tupa
    python setup.py install

Train the parser
~~~~~~~~~~~~~~~~

Given directories of UCCA passage files for training and development
(for example, from `the Wiki
corpus <https://github.com/huji-nlp/ucca-corpus/tree/master/wiki/pickle>`__),
run:

::

    python -m tupa.parse -t <train_dir> -d <dev_dir> -c <model_type> -m <model_filename>

The possible model types are ``sparse``, ``mlp`` and ``bilstm``.

Parse a text file
~~~~~~~~~~~~~~~~~

Run the parser on a text file (here named ``example.txt``) using a
trained model:

::

    python -m tupa.parse example.txt -c <model_type> -m <model_filename>

An ``xml`` file will be created for each passage (passages are separated
by blank lines in the text file).
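
As a rough illustration of the blank-line convention (this is not TUPA's
own code; ``split_passages`` is a hypothetical helper, and joining the
lines of a passage with spaces is an assumption), the following sketch
counts the passages in a text, and so how many ``xml`` files to expect:

::

    def split_passages(text):
        """Split text into passages separated by one or more blank lines."""
        passages, current = [], []
        for line in text.splitlines():
            if line.strip():
                # Non-blank line: it belongs to the current passage.
                current.append(line.strip())
            elif current:
                # Blank line: close the current passage, if any.
                passages.append(" ".join(current))
                current = []
        if current:
            # The last passage need not be followed by a blank line.
            passages.append(" ".join(current))
        return passages

    sample = "First passage, line one.\nStill the first passage.\n\nSecond passage.\n"
    passages = split_passages(sample)
    print(len(passages), "passages")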

Pre-trained models
~~~~~~~~~~~~~~~~~~

To download and extract models pre-trained on the Wiki corpus, run:

::

    curl --remote-name-all http://www.cs.huji.ac.il/~danielh/ucca/{sparse,mlp,bilstm}-1.2.tar.gz
    tar xvzf sparse-1.2.tar.gz
    tar xvzf mlp-1.2.tar.gz
    tar xvzf bilstm-1.2.tar.gz

Run the parser using any of them:

::

    python -m tupa.parse example.txt -c sparse -m models/sparse
    python -m tupa.parse example.txt -c mlp -m models/mlp
    python -m tupa.parse example.txt -c bilstm -m models/bilstm

Other languages
~~~~~~~~~~~~~~~

To get French and German models pre-trained on `the 20K Leagues
corpus <https://github.com/huji-nlp/ucca-corpus/tree/master/vmlslm/fr>`__,
run:

::

    curl --remote-name-all http://www.cs.huji.ac.il/~danielh/ucca/sparse-1.2-{fr,de}.tar.gz
    tar xvzf sparse-1.2-fr.tar.gz
    tar xvzf sparse-1.2-de.tar.gz

Run the parser on a French or German text file, setting ``SPACY_MODEL``
to the matching spaCy model:

::

    export SPACY_MODEL=fr_depvec_web_lg
    python -m tupa.parse exemple.txt -c sparse -m models/sparse-fr

    export SPACY_MODEL=de_core_news_md
    python -m tupa.parse beispiel.txt -c sparse -m models/sparse-de
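
The same invocations can be scripted. The sketch below only assembles
the command line and environment shown above; ``parse_invocation`` is a
hypothetical helper, not part of TUPA, and it assumes that ``tupa`` is
installed in the interpreter running the script:

::

    import os
    import sys

    def parse_invocation(text_file, model_type, model_path, spacy_model=None):
        """Return (command, environment) for one ``tupa.parse`` run."""
        cmd = [sys.executable, "-m", "tupa.parse", text_file,
               "-c", model_type, "-m", model_path]
        env = dict(os.environ)  # copy, so the caller's environment is untouched
        if spacy_model is not None:
            env["SPACY_MODEL"] = spacy_model
        return cmd, env

    jobs = [
        ("exemple.txt", "models/sparse-fr", "fr_depvec_web_lg"),
        ("beispiel.txt", "models/sparse-de", "de_core_news_md"),
    ]
    for text_file, model_path, spacy_model in jobs:
        cmd, env = parse_invocation(text_file, "sparse", model_path, spacy_model)
        print("SPACY_MODEL=" + env["SPACY_MODEL"], " ".join(cmd))
        # To actually run the parser:
        # import subprocess; subprocess.run(cmd, env=env, check=True)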

Author
------

-  Daniel Hershcovich: danielh@cs.huji.ac.il

Citation
--------

If you make use of this software, please cite `the following
paper <http://www.cs.huji.ac.il/~danielh/acl2017.pdf>`__:

::

    @InProceedings{hershcovich2017a,
      author    = {Hershcovich, Daniel  and  Abend, Omri  and  Rappoport, Ari},
      title     = {A Transition-Based Directed Acyclic Graph Parser for UCCA},
      booktitle = {Proc. of ACL},
      year      = {2017},
      pages     = {1127--1138},
      url       = {http://aclweb.org/anthology/P17-1104}
    }

The version of the parser used in the paper is
`v1.0 <https://github.com/huji-nlp/tupa/releases/tag/v1.0>`__. To
reproduce the experiments from the paper, run the following in an empty
directory (with a new virtualenv):

::

    pip install "tupa>=1.0,<1.1"
    mkdir pickle models
    curl -L http://www.cs.huji.ac.il/~danielh/ucca/ucca_corpus_pickle.tgz | tar xz -C pickle
    curl --remote-name-all http://www.cs.huji.ac.il/~danielh/ucca/{sparse,mlp,bilstm}.tgz
    tar xvzf sparse.tgz
    tar xvzf mlp.tgz
    tar xvzf bilstm.tgz
    python -m spacy download en
    python -m scripts.split_corpus pickle -t 4282 -d 454 -l
    python -m tupa.parse -c sparse -m models/ucca-sparse -Web pickle/test
    python -m tupa.parse -c mlp -m models/ucca-mlp -Web pickle/test
    python -m tupa.parse -c bilstm -m models/ucca-bilstm -Web pickle/test

License
-------

This package is licensed under the GPLv3 or later (see
`LICENSE.txt <LICENSE.txt>`__).

|Build Status (Travis CI)| |Build Status (AppVeyor)| |PyPI version|

.. |Build Status (Travis CI)| image:: https://travis-ci.org/danielhers/tupa.svg?branch=master
   :target: https://travis-ci.org/danielhers/tupa
.. |Build Status (AppVeyor)| image:: https://ci.appveyor.com/api/projects/status/github/danielhers/tupa?svg=true
   :target: https://ci.appveyor.com/project/danielh/tupa
.. |PyPI version| image:: https://badge.fury.io/py/TUPA.svg
   :target: https://badge.fury.io/py/TUPA


