Metadata-Version: 2.0
Name: anago
Version: 0.0.1
Summary: Sequence labeling library using Keras.
Home-page: https://github.com/Hironsan/anago
Author: Hironsan
Author-email: hiroki.nakayama.py@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: Keras (>=2.0.5)
Requires-Dist: h5py (>=2.7.0)
Requires-Dist: numpy (>=1.13.0)
Requires-Dist: scikit-learn (>0.18.2)
Requires-Dist: tensorflow (>=1.2.0)


anaGo
=====

**anaGo** is a state-of-the-art library for sequence labeling using
Keras.

anaGo can perform named-entity recognition (NER), part-of-speech (POS)
tagging, semantic role labeling (SRL), and similar tasks.

Feature Support
---------------

anaGo provides the following features:

* training a model for your own task without any task-specific knowledge
* defining your own model
* downloading trained models for many tasks (e.g. NER and POS tagging)

Install
-------

To install anaGo, simply run:

::

    $ pip install anago

or install from the repository:

::

    $ git clone https://github.com/Hironsan/anago.git
    $ cd anago
    $ pip install -r requirements.txt

Get Started
-----------

Import
~~~~~~

First, import the necessary modules:

.. code:: python

    import os
    import anago
    from anago.data.reader import load_data_and_labels, load_word_embeddings
    from anago.data.preprocess import prepare_preprocessor
    from anago.config import ModelConfig, TrainingConfig

These cover data loading, preprocessing, and configuration.

Then set the parameters used later:

.. code:: python

    DATA_ROOT = 'data/conll2003/en/ner'
    SAVE_ROOT = './models'  # trained model
    LOG_ROOT = './logs'     # checkpoint, tensorboard
    embedding_path = './data/glove.6B/glove.6B.100d.txt'
    model_config = ModelConfig()
    training_config = TrainingConfig()

Loading data
~~~~~~~~~~~~

After importing the modules, read data for training, validation and
test:

.. code:: python

    train_path = os.path.join(DATA_ROOT, 'train.txt')
    valid_path = os.path.join(DATA_ROOT, 'valid.txt')
    test_path = os.path.join(DATA_ROOT, 'test.txt')
    x_train, y_train = load_data_and_labels(train_path)
    x_valid, y_valid = load_data_and_labels(valid_path)
    x_test, y_test = load_data_and_labels(test_path)
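
For reference, CoNLL-2003 style files place one token per line, with the
label in the last column and a blank line between sentences. The sketch
below shows how such lines could be parsed into a list of token sequences
and a matching list of label sequences. ``read_conll`` is a hypothetical
helper written for illustration, not anaGo's ``load_data_and_labels``, and
the column layout that anaGo expects is an assumption here:

.. code:: python

    def read_conll(lines):
        """Parse CoNLL-style lines into (sentences, labels).

        Assumption: the first column is the token, the last column is
        its label, and a blank line ends the current sentence.
        """
        sents, labels = [], []
        words, tags = [], []
        for line in lines:
            line = line.strip()
            if not line:
                # A blank line closes the sentence collected so far.
                if words:
                    sents.append(words)
                    labels.append(tags)
                    words, tags = [], []
                continue
            cols = line.split()
            words.append(cols[0])
            tags.append(cols[-1])
        if words:
            # Flush the last sentence if there is no trailing blank line.
            sents.append(words)
            labels.append(tags)
        return sents, labels


    sample = [
        'EU NNP B-NP B-ORG',
        'rejects VBZ B-VP O',
        '',
        'Peter NNP B-NP B-PER',
    ]
    x, y = read_conll(sample)
    print(x)  # [['EU', 'rejects'], ['Peter']]
    print(y)  # [['B-ORG', 'O'], ['B-PER']]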

After reading the data, prepare a preprocessor and load the pre-trained
word embeddings:

.. code:: python

    p = prepare_preprocessor(x_train, y_train)
    embeddings = load_word_embeddings(p.vocab_word, embedding_path, model_config.word_embedding_size)
    model_config.vocab_size = len(p.vocab_word)
    model_config.char_vocab_size = len(p.vocab_char)
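
The vocabulary sizes set above tell the model how many distinct words and
characters it has to embed. As a rough illustration of where such
vocabularies come from, the sketch below builds word and character index
tables from tokenized sentences. ``build_vocabs`` and the ``<PAD>`` and
``<UNK>`` entries are assumptions made for this example; the preprocessor
returned by ``prepare_preprocessor`` may build its tables differently:

.. code:: python

    def build_vocabs(sentences):
        """Map every word and character to an integer id.

        Ids 0 and 1 are reserved for padding and unknown items
        (an assumption of this sketch).
        """
        word2id = {'<PAD>': 0, '<UNK>': 1}
        char2id = {'<PAD>': 0, '<UNK>': 1}
        for sent in sentences:
            for word in sent:
                # setdefault assigns the next free id only to unseen items.
                word2id.setdefault(word, len(word2id))
                for ch in word:
                    char2id.setdefault(ch, len(char2id))
        return word2id, char2id


    words, chars = build_vocabs([['EU', 'rejects'], ['EU']])
    print(len(words))  # 4: <PAD>, <UNK>, EU, rejects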

Now we are ready for training :)

Training a model
~~~~~~~~~~~~~~~~

Let's train a model. To do so, use **Trainer**, which manages the
training process. Create an instance of the ``Trainer`` class and pass
the training and validation data to its ``train`` method:

.. code:: python

    trainer = anago.Trainer(model_config, training_config, checkpoint_path=LOG_ROOT, save_path=SAVE_ROOT,
                            preprocessor=p, embeddings=embeddings)
    trainer.train(x_train, y_train, x_valid, y_valid)

If training is progressing normally, a progress bar is displayed as
follows:

.. code:: text

    ...
    Epoch 3/15
    702/703 [============================>.] - ETA: 0s - loss: 60.0129 - f1: 89.70
    703/703 [==============================] - 319s - loss: 59.9278   
    Epoch 4/15
    702/703 [============================>.] - ETA: 0s - loss: 59.9268 - f1: 90.03
    703/703 [==============================] - 324s - loss: 59.8417   
    Epoch 5/15
    702/703 [============================>.] - ETA: 0s - loss: 58.9831 - f1: 90.67
    703/703 [==============================] - 297s - loss: 58.8993   
    ...

Evaluation for a model
~~~~~~~~~~~~~~~~~~~~~~

To evaluate the trained model, use **Evaluator**. Create an instance of
the ``Evaluator`` class and pass the test data to its ``eval`` method:

.. code:: python

    weights = os.path.join(SAVE_ROOT, 'model_weights.h5')

    evaluator = anago.Evaluator(model_config, weights, save_path=SAVE_ROOT, preprocessor=p)
    evaluator.eval(x_test, y_test)

After evaluation, the F1 score is printed:

.. code:: text

    - f1: 90.67
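
F1 is the harmonic mean of precision and recall. For sequence labeling it
is commonly computed over whole entities, so a prediction counts as
correct only if both its span and its type match. The sketch below
illustrates that calculation; ``f1_score`` and the
``(sentence, start, end, type)`` tuples are illustrative assumptions, and
it returns a fraction between 0 and 1 rather than the percentage shown
above. Whether anaGo computes its score in exactly this way is not
confirmed here:

.. code:: python

    def f1_score(true_entities, pred_entities):
        """Entity-level F1 from two collections of entity tuples."""
        true_set, pred_set = set(true_entities), set(pred_entities)
        correct = len(true_set & pred_set)  # exact span and type matches
        precision = correct / len(pred_set) if pred_set else 0.0
        recall = correct / len(true_set) if true_set else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)


    # (sentence index, start token, end token, entity type)
    true = [(0, 1, 1, 'PERSON'), (0, 6, 7, 'LOCATION')]
    pred = [(0, 1, 1, 'PERSON'), (0, 6, 6, 'LOCATION')]  # second span too short
    print(f1_score(true, pred))  # 0.5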

Tagging a sentence
~~~~~~~~~~~~~~~~~~

To tag any text, use **Tagger**. Create an instance of the ``Tagger``
class and pass the text to its ``tag`` method:

.. code:: python

    weights = os.path.join(SAVE_ROOT, 'model_weights.h5')
    tagger = anago.Tagger(model_config, weights, save_path=SAVE_ROOT, preprocessor=p)

Let's try tagging a sentence, "President Obama is speaking at the White
House." We can do it as follows:

.. code:: python

    >>> sent = 'President Obama is speaking at the White House.'
    >>> print(tagger.tag(sent))
    [('President', 'O'), ('Obama', 'PERSON'), ('is', 'O'),
     ('speaking', 'O'), ('at', 'O'), ('the', 'O'),
     ('White', 'LOCATION'), ('House', 'LOCATION'), ('.', 'O')]
    >>> print(tagger.get_entities(sent))
    {'PERSON': ['Obama'], 'LOCATION': ['White House']}
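
The step from per-token tags to entities can be pictured as merging
consecutive tokens that share the same non-``O`` tag. The sketch below
does this for the tagged output shown above. ``group_entities`` is a
hypothetical helper for illustration, not the implementation of
``get_entities``, and it would also merge two adjacent entities of the
same type, which a scheme with explicit begin markers avoids:

.. code:: python

    def group_entities(tagged_tokens, outside='O'):
        """Merge runs of identically tagged tokens into entity strings."""
        entities = {}
        chunk, chunk_tag = [], None
        # The sentinel at the end flushes a chunk that closes the sentence.
        for word, tag in list(tagged_tokens) + [(None, outside)]:
            if tag != chunk_tag:
                if chunk:
                    entities.setdefault(chunk_tag, []).append(' '.join(chunk))
                chunk = []
                chunk_tag = tag if tag != outside else None
            if tag != outside:
                chunk.append(word)
        return entities


    tagged = [('President', 'O'), ('Obama', 'PERSON'), ('is', 'O'),
              ('speaking', 'O'), ('at', 'O'), ('the', 'O'),
              ('White', 'LOCATION'), ('House', 'LOCATION'), ('.', 'O')]
    print(group_entities(tagged))
    # {'PERSON': ['Obama'], 'LOCATION': ['White House']}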

Reference
---------

This library uses a bidirectional LSTM + CRF model based on `Neural
Architectures for Named Entity
Recognition <https://arxiv.org/abs/1603.01360>`__ by Guillaume Lample
et al., NAACL 2016.
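
In a model of this kind, the bidirectional LSTM produces a score for each
tag at each token, the CRF layer adds a score for each transition between
adjacent tags, and the best tag sequence is found with Viterbi decoding.
The pure-Python sketch below illustrates that decoding step only. It is
not anaGo's code, the scores are made up, and start and end transitions
are omitted for brevity:

.. code:: python

    def viterbi(emissions, transitions):
        """Return the highest-scoring tag index sequence.

        emissions[t][j]   : score of tag j at position t
        transitions[i][j] : score of moving from tag i to tag j
        """
        n_tags = len(transitions)
        score = list(emissions[0])
        backptr = []
        for emit in emissions[1:]:
            new_score, ptr = [], []
            for j in range(n_tags):
                # Best previous tag for ending in tag j at this position.
                best_i = max(range(n_tags),
                             key=lambda i: score[i] + transitions[i][j])
                ptr.append(best_i)
                new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            score = new_score
            backptr.append(ptr)
        # Follow the back-pointers from the best final tag.
        path = [max(range(n_tags), key=lambda j: score[j])]
        for ptr in reversed(backptr):
            path.append(ptr[path[-1]])
        path.reverse()
        return path


    emissions = [[3, 0], [0, 1], [1, 0]]
    transitions = [[0, -2], [-2, 0]]  # switching tags is penalized
    print(viterbi(emissions, transitions))  # [0, 0, 0]

Choosing the best tag independently at each position would give
``[0, 1, 0]`` here; the transition penalties make the consistent sequence
``[0, 0, 0]`` score higher overall.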


