Metadata-Version: 2.0
Name: bconv
Version: 0.2.1
Summary: Convert between BioNLP formats
Home-page: https://github.com/lfurrer/bconv
Author: Lenz Furrer
Author-email: lenz.furrer@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
Requires-Dist: lxml
Requires-Dist: nltk


``bconv``\ : Python library for converting between BioNLP formats
===================================================================

``bconv`` offers format conversion and manipulation of documents with text and annotations.
It supports various popular formats used in natural-language processing for biomedical texts.

Supported formats
-----------------

The following formats are currently supported:

.. list-table::
   :header-rows: 1

   * - Name
     - I
     - O
     - T
     - A
     - Description
   * - ``bioc_xml``\ , ``bioc_json``
     - ✓
     - ✓
     - ✓
     - ✓
     - BioC
   * - ``bionlp``
     - 
     - ✓
     - 
     - ✓
     - BioNLP stand-off
   * - ``brat``
     - 
     - ✓
     - 
     - ✓
     - brat stand-off
   * - ``conll``
     - ✓
     - ✓
     - ✓
     - ✓
     - CoNLL
   * - ``europepmc``\ , ``europepmc.zip``
     - 
     - ✓
     - 
     - ✓
     - Europe-PMC JSON
   * - ``pubtator``\ , ``pubtator_fbk``
     - ✓
     - ✓
     - ✓
     - ✓
     - PubTator
   * - ``pubmed``\ , ``pxml``\ , ``pxml.gz``
     - ✓
     - 
     - ✓
     - 
     - PubMed abstracts
   * - ``pmc``\ , ``nxml``
     - ✓
     - 
     - ✓
     - 
     - PMC full-text
   * - ``pubanno_json``
     - 
     - ✓
     - ✓
     - ✓
     - PubAnnotation JSON
   * - ``tsv``\ , ``text_tsv``
     - 
     - ✓
     - ✓
     - ✓
     - tab-separated values
   * - ``txt``
     - ✓
     - ✓
     - ✓
     - 
     - plain text
   * - ``txt_json``
     - ✓
     - 
     - ✓
     - 
     - collection of plain-text documents


I: input format;
O: output format;
T: can represent text;
A: can represent annotations (entities).

Installation
------------

``bconv`` is hosted on `PyPI <https://pypi.org/project/bconv/>`_\ , so you can use ``pip`` to install it:

.. code-block:: sh

   $ pip install bconv

By default, ``pip`` attempts a system-level installation, which might require admin privileges.
Alternatively, use ``pip``\ 's ``--user`` flag for an installation owned by the current user.

Usage
-----

Load an annotated collection in BioC XML format:

.. code-block:: pycon

   >>> import bconv
   >>> coll = bconv.load('bioc_xml', 'path/to/example.xml')
   >>> coll
   <Collection with 37 subelements at 0x7f1966e4b3c8>

A Collection is a sequence of Document objects:

.. code-block:: pycon

   >>> coll[0]
   <Document with 12 subelements at 0x7f1966e2f6d8>

Documents contain Sections, which contain Sentences:

.. code-block:: pycon

   >>> sent = coll[0][3][5]
   >>> sent.text
   'A Live cell imaging reveals that expression of GFP‐KSHV‐TK, but not GFP induces contraction of HeLa cells.'

Find the first annotation for this sentence:

.. code-block:: pycon

   >>> e = next(sent.iter_entities())
   >>> e.start, e.end, e.text
   (571, 578, 'KSHV‐TK')
   >>> e.info
   {'type': 'gene/protein', 'ui': 'Uniprot:F5HB62'}

Write the whole collection to a new file in CoNLL format:

.. code-block:: pycon

   >>> with open('path/to/example.conll', 'w', encoding='utf8') as f:
   ...     bconv.dump('conll', coll, f, tagset='IOBES', include_offsets=True)


