Metadata-Version: 2.1
Name: bratutils
Version: 0.2.2
Summary: brat utilities
Home-page: https://github.com/savkov/transcriptor
Author: Sasho Savkov
Author-email: me@sasho.io
License: MIT
Description: bratutils
        =========
        [![CircleCI](https://circleci.com/gh/savkov/bratutils.svg?style=svg&circle-token=9a7bdcb066c87c45017fe2214c71f2e2f9672c94)](https://circleci.com/gh/savkov/bratutils)
        [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
        
        A collection of utilities for manipulating data and calculating inter-annotator 
        agreement in brat annotation files.
        
        ### Installation
        
        Install as a normal package from the source directory.
        
        ```bash
        $ pip install bratutils
        ```
        
        
        ### Agreement Definition
        
        Agreement in multi-token annotations is commonly evaluated using [f-score][fsc].
        due to various problems with computing the traditional [Krippendorf's alpha][al] 
        and [Cohen's kappa][ka]. [Hripcsak][hripcsak] prove the validity of the metric 
        for very large populations, i.e. for unrestricted text annotations.
        
        This library roughly follows the definitions of precision and recall calculation
        from the [MUC-7 test scoring][muc]. The basic definitions along with some 
        additional restrictions are laid out below:
        
        * `CORRECT` - when annotation tags and indices match completely
        * `INCORRECT` - when annotation tags do not match, but the indices coincide
        * `PARTIAL` - when the annotation tags are the same but one of the annotations
        has the same end index and a different start index
        * `MISSING` - annotations exising only in the gold standard annotation set
        * `SPURIOUS` - annotations existing only in the candidate annotation set
        
        _Note_: the gold standard is considered the collections/document from which the 
         comparison is invoked, while the supplied parallel annotation is considered 
         the candidate set.
         
        _*Disclaimer:*_ the current definition of the `PARTIAL` category accomodates 
        working with syntactic chunks. A different arrangement (e.g. pick largest 
        contained tag as partial match instead of rightmost) might be more suitable for 
        other tasks, for example some types of semantic annotation.
        
        
        ### Examples
        
        Simple example:
        
        ```python
        from bratutils import agreement as a
        
        doc = a.Document('res/samples/A/data-sample-1.ann')
        doc2 = a.Document('res/samples/B/data-sample-1.ann')
        
        doc.make_gold()
        statistics = doc2.compare_to_gold(doc)
        
        print(statistics)
        ```
        
        Output:
        
        ```shell
        -------------------MUC-Table--------------------
        ------------------------------------------------
        pos:135
        act:134
        cor:115
        par:5
        inc:4
        mis:11
        spu:10
        ------------------------------------------------
        pre:0.858208955224
        rec:0.851851851852
        fsc:0.855018587361
        ------------------------------------------------
        und:0.0814814814815
        ovg:0.0746268656716
        sub:0.0725806451613
        ------------------------------------------------
        bor:119
        ibo:15
        ------------------------------------------------
        ------------------------------------------------
        ```
        
        
        [fsc]: <https://en.wikipedia.org/wiki/F1_score>
        [al]: <https://en.wikipedia.org/wiki/Krippendorff%27s_alpha>
        [ka]: <https://en.wikipedia.org/wiki/Cohen%27s_kappa>
        [hripcsak]: <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/>
        [muc]: <https://aclweb.org/anthology/M/M98/M98-1024.pdf>
        
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: Unix
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Description-Content-Type: text/markdown
