Metadata-Version: 2.1
Name: bordr
Version: 0.1.4
Summary: A fast and accurate POS and morphological tagging toolkit, lightly adapted to the Tibetan language.
Home-page: https://github.com/Esukhia/RDRPOSTagger
Author: Dat Quoc Nguyen
Author-email: dqnguyen@unimelb.edu.au
License: GNU General Public License
Project-URL: Source, https://github.com/Esukhia/RDRPOSTagger
Project-URL: Tracker, https://github.com/Esukhia/RDRPOSTagger/issues
Keywords: part-of-speech-tagger java nlp pos-tagging pos-tagger python3
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Natural Language :: Tibetan
Requires-Python: >=3.6
Description-Content-Type: text/markdown

## bordr ##

A pip-installable version of RDRPOSTagger with Tibetan-specific changes.

 - See the original [RDRPOSTagger](https://github.com/datquocnguyen/RDRPOSTagger) for documentation.
 - Check the [modifications](https://github.com/Esukhia/bordr/blob/master/CHANGELOG.md) implemented in this repo.
 - See [rdr-data](https://github.com/Esukhia/rdr-data) for RDR models for Tibetan.
 - See [usage.py](https://github.com/Esukhia/bordr/blob/master/usage.py) for the programmatic interface available in bordr.

### Maintenance

Build the source distribution:

```bash
rm -rf dist/
python3 setup.py clean sdist
```

Then upload it to PyPI with twine (version >= `1.11.0`):

```bash
twine upload dist/*
```

### Latest change

The SDICT content used to generate the INIT file has changed.
Words in the SDICT are now given the `U` tag (Unique, from the BILOU tagging scheme), because botok segments each of those words as a single, unique token.
With this SDICT content, the INIT file is based on botok's segmentation, so the generated rules can resolve botok segmentation ambiguities.
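
As a rough illustration of the idea above (not bordr's actual code), the sketch below tags every word of a hypothetical SDICT word list with `U` and uses that mapping to give an initial tag to each token of an already-segmented sentence. The function names, the in-memory dict layout, and the placeholder fallback tag `X` are assumptions made for this example; the real SDICT and INIT file formats are defined by RDRPOSTagger and bordr.

```python
# Illustrative sketch only: names and data layout are hypothetical, not bordr's API.

def build_sdict(words, tag="U"):
    """Map every dictionary word to the single-token tag."""
    return {word: tag for word in words}


def initial_tags(tokens, sdict, default="X"):
    """Pair each token with its SDICT tag, or a placeholder default if unknown."""
    return [(token, sdict.get(token, default)) for token in tokens]


# Words assumed to be in the dictionary (example data, not taken from rdr-data).
words = ["བཀྲ་ཤིས་", "བདེ་ལེགས་"]
sdict = build_sdict(words)

# A segmented sentence: the two dictionary words plus one unknown token.
tokens = words + ["ཞུ་"]
tagged = initial_tags(tokens, sdict)

for token, tag in tagged:
    print(f"{token}/{tag}")
```

In this sketch, tokens found in the dictionary start out tagged `U`, and anything else receives the placeholder, which is the kind of starting point a rule learner could then refine.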

