Metadata-Version: 2.1
Name: apertium2ud
Version: 0.0.7
Summary: Converting universal tags to Apertium tags.
Home-page: https://github.com/alexeyev/apertium2ud
Author: Anton Alekseev
Author-email: anton.m.alexeye@gmail.com
Keywords: natural language processing,apertium,morphology
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# apertium2ud

A mapping between the Apertium and Universal Dependencies (UD) tagsets, built from the
[information on the Apertium Wiki](https://wiki.apertium.org/w/index.php?title=List_of_symbols).

Loosely based on [this code](https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py), 
hence the GPLv3 license.

To install, run

```bash
python -m pip install apertium2ud
```
The latest uploaded version is 0.0.7.

Note: the latest release on PyPI ships with the rules from the [apertium-kir](https://github.com/apertium/apertium-kir/blob/main/apertium-kir.kir.udx) `.udx` file.

To build the machine-readable mapping, run

```bash
python apertium_wiki_parser.py
```
## Apertium to Universal tags

```python
>>> from apertium2ud.convert import a2ud
>>> tags = ["n", "pl", "acc"]
>>> a2ud(tags)
(['NOUN'], ['Number=Plur', 'Case=Acc'])
>>> tags_sophisticated = ["v", "tv", "ger", "nom", "cop", "aor", "p3", "pl"]
>>> a2ud(tags_sophisticated)
(['VERB', 'AUX'], ['Subcat=Tran', 'VerbForm=Vnoun', 'Case=Nom', 'Tense=Past', 'Person=3', 'Number=Plur'])
```
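
Apertium analyses are usually written as a lemma followed by tags in angle brackets, such as `lemma<n><pl><acc>`, while `a2ud` in the session above takes a plain list of tag names. The sketch below shows one way to bridge the two. `split_analysis` is a hypothetical helper, not part of the package, and it assumes a single unambiguous analysis without `+` joins.

```python
import re

# Hypothetical helper (not part of apertium2ud): split one Apertium
# analysis such as "lemma<n><pl><acc>" into its lemma and tag names.
TAG_RE = re.compile(r"<([^<>]+)>")


def split_analysis(analysis):
    """Return (lemma, tags) for a single, unambiguous analysis string."""
    lemma = analysis.split("<", 1)[0]   # everything before the first tag
    tags = TAG_RE.findall(analysis)     # tag names without angle brackets
    return lemma, tags


lemma, tags = split_analysis("lemma<n><pl><acc>")
print(lemma, tags)

# The resulting list has the shape that a2ud takes in the session above:
# from apertium2ud.convert import a2ud
# a2ud(tags)
```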

## Universal tags to Apertium

So far, the conversion is far from perfect:
```
Кыз NOUN {'Number[psor]=Sing', 'Number=Sing', 'Case=Nom', 'Person[psor]=3', 'Person=3'} ->
<px3sg><n><subj?nom?><sg><p3><px3sp> 

досуна NOUN {'Number[psor]=Sing', 'Number=Sing', 'Person[psor]=3', 'Case=Dat', 'Person=3'} ->
<px3sg><n><sg><dat><p3><px3sp> 

кат NOUN {'Case=Nom', 'Person=3', 'Number=Sing'} ->
<n><subj?nom?><sg><p3> 

жазган VERB {'Aspect=Perf', 'Polarity=Pos', 'Number=Sing', 'Tense=Past', 'Person=3', 'Evident=Fh'} ->
<past3p><vblex?v?vbmod?><sg><aff><aor?past?pret?><perf><p3> 

. PUNCT set() ->
<sent?apos?percent?clb?punct?> 
```
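
The feature sets in the examples above have the same `Feature=Value` shape that CoNLL-U stores in one `|`-separated FEATS column, where `_` means no features. A minimal sketch of that preprocessing step follows. `parse_feats` is a hypothetical helper and not part of the package's API.

```python
def parse_feats(feats_column):
    """Turn a CoNLL-U FEATS value into a set of 'Feature=Value' strings."""
    feats_column = feats_column.strip()
    if feats_column in ("", "_"):       # "_" marks an empty FEATS column
        return set()
    return set(feats_column.split("|"))


print(sorted(parse_feats("Case=Nom|Number=Sing|Person=3")))
print(parse_feats("_"))
```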

## TODO

* Should sections `chunks` and [XML tags](https://wiki.apertium.org/w/index.php?title=List_of_symbols#XML_tags) be added? [No](https://github.com/apertium/apertium/issues/185).
* Round-trip tests: Apertium -> UD -> Apertium and UD -> Apertium -> UD (some losses are inevitable)
* Support adding rules from a `.udx` file, which usually describes custom tags
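
The round-trip tests from the list above could start from a helper that reports which tags do not survive a conversion cycle. In this sketch both converters are toy stand-ins built from a three-entry table, purely for illustration. Real tests would plug in the package's own functions, and the name `lost_in_round_trip` is an assumption.

```python
# Toy tag table for illustration only; the real mapping is much larger.
TOY_A2UD = {
    "n": ("pos", "NOUN"),
    "pl": ("feat", "Number=Plur"),
    "acc": ("feat", "Case=Acc"),
}
TOY_UD2A = {value: tag for tag, (_, value) in TOY_A2UD.items()}


def toy_a2ud(tags):
    """Apertium tags -> (POS list, feature list); unknown tags are dropped."""
    pos, feats = [], []
    for tag in tags:
        kind, value = TOY_A2UD.get(tag, (None, None))
        if kind == "pos":
            pos.append(value)
        elif kind == "feat":
            feats.append(value)
    return pos, feats


def toy_ud2a(pos, feats):
    """(POS list, feature list) -> Apertium tags, using the toy table."""
    return [TOY_UD2A[value] for value in pos + feats]


def lost_in_round_trip(tags, forward, backward):
    """Return the tags that disappear after forward, then backward conversion."""
    restored = backward(*forward(tags))
    return set(tags) - set(restored)


print(lost_in_round_trip(["n", "pl", "acc"], toy_a2ud, toy_ud2a))
print(lost_in_round_trip(["n", "pl", "xyz"], toy_a2ud, toy_ud2a))
```

A non-empty result does not always indicate a bug: a tag with no counterpart in the other tagset is lost by design, so the tests may need an allow-list of expected losses.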

## How to cite

If you use this work, a citation is greatly appreciated.

```
@misc{apertium2ud2023alekseev,
  title     = {{alexeyev/apertium2ud: mapping tagsets}},
  year      = {2023},
  url       = {https://github.com/alexeyev/apertium2ud}
}
```
