Metadata-Version: 2.1
Name: bgnlp
Version: 0.0.9
Summary: Package for Bulgarian Natural Language Processing (NLP)
Home-page: UNKNOWN
Author: Adam Fauzi
Author-email: adamfzh98@gmail.com
License: UNKNOWN
Keywords: pytorch,nlp,bulgaria,machine learning,deep learning,AI
Platform: UNKNOWN
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch (==1.11.0)
Requires-Dist: numpy (==1.22.4)
Requires-Dist: pandas (==1.4.3)
Requires-Dist: torchmetrics (==0.11.0)
Requires-Dist: torchtext (==0.12.0)
Requires-Dist: gdown (==4.6.0)
Requires-Dist: transformers (==4.26.0)


# `bgnlp`: Model-first approach to Bulgarian NLP



```sh
pip install bgnlp
```



## Package functionality



### Part-of-speech tagging



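```python
from bgnlp import PosTagger, PosTaggerConfig

# Build the tagger from a configuration created with no arguments.
config = PosTaggerConfig()
pos = PosTagger(config=config)

# Input sentence: "This is a library for natural language processing."
print(pos("Това е библиотека за обработка на естествен език."))
```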



```json
[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]
```
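The sample output above suggests that the tagger returns a list of dictionaries, one per token. Assuming that shape holds (the exact return type is not confirmed here), the result can be post-processed with plain Python. The sketch below works on a hand-copied subset of that output instead of a live call, and `group_by_description` is a hypothetical helper, not part of `bgnlp`:

```python
from collections import defaultdict

# Hand-copied subset of the sample output above. In real use this list would
# come from `pos(...)`, ASSUMING it returns a list of dicts with these keys.
tagged = [
    {"word": "Това", "tag": "PDOsn", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "en_desc": "noun"},
    {"word": "за", "tag": "R", "en_desc": "preposition"},
    {"word": "обработка", "tag": "NCFsof", "en_desc": "noun"},
]


def group_by_description(tokens):
    """Hypothetical helper: map each `en_desc` value to its words, in order."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token["en_desc"]].append(token["word"])
    return dict(groups)


groups = group_by_description(tagged)
print(groups["noun"])  # ['библиотека', 'обработка']
```

Grouping on `en_desc` keeps the example independent of the fine-grained tag strings such as `NCFsof`, whose tagset is not described in this document.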



### Lemmatization



```python
from bgnlp import LemmaTagger, LemmaTaggerConfig

# Build the lemmatizer from a configuration created with no arguments.
lemma = LemmaTagger(config=LemmaTaggerConfig())

# Input text: "Welcome!"
text = "Добре дошли!"
print(lemma(text))
```



```python
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
```
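A common next step is to turn the lemmatizer output into a normalized token list, for example before counting word frequencies. The sketch below is a minimal illustration that assumes the list-of-dictionaries shape printed above; it uses a hand-copied copy of that output, and `lemma_tokens` is a hypothetical helper, not part of `bgnlp`:

```python
# Hand-copied from the sample output above. In real use this list would come
# from `lemma(text)`, ASSUMING it returns a list of dicts with these keys.
lemmatized = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]


def lemma_tokens(tokens):
    """Hypothetical helper: lowercased lemmas, dropping non-alphabetic tokens."""
    return [t["lemma"].lower() for t in tokens if t["lemma"].isalpha()]


normalized = lemma_tokens(lemmatized)
print(normalized)  # ['добре', 'дойда']
```

`str.isalpha()` is Unicode-aware, so Cyrillic words pass the filter while punctuation such as `!` is dropped. Note that it would also drop hyphenated words and numbers, which may or may not be desirable.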



