Metadata-Version: 2.1
Name: bnlm
Version: 1.0.0
Summary: Bengali Language Model
Home-page: https://github.com/sagorbrur/bnlm
Author: Sagor Sarker
Author-email: sagorhem3532@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Description-Content-Type: text/markdown
Requires-Dist: aiohttp (>=3.5.4)
Requires-Dist: async-timeout (>=3.0.1)
Requires-Dist: bottleneck
Requires-Dist: fastprogress (>=0.1.19)
Requires-Dist: matplotlib
Requires-Dist: numexpr
Requires-Dist: numpy (>=1.15)
Requires-Dist: nvidia-ml-py3
Requires-Dist: packaging
Requires-Dist: pandas
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: scipy
Requires-Dist: spacy (>=2.0.18)
Requires-Dist: typing
Requires-Dist: fastai (==1.0.57)
Requires-Dist: sentencepiece
Requires-Dist: pynvx (>=1.0.0) ; platform_system == "Darwin"
Requires-Dist: dataclasses ; python_version < "3.7"

# Bengali Language Model
The Bengali language model is built with fastai's [ULMFit]() and is ready for `prediction` and `classification` tasks.


NB:
* This tool mostly follows [inltk](https://github.com/goru001/inltk).
* The `Bengali` part is separated out here, with better evaluation results.

# Installation

`pip install bnlm`
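
The package metadata pins `fastai` to version `1.0.57`, so installing into an environment that already has a different fastai version may replace it. A minimal sketch for checking what is currently installed, assuming Python 3.8+ (for `importlib.metadata`); `installed_version` is an illustrative helper and not part of bnlm:

```py
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Print the versions of bnlm and two of its declared dependencies
for name in ("bnlm", "fastai", "sentencepiece"):
    print(name, installed_version(name))
```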


# Evaluation Result

## Language Model
* Accuracy: 48.26% on the validation dataset
* Perplexity: ~22.79
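
Perplexity is conventionally the exponential of the mean per-token cross-entropy loss (in nats). Assuming the figure above follows that convention, which the source does not state, it corresponds to a validation loss of roughly 3.13:

```py
import math

def perplexity(cross_entropy_loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(cross_entropy_loss)

reported_perplexity = 22.79                   # value reported above
implied_loss = math.log(reported_perplexity)  # assumption: loss is in nats
print(round(implied_loss, 2))
```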

## Training
To train on your own corpus, follow [this repository](https://github.com/sagorbrur/Bengali-Language-Model).

# Features and API

## Download pretrained Model
To start, download the pretrained language model and the SentencePiece model:

```py
from bnlm.bnlm import download_models

download_models()

```
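
The download only needs to happen once. A minimal sketch of a guard that skips it when the SentencePiece model is already on disk. It assumes `download_models()` writes into a local `model/` directory (inferred from the `model/bn_spm.model` path used in the other examples, not confirmed by the source); `ensure_models` is a hypothetical helper, not part of bnlm:

```py
import os

def ensure_models(download, sp_model="model/bn_spm.model"):
    """Call `download()` only if the SentencePiece model file is missing.

    Returns True if a download was triggered, False otherwise.
    """
    if os.path.isfile(sp_model):
        return False
    download()
    return True
```

With bnlm installed, this would be called as `ensure_models(download_models)`.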
## Predict N Words
```py
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import predict_n_words
model_path = 'model'
input_sen = "আমি বাজারে"
output = predict_n_words(input_sen, 3, model_path)
print("Word Prediction: ", output)

```

## Get Sentence Encoding
```py
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_encoding
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
encoding = get_sentence_encoding(input_sentence, model_path, sp_model)
print("sentence encoding is: ", encoding)

```

## Get Embedding Vectors
```py
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
embed = get_embedding_vectors(input_sentence, model_path, sp_model)
print("sentence embedding is : ", embed)


```


## Sentence Similarity
```py
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_similarity
model_path = 'model'
sp_model = "model/bn_spm.model"
sentence_1 = "আমি ভাত খাই।"
sentence_2 = "আমি ভাত খাই।"
sim = get_sentence_similarity(sentence_1, sentence_2, model_path, sp_model)
print("similarity is: ", sim)

```
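
Similarity between sentence encodings is commonly measured with cosine similarity. The sketch below shows that computation with NumPy on two already-computed vectors. Whether `get_sentence_similarity` uses exactly this measure is an assumption, and `cosine_similarity` is an illustrative helper rather than a bnlm function:

```py
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (flattened to 1-D)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(a, b) / denom)

# Toy vectors standing in for sentence encodings
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Under this measure, two identical sentences produce identical encodings and therefore a similarity of 1, up to floating-point rounding.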

## Get Similar Sentences
```py
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
from bnlm.bnlm import get_similar_sentences

model_path = 'model'
sp_model = "model/bn_spm.model"

input_sentence = "আমি ভাত খাই।"
sen_pred = get_similar_sentences(input_sentence, 3, model_path, sp_model)
print(sen_pred)


```


## Classification
```upcoming```




