Metadata-Version: 2.4
Name: brnltk
Version: 3.2.2
Summary: A Part-of-Speech Tagger and Dialect Processing Toolkit for Bengali.
Home-page: https://github.com/ShakirHaque/brposNLTK
Author: Mahmudul Haque Shakir
Author-email: mahmudulhaqueshakir@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Natural Language :: Bengali
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: tensorflow>=2.10.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: dev
Requires-Dist: keras>=2.12.0; extra == "dev"
Requires-Dist: scikit-learn>=1.2.0; extra == "dev"
Provides-Extra: testing
Requires-Dist: pytest>=7.0.0; extra == "testing"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# brnltk

**brnltk** is a Part-of-Speech (POS) tagging and dialect processing library for Bengali that supports several regional dialects. It combines **LSTM-based deep learning models**, n-gram similarity, and rule-based stemming to provide POS tagging, translation between Bengali dialects, tokenization, stemming, and sentence similarity checking.

---

## Features

* **POS Tagging**
  Train or load a POS tagging model for various Bengali dialects using LSTM with fallback mechanisms:

  * Dictionary lookup
  * Suffix-based normalization
  * N-gram similarity

* **Dialect Translation**
  Translate Bengali sentences from one regional dialect to another using n-gram similarity between words.

* **Tokenization**
  Word-level and sentence-level tokenization for Bengali text.

* **Stemming**
  Light stemming of Bengali words using rule-based suffix removal.

* **Sentence Similarity**
  Compute similarity scores between Bengali sentences using n-gram-based overlap.
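
Several of the features above rest on n-gram similarity between words. The README does not specify how brnltk computes it, so the following is only a minimal sketch of one common approach: Jaccard overlap of character bigrams. The names `char_ngrams`, `ngram_similarity`, and `closest_word` are hypothetical and are not part of the brnltk API.

```python
from typing import Iterable, Optional, Set


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams in `word`.

    Words shorter than `n` yield the word itself as their only n-gram,
    and the empty string yields an empty set.
    """
    if len(word) < n:
        return {word} if word else set()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets, in the range [0.0, 1.0].

    Assumption: this is an illustrative measure, not brnltk's own formula.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # both inputs were empty
    return len(grams_a & grams_b) / len(union)


def closest_word(word: str, vocabulary: Iterable[str], n: int = 2) -> Optional[str]:
    """Return the vocabulary entry most similar to `word`, or None if empty."""
    return max(vocabulary, key=lambda c: ngram_similarity(word, c, n), default=None)


if __name__ == "__main__":
    # Compare a word against a small hypothetical vocabulary.
    print(closest_word("খাই", ["যাই", "ভাত"]))
```

Python strings are indexed by Unicode code point, so in this sketch a Bengali vowel sign counts as its own character when n-grams are formed. A lookup of this kind could serve as a fallback for out-of-vocabulary words or as a way to pick the nearest dialect form.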

---

## Dialect Name Mapping

The library maps short dialect names to dataset column names through `AREA_MAPPING`:

| Short Name   | Full Column Name         |
| ------------ | ------------------------ |
| Barishal     | Barishal_bangla_speech   |
| Sylhet       | Sylhet_bangla_speech     |
| Chittagong   | Chittagong_bangla_speech |
| Mymensingh   | Mymensingh_bangla_speech |
| Noakhali     | Noakhali_bangla_speech   |
| General      | General                  |
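
The table above can be read as a plain lookup from short name to dataset column. Here is a minimal sketch, assuming `AREA_MAPPING` is a dictionary of this shape (the library's actual definition may differ). The `resolve_area` helper is hypothetical and is not part of brnltk.

```python
# Assumed shape of AREA_MAPPING, copied from the table above.
AREA_MAPPING = {
    "Barishal": "Barishal_bangla_speech",
    "Sylhet": "Sylhet_bangla_speech",
    "Chittagong": "Chittagong_bangla_speech",
    "Mymensingh": "Mymensingh_bangla_speech",
    "Noakhali": "Noakhali_bangla_speech",
    "General": "General",
}


def resolve_area(short_name: str) -> str:
    """Return the dataset column for a short dialect name (hypothetical helper)."""
    try:
        return AREA_MAPPING[short_name]
    except KeyError:
        valid = ", ".join(sorted(AREA_MAPPING))
        raise ValueError(
            f"Unknown dialect {short_name!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(resolve_area("Sylhet"))
```

Validating the short name up front gives a readable error for a misspelled dialect instead of a failure deeper in the data lookup.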

## Usage

### Dialect Translation

```python
from brnltk import translate

original_sentence = "তুমি ভাত খাই"
translated_sentence = translate(
    sentence=original_sentence,
    from_area="General",
    to_area="Chittagong"
)

print("Original:", original_sentence)
print("Translated:", translated_sentence)

```

### POS Tagging

```python
from brnltk import run_pos_tagger

# Use the AREA_MAPPING keys directly
predict_func, _, _ = run_pos_tagger(column_name='Mymensingh', train=True)
sentence = "আমি শাকির"

result = predict_func(sentence, return_confidence=True)
for word, tag, conf in result:
    print(f"{word:<15} --> {tag:<10} (confidence: {conf:.2f})")


```

### Tokenization

```python
from brnltk import word_tokenize

text = "আমি ভাত খাই এবং কাজ করি।"
tokens = word_tokenize(text)
print("Tokens:", tokens)

```

### Stemming

```python
from brnltk import stem_sentence

sentence = "ছেলেটি খেলাধুলা করছে"
stemmed = stem_sentence(sentence)
print("Stemmed:", stemmed)

```

### Sentence Similarity

```python
from brnltk import overall_similarity

sentence1 = "আমি আজ স্কুলে যাই"
sentence2 = "আমি স্কুলে যাচ্ছি আজ"
score = overall_similarity(sentence1, sentence2)
print(f"Similarity Score: {score:.2f}")

```

---

## Dataset Information

The library relies on a curated dataset of Bengali words across multiple dialects. Its columns include:

* `General`
* `English Translation`
* `পদ (Google API)`
* `Updated Human`
* `পদ (Human)`
* `Barishal_bangla_speech`
* `Sylhet_bangla_speech`
* `Chittagong_bangla_speech`
* `Mymensingh_bangla_speech`
* `Noakhali_bangla_speech`

Pass the `AREA_MAPPING` keys (the short names) directly to all functions.

---

## Contributing

Contributions are welcome! Please create a pull request or open an issue for any feature requests or bug reports.

To contribute updates to the dataset, please email mahmudulhaqueshakir@gmail.com.

---

## License

MIT License © 2025 Shakir
